Pull request with missing PR builds

I ran into a weird issue where someone had a PR open and was pushing commits to the branch the PR referenced. The first couple of commits triggered both the “pr” and “push” builds. Then a couple triggered only “push”. The rest triggered both “pr” and “push” again.

While trying to determine why a couple of commits did not get a “pr” build, we found that 2 webhook events had received a 500 response from Drone.

When checking the logs from the Drone server, I noticed the following errors, which occurred at the same time for that repo:

time="2017-02-10T15:20:54Z" level=error msg="failure to save commit for user/repo. meddler.Insert: DB error in QueryRow: pq: duplicate key value violates unique constraint \"builds_build_number_build_repo_id_key\""
time="2017-02-10T15:20:54Z" level=error msg="Error #01: meddler.Insert: DB error in QueryRow: pq: duplicate key value violates unique constraint \"builds_build_number_build_repo_id_key\"\n" ip=x.x.x.x latency=261.551846ms method=POST path="/hook" status=500 time="2017-02-10T15:20:54Z" user-agent="GitHub-Hookshot/9254d22"

time="2017-02-10T15:15:13Z" level=error msg="Error #01: meddler.Insert: DB error in QueryRow: pq: duplicate key value violates unique constraint \"builds_build_number_build_repo_id_key\"\n" ip=x.x.x.x latency=267.218395ms method=POST path="/hook" status=500 time="2017-02-10T15:15:13Z" user-agent="GitHub-Hookshot/9254d22"
time="2017-02-10T15:15:13Z" level=error msg="failure to save commit for user/repo. meddler.Insert: DB error in QueryRow: pq: duplicate key value violates unique constraint \"builds_build_number_build_repo_id_key\""

I’m trying to figure out what caused the issue above, but I’m not sure how the errors relate. Thanks!

We have an open issue for this one: https://github.com/drone/drone/issues/1919

When Drone receives a hook, it creates a new build whose build number is set to the previous build number + 1. If a single repository receives multiple hooks simultaneously, the system can end up trying to create two builds with the same build number. The second insert then violates the unique constraint, which is the error in your logs, and the hook returns a 500.

Unfortunately there isn’t a great solution that is cross-database for atomically incrementing the build number so we need to adjust the implementation. A short term workaround would be to do a simple retry on failure.

A slightly better solution would be to store the last_build_number in the repository table and use our SQL foo to prevent two goroutines from incrementing the same value.

> select last_build_number from repos where repo_full_name = 'octocat/hello-world';
53

> update repos set last_build_number = 54 where last_build_number = 53 and repo_id = ...;
1 row(s) affected

If RowsAffected() == 0 we know that another build has already incremented and reserved the number. If RowsAffected() == 1 we know that we can safely use the number for the build. https://golang.org/pkg/database/sql/#Result
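In Go, the check could look roughly like the sketch below. Everything here is an assumption for illustration: the `last_build_number` column does not exist yet, `store`, `sqlStore`, and `reserveBuildNumber` are made-up names, and the `$1` placeholders are Postgres-style (other drivers use `?`). The two statements are wrapped in function values so the loop can be exercised without a database.

```go
package main

import (
	"database/sql"
	"errors"
)

// errContended is returned when every attempt lost the race.
var errContended = errors.New("build number: too many concurrent updates")

// store wraps the two statements (hypothetical type, not in Drone).
type store struct {
	last func() (int64, error)                 // reads last_build_number
	swap func(prev, next int64) (int64, error) // conditional update; returns rows affected
}

// sqlStore builds a store backed by database/sql. It assumes a
// repos.last_build_number column and Postgres-style placeholders.
func sqlStore(db *sql.DB, repoID int64) store {
	return store{
		last: func() (int64, error) {
			var n int64
			err := db.QueryRow(
				`select last_build_number from repos where repo_id = $1`,
				repoID,
			).Scan(&n)
			return n, err
		},
		swap: func(prev, next int64) (int64, error) {
			res, err := db.Exec(
				`update repos set last_build_number = $1
				 where last_build_number = $2 and repo_id = $3`,
				next, prev, repoID,
			)
			if err != nil {
				return 0, err
			}
			return res.RowsAffected()
		},
	}
}

// reserveBuildNumber reserves the next build number, re-reading and
// retrying when another hook incremented the value first.
func reserveBuildNumber(s store, attempts int) (int64, error) {
	for i := 0; i < attempts; i++ {
		prev, err := s.last()
		if err != nil {
			return 0, err
		}
		next := prev + 1
		n, err := s.swap(prev, next)
		if err != nil {
			return 0, err
		}
		if n == 1 {
			return next, nil // we own this number
		}
		// n == 0: someone else reserved it; loop and try the next one.
	}
	return 0, errContended
}
```

With a real connection this would be called as `reserveBuildNumber(sqlStore(db, repoID), 3)`, and the returned number used for the new build row.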

I think this would be the better solution in the end, because relying on error messages is fragile: they differ across database versions and database vendors.