I keep running into a 404 error on webhook delivery. The payload and request are sent successfully; the response is a 404, not a timeout. About 30% to 40% of webhook payload deliveries get this error, and a redeliver usually turns it into a 200 OK. I'm really not sure what's going on. Debug mode is turned on on the drone server, but I didn't see any information related to the failed deliveries.
The request header is: (token and X-GitHub-Delivery are replaced with XXXXX)
The code that processes the webhooks prints something to the logs for every possible failure path. So if you are seeing 404s and they are coming from drone, you will see the reason in the drone server logs.
Without the 404 and its error message, it won't be possible to diagnose this further. If you don't see a 404 in the drone logs, the assumption (on my end) is that drone is not receiving the HTTP request and the problem is further up the stack.
The logs shouldn't be too difficult to search, unless you have DRONE_BROKER_DEBUG=true, in which case you can disable it. That setting is extremely verbose and not recommended unless you are explicitly trying to debug errors in the message broker.
There are only two possible reasons for a 404, which you can see in the hooks.go file.
The first 404 occurs when the repository name in the webhook signature (encoded as a url parameter) is not found in the database. This generally happens when:
- someone manually modifies the hook URL in GitHub, or
- the repository name changes.
log.Errorf("failure to find repo %s/%s from hook. %s", tmprepo.Owner, tmprepo.Name, err)
The second 404 occurs when drone cannot fetch the .drone.yml for the commit sha. This generally happens when someone either forgets to include a .drone.yml in the branch, or incorrectly assumes that drone only pulls the .drone.yml from master, which is not the case.
log.Errorf("failure to get build config for %s. %s", repo.FullName, err)
These two errors are the only code paths in drone that can return a 404 for a hook. If you do not see either of them in the logs, something else up your stack is failing before the hook even reaches drone.
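To make the log search concrete, here is a minimal Go sketch that sorts server log lines into the two 404 paths described above. It only matches on the substrings of the two `log.Errorf` messages quoted from hooks.go; the `classify` function and `hookFailure` type are hypothetical helpers, not part of drone, and a plain grep for the same strings would do the same job.

```go
package main

import (
	"fmt"
	"strings"
)

// hookFailure names which of the two hook 404 paths a log line matches.
type hookFailure int

const (
	notHook404     hookFailure = iota // line is not one of the two 404 messages
	repoNotFound                      // "failure to find repo ... from hook"
	configNotFound                    // "failure to get build config for ..."
)

func (f hookFailure) String() string {
	switch f {
	case repoNotFound:
		return "repo not found (first 404 path)"
	case configNotFound:
		return "build config not found (second 404 path)"
	default:
		return "not a hook 404 message"
	}
}

// classify reports which 404 code path (if any) produced a drone log line,
// based only on the text of the two log.Errorf calls quoted above.
func classify(line string) hookFailure {
	switch {
	case strings.Contains(line, "failure to find repo") && strings.Contains(line, "from hook"):
		return repoNotFound
	case strings.Contains(line, "failure to get build config for"):
		return configNotFound
	default:
		return notHook404
	}
}

func main() {
	lines := []string{
		"ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=e77e2e65537bbbe25b3d57b83d61b54985afa8ef: 404 Not Found",
		// The access-log line only confirms the status; the reason is on the ERRO line.
		"method=POST path=/hook status=404 user-agent=GitHub-Hookshot/d9ba1f0",
	}
	for _, l := range lines {
		fmt.Println(classify(l))
	}
}
```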
Looks like it can't get the build config from GitHub, but I can't reproduce the 404 on grabbing the files from GitHub, and I don't have the same problem on Travis CI and Circle CI. I'm not sure how to deal with this problem. Could we have some retries here? Thanks.
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=e77e2e65537bbbe25b3d57b83d61b54985afa8ef: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=e77e2e65537bbbe25b3d57b83d61b54985afa8ef: 404 Not Found 
ip=18.104.22.168 latency=411.596188ms method=POST path=/hook status=404 time=2016-11-26T05:52:52Z user-agent=GitHub-Hookshot/d9ba1f0
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=fb6b1e65b045dcf900f9443cd908541b156512ab: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=fb6b1e65b045dcf900f9443cd908541b156512ab: 404 Not Found 
ip=22.214.171.124 latency=408.993726ms method=POST path=/hook status=404 time=2016-11-26T06:24:06Z user-agent=GitHub-Hookshot/d9ba1f0
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=dd3bc82a4e497e72d97764f8982900e438c34976: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=dd3bc82a4e497e72d97764f8982900e438c34976: 404 Not Found 
ip=126.96.36.199 latency=410.052014ms method=POST path=/hook status=404 time=2016-11-26T07:00:04Z user-agent=GitHub-Hookshot/d9ba1f0
So the question I have is: why is GitHub returning a 404? Perhaps an issue should be opened with GitHub reporting that a valid API request to fetch a file returns a 404 even though the file exists?
but I can’t reproduce the 404 on grabbing the files from GitHub
My guess is that if you re-send the hook from GitHub (via the GitHub webhooks UI), drone will run just fine and will not get a 404. This would demonstrate that it has nothing to do with drone or the drone code, but is a GitHub issue where they are not able to properly serve their content.
I don’t have the same problem on Travis CI and Circle CI
Maybe because Travis and Circle have long queue times and don't process the requests until many seconds after the hook is received? Maybe GitHub has an eventually consistent database, and sometimes the API request reads from a node that does not yet have the record? Perhaps GitHub could shed some light here.
OK great, thanks! Let’s see what information we can get from GitHub. I doubt we will see an immediate fix from them (although maybe) but if they can help us understand the root cause we can at least craft an effective fix in drone.
That is actually my guess. Maybe GitHub has an eventually consistent database, and given the large size of CDNJS it takes a few seconds for the commit to propagate to all their storage servers. If the API request gets routed to a storage node without the content, it gets a 404.
This could also explain why Travis and Circle don't have the same issue: your Drone server doesn't have a large backlog in its queue and thus processes the commit hook immediately. Compare that to Travis, which probably has 50 items in the queue ahead of your build when the commit hook is received, resulting in delayed processing.
I ran into this 404 webhook issue yesterday. I tried regenerating the OAuth secret, but it didn't fix it. When I removed and re-added the repo in the drone UI, the webhooks worked again. I realized that another owner in our GitHub org had enabled these repos in the drone UI, and he left the org on Friday. Since then, all of the repos he added in drone have been getting 404 errors on their webhooks. Even though the repositories are owned by the organization, is there some cached credential tied to the user who enabled the repository?