I’ve been closely following the drone-git issue about /pr jobs building the wrong commit and I understand the issue was just closed and locked because the understanding was that there was an unresolved bug upstream in Github itself.
Since my company has a paid Github support plan, I filed a support ticket with Github because we’ve been bitten by this issue. Here’s what I said:
There appears to be a bug in Github where upon pull request creation, or pushing to an existing pull request, the PR webhook is fired before the merge ref is created.
This results in dangerous bugs in CI systems where the code being tested in a CI job is not the code that the PR author thinks is being tested.
Specifically when this happens, one of two things happens:
1) The CI job fails because the merge ref cannot be found (if this is right when the PR is opened)
2) The CI job runs but uses the wrong commit (if this is a subsequent push to an existing PR) - This is because the FETCH_HEAD is pointing to an older commit.
Here is Github’s reply:
The /merge refs that are being used here are an undocumented feature and you shouldn’t be relying on them. That feature was build to power Pull Requests in the Web UI, it was not built to be used by 3rd party services. Because it’s undocumented – you should not have any expectations about behavior, the behavior might change at any time and those refs might completely go away without warning. My recommendation is that if you need a merge commit between the base and head refs, you create that merge commit yourself in the local clone instead of relying on merge commits from GitHub. If there’s a reason why you need to rely on this undocumented feature and can’t create merges on your end, please let us know – we’d be interested to hear your use-case.
That said, if I understood your description correctly – the behavior you observed related to the /merge ref is expected for now and I’m not aware of any plans to change that behavior. When the base or head ref of a pull request is updated, the mergeability of the pull request is not triggered right away. Instead, the mergeability is triggered when it’s needed: when a user visits the page for the pull request via the UI or requests the pull request via the API. In order to determine the mergeability of a pull request, GitHub actually performs a merge between the head and the base ref, and the result of that merge is stored in the /merge ref (the ref you’re using). Also, this merging happens in a background job since it needs to be performed on disk and might take a while sometimes. That’s why, for example, if you fetch the pull request via the API, you first see a null value indicating that the computation of the mergeability has been triggered, and then when you fetch it again – you see the actual result of the mergeability (see the blue note here: https://developer.github.com/v3/pulls/#get-a-single-pull-request). The reason why mergeability is computed only when it’s requested is performance – imagine, for example, a repository with lots of pull requests targeted at master, and then master is updated. At that point, you’d need to re-compute the mergeability of all pull requests targeted at master. Doing that all at once would cause unnecessary performance spikes, and might cause performance problems.
So, if I understood correctly – the reason why you’re not seeing the /merge ref (or why it’s not up-to-date) is a result of how this ref is used for determining the mergeability of a pull request on GitHub (when the computation of the mergeability is triggered and how it’s done). Webhooks are triggered as soon as the head ref of the pull request is updated, and that’s expected – webhooks will not wait for a mergeability check to complete because that check is performed when a user requests that data. So, if no user requests that data, webhooks would be blocked.
My suggestion would be that you switch to using your own merges that you’d create in your local clone of the repository, and if there’s a reason you really don’t want to do that – you could use the API to fetch the pull request (in order to trigger the mergeability check), and fetch the /merge ref for the pull request only after you’ve received a true value back for the mergeable attribute.
Hope this helps, and let me know if you have any other questions about this.
Given this information, should we re-evaluate how cloning is done for /pr jobs?
Edit: CC some folks who have been involved in the github issue discussion: @dragos @tonglil
Edit 2: Asked some follow-up questions. I asked:
My understanding from the drone maintainers was that the merge ref is the same feature that’s used by other popular CI platforms like TravisCI and CircleCI. Is this true, and does this mean that these CI systems are also using an undocumented feature in an unsafe way? Or are they doing things differently?
Github replied:
I wish I could answer that, but we didn’t build any of those other services, so we can’t say what they use and why, or if they do something differently. I recommend reaching out to them if you have questions about their services and comparing them to what you use right now.
I asked:
The end-goal is to run CI jobs on the code that would result from the simulated merge of the PR branch into the target branch.
Github replied:
The recommended way of doing that is to create that merge locally (by merging the head and base in a temporary branch) or wait for that merge commit to be created on our end before trying to fetch that commit. The API documentation has some notes on that: https://developer.github.com/v3/pulls/#get-a-single-pull-request
Would either of the above recommendations be viable for the drone-git plugin?