Github Claims that Merge Refs are "Undocumented Feature"

I’ve been closely following the drone-git issue about /pr jobs building the wrong commit and I understand the issue was just closed and locked because the understanding was that there was an unresolved bug upstream in Github itself.

Since my company has a paid Github support plan, I filed a support ticket with Github because we’ve been bitten by this issue. Here’s what I said:

There appears to be a bug in Github where upon pull request creation, or pushing to an existing pull request, the PR webhook is fired before the merge ref is created.

This results in dangerous bugs in CI systems where the code being tested in a CI job is not the code that the PR author thinks is being tested.

Specifically when this happens, one of two things happens:
1) The CI job fails because the merge ref cannot be found (if this is right when the PR is opened)
2) The CI job runs but uses the wrong commit (if this is a subsequent push to an existing PR) - This is because the FETCH_HEAD is pointing to an older commit.

Here is Github’s reply:

The /merge refs that are being used here are an undocumented feature and you shouldn’t be relying on them. That feature was build to power Pull Requests in the Web UI, it was not built to be used by 3rd party services. Because it’s undocumented – you should not have any expectations about behavior, the behavior might change at any time and those refs might completely go away without warning. My recommendation is that if you need a merge commit between the base and head refs, you create that merge commit yourself in the local clone instead of relying on merge commits from GitHub. If there’s a reason why you need to rely on this undocumented feature and can’t create merges on your end, please let us know – we’d be interested to hear your use-case.

That said, if I understood your description correctly – the behavior you observed related to the /merge ref is expected for now and I’m not aware of any plans to change that behavior. When the base or head ref of a pull request is updated, the mergeability of the pull request is not triggered right away. Instead, the mergeability is triggered when it’s needed: when a user visits the page for the pull request via the UI or requests the pull request via the API. In order to determine the mergeability of a pull request, GitHub actually performs a merge between the head and the base ref, and the result of that merge is stored in the /merge ref (the ref you’re using). Also, this merging happens in a background job since it needs to be performed on disk and might take a while sometimes. That’s why, for example, if you fetch the pull request via the API, you first see a null value indicating that the computation of the mergeability has been triggered, and then when you fetch it again – you see the actual result of the mergeability (see the blue note here: https://developer.github.com/v3/pulls/#get-a-single-pull-request). The reason why mergeability is computed only when it’s requested is performance – imagine, for example, a repository with lots of pull requests targeted at master, and then master is updated. At that point, you’d need to re-compute the mergeability of all pull requests targeted at master. Doing that all at once would cause unnecessary performance spikes, and might cause performance problems.

So, if I understood correctly – the reason why you’re not seeing the /merge ref (or why it’s not up-to-date) is a result of how this ref is used for determining the mergeability of a pull request on GitHub (when the computation of the mergeability is triggered and how it’s done). Webhooks are triggered as soon as the head ref of the pull request is updated, and that’s expected – webhooks will not wait for a mergeability check to complete because that check is performed when a user requests that data. So, if no user requests that data, webhooks would be blocked.

My suggestion would be that you switch to using your own merges that you’d create in your local clone of the repository, and if there’s a reason you really don’t want to do that – you could use the API to fetch the pull request (in order to trigger the mergeability check), and fetch the /merge ref for the pull request only after you’ve received a true value back for the mergeable attribute.

Hope this helps, and let me know if you have any other questions about this.

Given this information, should we re-evaluate how cloning is done for /pr jobs?

Edit: CC some folks who have been involved in the github issue discussion: @dragos @tonglil

Edit 2: Asked some follow-up questions. I asked:

My understanding from the drone maintainers was that the merge ref is the same feature that’s used by other popular CI platforms like TravisCI and CircleCI. Is this true, and does this mean that these CI systems are also using an undocumented feature in an unsafe way? Or are they doing things differently?

Github replied:

I wish I could answer that, but we didn’t build any of those other services, so we can’t say what they use and why, or if they do something differently. I recommend reaching out to them if you have questions about their services and comparing them to what you use right now.

I asked:

The end-goal is to run CI jobs on the code that would result from the simulated merge of the PR branch into the target branch.

Github replied:

The recommended way of doing that is to create that merge locally (by merging the head and base in a temporary branch) or wait for that merge commit to be created on our end before trying to fetch that commit. The API documentation has some notes on that: https://developer.github.com/v3/pulls/#get-a-single-pull-request

Would either of the above recommendations be viable for the drone-git plugin?

@geekdave thanks for inquiring with github and posting the details.

Would either of the above recommendations be viable for the drone-git plugin?

Drone does not currently track enough information to clone from the base ref. I would recommend a github-clone plugin that uses the GitHub API to retrieve this additional information at runtime to clone the base ref.

Starting with 0.9 there will be two important changes to how Drone clones a repository by default. The first is that 0.9 will no longer use the clone plugin by default. The clone plugin will be available, but as an alternative to the built-in clone logic.

The second change is that Drone will use the head ref to clone pull requests by default. The head ref is documented as the recommended approach to cloning a repository, and works very well when used with protected branches. It is a change I have been wanting to make for months, and now seems like the right time.

Teams that want to use a merge ref will have the option to use a community plugin or create their own custom plugin. I am primarily focused on Drone core, which means the development of such plugins would be community-led.

1 Like

The second change is that Drone will use the head ref to clone pull requests by default.

If we choose to adopt this default behavior I want to make sure I can explain it to my dev team who might not have the deep git knowledge to understand the technical details on their own.

Previous behavior (using git-clone plugin)
On each PR, drone creates two jobs:

  1. /pr: Tests the hypothetical result of merging the PR branch with the target branch
  2. /pull: Tests the PR branch in isolation

New behavior (in 0.9, using default core clone behavior)
On each PR, drone creates one job:

  • /push: Tests the PR branch in isolation
  • To make sure no regressions are introduced by merging to a moving target branch (someone else changing master in a way that would cause the PR to break once it lands), we should turn on this feature in Github protected branches:

Require branches to be up to date before merging: This ensures the branch has been tested with the latest code on master.

  • With the above, we’re logically protected in the same way as the /pr job protected us.

Sound good?

1 Like

Yes, exactly. And if you prefer the previous behavior or decide you want a slightly different behavior, you can always use plugins.

BTW you are getting pretty good at interpreting my comments :slight_smile:

This merge ref is all kinds of weird. Race conditions, failure to merge both cause unexpected build failures, as do some of the things that happen in a transparent merge. Plus I’m not sure everyone realizes that the merge is happening.

Not a great default behavior to expose!

FYI I think I have a solution for the merge ref problem. Planned change to git clone logic

Hey @bradrydzewski - I’m considering rolling my own plugin which would poll the github API and wait for the merge ref to be ready.

The endpoint: GET /repos/:owner/:repo/pulls/:number returns a mergeable attribute, which has the value null when the merge ref hasn’t been created yet. So if I waited for that to be non-null, I think this problem would be solved.

Any thoughts to that approach? It’s what the github support person recommended in the above email.

But if your proposed solution is close to completion, I may wait for that. Is it on the horizon or still a ways off?

If you are going to rollout your own plugin, I recommend using the approach defined here.

Why not to merge “master” after clone? Would be much easier than pulling GitHub API

git clone ...
git merge master # or other "configured" branch

Actually you may not need to do custom clone plugin, just add a new step for your pipeline.

Meanwhile, is there way to switch drone 0.8 to use HEAD instead of /merged? Current behaviour with /merge seems to be too unpredictable.

UPDATE:
Found it, you should set environment variable DRONE_GITHUB_MERGE_REF=false for drone server container. New behaviour will be applied for new builds only, old ones will keep using merged ref.

Is there any way of adjusting the DRONE_GITHUB_MERGE_REF behavior on a per-repository basis? My company’s Drone installation has it set to false globally, but I would really prefer the merge=true behavior for my own team’s projects.

(edit: oops, I see that this exact thing is addressed in Planned change to git clone logic)

For everyone stumbling upon this thread, please see the following thread and my latest comment: