Conversion extension errors are silently skipped over

tl;dr: when our conversion extension reports an error, the Drone build is silently not created.

In our CI (in this case triggered on merges to our main branch in GitHub) we have a conversion extension. One of the things it does is fetch secrets.

Sometimes that can fail for various reasons. In that case, we report an error via the extension in the normal way by returning a Go error.

In the logs from the Drone server we see a message like:

  "commit": "xxx",
  "error": "got an error trying to find secret yyy: secret not found\n\n",
  "event": "push",
  "level": "warning",
  "msg": "trigger: cannot convert yaml",
  "ref": "refs/heads/master",
  "repo": "grafana/xxx",
  "time": "2022-09-01T16:46:20Z"

The error text comes from our code, and the msg comes from the Drone server. So the error makes it back to the server.

It’s also reported in the UI (don’t appear to be able to directly upload images here).

The problem is that this is reported as a status 200 back to GitHub and the run is skipped silently. On our side we see it as the run simply not being triggered (the status check is not created), but the commit has a green check mark from our other actions.

Since the runs perform some critical business functions for us (synchronising Kubernetes manifests to be deployed by Flux), we really can’t have silent failures to execute. We need a way to have these conversion plugin failures surfaced so we can take action. I did have two ideas:

  • Report a non-200 status. (But I’m not sure what GitHub would do with that.)
  • Report a created build but have it fail with the same error message. In our case we have alerting configured for this already so on-call engineers would be notified of the failure.

Feel free to take them or leave them.

Might be related: Drone vault plugin (reporting problems, errors)

@laney nice to see you pinging us after a long time :smiley:

I have shared this issue internally and the team is working on it. I will keep you updated on this!


I am reading through this thread, and it sounds like Drone is creating an entry for the Build in the Drone database, with an error status, which is visible in the user interface (per the screenshot you provided). A silent failure would generally imply that no build entry is created, and you have no way to know there was an error (other than looking at the logs). Just to ensure I’m not misunderstanding, can you confirm you see a build in the Drone user interface with the relevant error?

If yes, is it fair to define the problem statement as the following: When the extension returns an error, Drone does not create a GitHub status?

That’s right. If you were browsing the web UI at the right time, you would be able to see an :x: failure for the conversion extension error so it is actually possible to detect if you are there when it happens or if you page back and find all the failures for pushes to the main branch. (I didn’t try the CLI.)

The problem is really two things:

  • No status check is created on GH
  • It’s not reported as a failure as far as our webhook listener is concerned (I didn’t mention this specifically in the OP, apologies)

Ok, sorry for using the word silent then. Here’s where I was coming from: normal failures are noisy for us, deliberately because the steps being run are quite crucial. In this case there is no notification so that’s why I used that word. It’s making a build but the normal notification channels (as far as I can see) are not being pinged. So it’s not making any noise if you see what I mean. But you can go and look, as you say.

Thanks so much for clarifying, this makes perfect sense. A quick look at the code and I think we just need to send the status and webhook in the createBuildError function (at the link below). We will have our team dig deeper and will report back here if we have any questions.