Do you have a remote proxy in place and have you considered adjusting the configuration to avoid returning html for json requests?
This seems to be an edge case related to using a reverse proxy, because I cannot otherwise reproduce the issue. When the server is down, and no reverse proxy is in place, the http request times out after 30 seconds and then re-attempts. This limits the logs to once every N seconds.
Hmm, it does not look like this response is coming from the Drone server. We do not have any code that returns a response matching this output:
Yes, it’s Nginx before Drone’s server.
Have you considered adjusting the configuration to avoid returning html for json requests?
We will. But before this incident we didn’t even know about this problem.
When the server is down, and no reverse proxy is in place, the http request times out after 30 seconds and then re-attempts. This limits the logs to once every N seconds.
I’m sorry, but what does that explain? We have a situation where the HTTP client got a non-2xx response and flooded the logs (~9k messages in 30 seconds). This behavior is clearly destructive and quickly led to downtime, so it’s at least a bug worth fixing.
For a more permanent solution you might consider sending a patch that accounts for html responses returned from a reverse proxy when the server is down.
With code written in Golang, we can only help with reporting and troubleshooting. Sorry, but that’s all the help we can offer for now.
This seems to be an edge case related to using a reverse proxy, because I cannot otherwise reproduce the issue. When the server is down, and no reverse proxy is in place, the http request times out after 30 seconds and then re-attempts. This limits the logs to once every N seconds.
This hurt not only the client side (where the runner was) but also the server side (where the server was). We got ~140 GB of Nginx access logs from this “edge case”.
For a more permanent solution you might consider sending a patch that accounts for html responses returned from a reverse proxy when the server is down.
And why are we talking about the body of the response when we got a 403? We got a non-success status code, which means our request failed. Why does the content of the response body even matter in this case?
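To make the point about status codes concrete, here is a minimal Go sketch of a retry loop that decides from the status code alone and waits longer after each failure. It is not Drone's runner code: `isSuccess`, `backoff`, `retry`, and the delay constants are hypothetical names and values chosen only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical tuning values, not taken from Drone.
const (
	baseDelay = 1 * time.Second
	maxDelay  = 30 * time.Second
)

// isSuccess looks only at the status code; the response body is never inspected.
func isSuccess(status int) bool {
	return status >= 200 && status < 300
}

// backoff returns the wait before the next try after failed attempt number
// `attempt` (0-based). It doubles from baseDelay and is capped at maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// retry calls do until it reports a 2xx status or the attempts run out.
// It sleeps between attempts, so a failing server produces at most one
// request (and one log line) per backoff interval instead of a flood.
func retry(attempts int, do func() (status int, err error)) error {
	lastErr := fmt.Errorf("no attempts made")
	for i := 0; i < attempts; i++ {
		status, err := do()
		if err == nil && isSuccess(status) {
			return nil
		}
		if err == nil {
			err = fmt.Errorf("unexpected status %d", status)
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(backoff(i))
		}
	}
	return lastErr
}
```

A caller would pass a closure that performs the HTTP request and returns `resp.StatusCode`. With these values, a proxy that keeps answering 403 would see retries spaced 1s, 2s, 4s, and so on, up to 30s apart.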
As I mentioned, I was unable to reproduce the issue but perhaps I am misunderstanding or perhaps this is a difference in setup. I recommend patching the code to solve the issue and sending a pull request. You can find the relevant source here. We will evaluate the pull request and provide any feedback during the code review process.
As I mentioned, I was unable to reproduce the issue but perhaps I am misunderstanding or perhaps this is a difference in setup.
As I wrote above, we’re happy to help with debugging or any other additional information that may be needed.
I recommend patching the code to solve the issue and sending a pull request.
Also, as I wrote before, this isn’t an option. We definitely would if we could. But right now we don’t have anyone around who can write in Go.
PS: BTW, we removed the HTML part from the response in favor of a 403 with ‘{}’ in the body (still valid JSON?). It didn’t change a thing: exactly the same flood in the logs.
The 403 status code is only returned when the server is configured with an empty rpc secret. You can find the code here with the relevant snippet pasted below for your convenience.
// prevents system administrators from accidentally
// exposing drone without credentials.
if token == "" {
    w.WriteHeader(403)
}
What I do not understand is how the runner was able to establish a connection with the server in the first place. The first thing the runner does is ping the server. The runner blocks until the ping is successful. If the server is configured with an empty rpc secret, the ping will fail with a 403 and the runner will eventually exit after multiple attempts.
Either way, this seems to be a configuration issue. The root cause is that the server was configured with an empty rpc secret.
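For readers who want to see the empty-secret guard in isolation, here is a minimal, self-contained sketch of the same idea as HTTP middleware. It is an illustration and not the Drone source: `guard`, `statusFor`, the `/ping` path, the use of an `X-Drone-Token` header, and the 401 for a mismatched token are all assumptions made for the example.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// guard rejects every request with 403 when the server-side secret is
// empty, regardless of what the client sends. Otherwise it compares the
// presented token with the secret (header name assumed for illustration).
func guard(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if secret == "" {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		if r.Header.Get("X-Drone-Token") != secret {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through guard and returns the
// status code the client would see.
func statusFor(secret, presented string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/ping", nil)
	req.Header.Set("X-Drone-Token", presented)
	rec := httptest.NewRecorder()
	guard(secret, ok).ServeHTTP(rec, req)
	return rec.Code
}
```

In this sketch, `statusFor("", "anything")` yields 403 no matter which token is presented, which is the situation described above: with an empty secret, even the initial ping cannot succeed.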