Drone Kubernetes Job Failure "cannot get stage details"

Hi,

I’m trying to set up a new production environment using Kubernetes.

Drone seems like the product of choice for setting up a new CI/CD pipeline.
I’ve set up Drone with Kubernetes, and although I know this is an alpha/beta feature, I was wondering if I could get some support here.

I’ve set up Drone and have a project configured to build in it.
But every time I try to start a new pipeline, the Kubernetes job that gets triggered fails with the following errors:

Error:
{"arch":"amd64","level":"warning","machine":"ip-10-0-11-202.ap-southeast-2.compute.internal","msg":"runner: cannot get stage details","os":"linux","stage-id":1,"time":"2019-01-21T04:21:01Z"}
{"error":"Forbidden","level":"warning","msg":"program terminated","time":"2019-01-21T04:21:01Z"}

I followed the basic setup in the docs.
Does anyone know what I’m missing?

Thanks,
Nick

Hi Nick,

Drone creates a Kubernetes job for each build, which consists of a single controller container that orchestrates the build pipeline. One of the first things the controller does is contact the main Drone server to fetch additional details required for execution. This error message may indicate that the controller is unable to communicate with the server.

This is something we should be able to verify easily. If the controller is successfully communicating with the server, we would see this message printed in the server logs:

{
  "level": "debug",
  "msg": "manager: fetching stage details",
  "step-id": 1,
  "time": "2019-01-17T17:19:35-08:00"
}
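If the server logs are long, a small filter can do the searching. Below is a minimal Go sketch, assuming the server writes one JSON object per line, as in the log excerpts in this thread. The function name `hasMessage` is made up for illustration and is not part of Drone.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// hasMessage reports whether any line of logs is a JSON object whose
// "msg" field equals want. Lines that are not valid JSON are skipped.
// (Hypothetical helper for illustration; not part of Drone.)
func hasMessage(logs, want string) bool {
	scanner := bufio.NewScanner(strings.NewReader(logs))
	// Allow log lines up to 1 MiB instead of the 64 KiB default.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		var entry struct {
			Msg string `json:"msg"`
		}
		if json.Unmarshal(scanner.Bytes(), &entry) != nil {
			continue // not a JSON log line
		}
		if entry.Msg == want {
			return true
		}
	}
	return false
}

func main() {
	// Read the server logs piped in on stdin.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	if hasMessage(string(data), "manager: fetching stage details") {
		fmt.Println("found: the controller reached the server")
	} else {
		fmt.Println("not found: the controller may not be reaching the server")
	}
}
```

For example, `kubectl logs <drone-server-pod> | go run main.go`, where `<drone-server-pod>` is a placeholder for your server pod's name.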

Can you please check the server logs and let me know if you see anything?

Here are my server logs.

{"build-id":3,"level":"debug","msg":"kubernetes: creating job","repo-id":165,"stage-id":3,"stage-name":"build-hello-hapi","stage-number":1,"time":"2019-01-21T07:39:21Z"}
{"build-id":3,"level":"debug","msg":"kubernetes: successfully created job","repo-id":165,"stage-id":3,"stage-name":"build-hello-hapi","stage-number":1,"time":"2019-01-21T07:39:21Z"}
{"fields.time":"2019-01-21T07:39:21Z","latency":1312831421,"level":"debug","method":"POST","msg":"","remote":"10.45.192.1:47386","request":"/hook","request-id":"1G4EEiCvS5xPBed27ZILVCZDsHt","time":"2019-01-21T07:39:21Z"}

I don’t see anything that says “fetching stage details”.

I’m wondering if it has something to do with my DRONE_RPC_HOST.
Should it be set to the name of the Drone service in Kubernetes?
I have it set to the same DNS name as my Drone website, e.g. “drone.mywebsite.com”.

You know what?

I might not be hitting my Drone server because I have an IP whitelist set up on my ingress.
I’ll set up an internal passthrough with the DNS name “drone-service” and see how I go.

Thanks for your help :slight_smile:

Thanks for checking the server logs. It definitely sounds like a networking issue, with the controller unable to reach the server. The controller will attempt to contact the server using the address specified in the following environment variables:

DRONE_SERVER_HOST=some.address.com
DRONE_SERVER_PROTO=http

In some cases, you may want to use an internal address for communication. The following environment variables can be used to override and customize the address:

DRONE_RPC_HOST=some.internal.address:80  # you can specify a port, if necessary
DRONE_RPC_PROTO=http
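To make the precedence concrete, here is a hypothetical Go helper, `rpcBaseURL`, that builds the base address the way described above. It assumes each `DRONE_RPC_*` variable independently overrides its `DRONE_SERVER_*` counterpart when set; it illustrates the rule and is not Drone's actual source.

```go
package main

import (
	"fmt"
	"os"
)

// rpcBaseURL builds proto://host for controller-to-server calls.
// Assumption (sketch only): DRONE_RPC_HOST and DRONE_RPC_PROTO each
// override their DRONE_SERVER_* counterpart when non-empty.
func rpcBaseURL(getenv func(string) string) string {
	host := getenv("DRONE_RPC_HOST")
	if host == "" {
		host = getenv("DRONE_SERVER_HOST")
	}
	proto := getenv("DRONE_RPC_PROTO")
	if proto == "" {
		proto = getenv("DRONE_SERVER_PROTO")
	}
	return proto + "://" + host
}

func main() {
	// Print the address implied by the current environment.
	fmt.Println(rpcBaseURL(os.Getenv))
}
```

Passing the lookup function in (rather than calling `os.Getenv` directly) keeps the helper easy to try against different settings without touching the real environment.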

I recommend experimenting with the above configuration parameters to find an address and protocol combination that can successfully reach the Drone server.

If that does not work, you can enable trace logging, which exposes detailed controller logs for the controller-to-server communication. This should only be enabled temporarily and disabled once troubleshooting is complete.

DRONE_LOGS_TRACE=true

I hope this helps. If you find a solution please post back so that others may benefit :slight_smile:

Thanks for your very speedy responses!

Your suggestion about the port number fixed my issue entirely: I should have specified my port as 8080. Setting DRONE_RPC_PROTO to http helped too, haha.
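Why the port matters: when a URL has no explicit port, Go's `net/http` dials the scheme's default (80 for http, 443 for https). Assuming Drone uses the RPC address as a plain `proto://host` URL, a host without a port targets port 80, so a server listening on 8080 needs the port spelled out. A small sketch with a hypothetical `effectivePort` helper:

```go
package main

import (
	"fmt"
	"net/url"
)

// effectivePort returns the port an HTTP client would dial for proto://host:
// the explicit port if the host has one, otherwise the scheme's default.
func effectivePort(proto, host string) (string, error) {
	u, err := url.Parse(proto + "://" + host)
	if err != nil {
		return "", err
	}
	if p := u.Port(); p != "" {
		return p, nil
	}
	switch u.Scheme {
	case "http":
		return "80", nil
	case "https":
		return "443", nil
	}
	return "", fmt.Errorf("no default port for scheme %q", u.Scheme)
}

func main() {
	for _, host := range []string{"drone-service", "drone-service:8080"} {
		port, err := effectivePort("http", host)
		if err != nil {
			fmt.Println(host, "error:", err)
			continue
		}
		fmt.Printf("http://%s -> port %s\n", host, port)
	}
}
```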

Thanks for your help!

My Drone config is:

DRONE_KUBERNETES_ENABLED: true
DRONE_KUBERNETES_NAMESPACE: default
DRONE_GITHUB_SERVER: https://github.com
DRONE_GITHUB_CLIENT_ID: ************
DRONE_GITHUB_CLIENT_SECRET: ******************
DRONE_SERVER_HOST: my_domain
DRONE_SERVER_PROTO: http
DRONE_RPC_HOST: drone-server-service
DRONE_RPC_PROTO: http
DRONE_DATABASE_DRIVER: sqlite3
DRONE_DATABASE_DATASOURCE: /drone/drone.sqlite
DRONE_USER_CREATE: username:cuijxin,admin:true
DRONE_SECRET_SECRET: ************
DRONE_SECRET_ENDPOINT: http://drone-secrets-service
DRONE_LOGS_TRACE: true

And the Drone server’s log:
{"build-id":1,"level":"debug","msg":"kubernetes: creating job","repo-id":8,"stage-id":1,"stage-name":"deploy","stage-number":1,"time":"2019-07-11T02:42:52Z"}
{"build-id":1,"level":"debug","msg":"kubernetes: successfully created job","repo-id":8,"stage-id":1,"stage-name":"deploy","stage-number":1,"time":"2019-07-11T02:42:52Z"}
{"fields.time":"2019-07-11T02:42:52Z","latency":3003858571,"level":"debug","method":"POST","msg":"","remote":"10.32.0.6:47350","request":"/hook","request-id":"1NqeHuyaEnnggoyy3PthcodAdqv","time":"2019-07-11T02:42:52Z"}

And the job’s log:
2019/07/11 02:42:57 [DEBUG] POST http://drone-server-service/rpc/v1/details
{"arch":"amd64","error":"","level":"warning","machine":"master","msg":"runner: cannot get stage details","os":"linux","stage-id":1,"time":"2019-07-11T02:43:02Z"}
{"error":"","level":"warning","msg":"program terminated","time":"2019-07-11T02:43:02Z"}