I had a working Drone server running in Docker on NixOS. The server configuration is:
systemd.services.pubdrone = {
  wantedBy = [ "multi-user.target" ];
  after = [ "network.target" ];
  description = "Drone.io CD-Server for public projects";
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = "yes";
    ExecStart = ''${pkgs.docker}/bin/docker run \
      --volume=/var/run/docker.sock:/var/run/docker.sock \
      --volume=/var/lib/drone:/data \
      --env=DRONE_GIT_ALWAYS_AUTH=false \
      --env=DRONE_GITLAB_SERVER=https://gitlab.com \
      --env=DRONE_GITLAB_CLIENT_ID=<hidden> \
      --env=DRONE_GITLAB_CLIENT_SECRET=<hidden> \
      --env=DRONE_SERVER_HOST=pub.drone.amessage.eu \
      --env=DRONE_SERVER_PROTO=https \
      --env=DRONE_TLS_AUTOCERT=true \
      --env=DRONE_USER_CREATE=username:<hidden>,machine:false,admin:true \
      --env=DRONE_DATABASE_SECRET=<hidden> \
      --env=DRONE_RPC_SECRET=<hidden> \
      --publish=8081:80 \
      --restart=always \
      --detach=true \
      --name=pubdrone \
      drone/drone:1'';
    ExecStop = ''${pkgs.docker}/bin/docker stop pubdrone'';
    ExecStopPost = ''${pkgs.docker}/bin/docker rm -f pubdrone'';
  };
};
The runner (on a different host) is:
systemd.services.pubdronerun = {
  wantedBy = [ "multi-user.target" ];
  after = [ "network.target" ];
  description = "Drone.io Runner";
  serviceConfig = {
    Type = "oneshot";
    RemainAfterExit = "yes";
    ExecStart = ''${pkgs.docker}/bin/docker run \
      --volume=/var/run/docker.sock:/var/run/docker.sock \
      --volume=/var/lib/drone:/data \
      --env=DRONE_RPC_HOST=pub.drone.amessage.eu \
      --env=DRONE_RPC_PROTO=https \
      --env=DRONE_RPC_SECRET=<hidden> \
      --env=DRONE_RUNNER_CAPACITY=2 \
      --env=DRONE_RUNNER_NAME=nahe.amessage.eu \
      --restart=always \
      --detach=true \
      --name=pubdronerun \
      drone/drone-runner-docker:1'';
    ExecStop = ''${pkgs.docker}/bin/docker stop pubdronerun'';
    ExecStopPost = ''${pkgs.docker}/bin/docker rm -f pubdronerun'';
  };
};
In the log file of the runner I get the following:
time="2020-03-26T11:29:31Z" level=info msg="starting the server" addr=":3000"
time="2020-03-26T11:29:31Z" level=info msg="successfully pinged the remote server"
time="2020-03-26T11:29:31Z" level=info msg="polling the remote server" arch=amd64 capacity=2 endpoint="https://pub.drone.amessage.eu" kind=pipeline os=linux type=docker
The server logs nothing at all, apart from the following when I restart the container:
{"level":"info","msg":"main: internal scheduler enabled","time":"2020-03-26T15:48:27Z"}
{"acme":true,"host":"pub.drone.amessage.eu","level":"info","msg":"starting the http server","port":":443","proto":"https","time":"2020-03-26T15:48:27Z","url":"https://pub.drone.amessage.eu"}
{"interval":"30m0s","level":"info","msg":"starting the cron scheduler","time":"2020-03-26T15:48:27Z"}
Any idea on how I could further debug this problem? Any obvious configuration error?
There are 4 builds waiting in the job queue, and none of them has been executed for 14 days now.
Okay, this is the request from the runner to the server:
POST /rpc/v2/stage HTTP/1.0
Host: pub.drone.amessage.eu
X-Real-IP: 178.27.181.13
X-Forwarded-For: 178.27.181.13
X-Forwarded-Proto: https
X-Forwarded-Host: pub.drone.amessage.eu
X-Forwarded-Server: pub.drone.amessage.eu
Connection: close
Content-Length: 89
x-drone-token: <hidden>
user-agent: Go-http-client/2.0

{"kind":"pipeline","type":"docker","os":"linux","arch":"amd64","variant":"","kernel":""}
The server responds with:
HTTP/1.0 204 No Content
Cache-Control: no-cache, no-store, must-revalidate, private, max-age=0
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Pragma: no-cache
X-Accel-Expires: 0
Date: Thu, 26 Mar 2020 16:22:47 GMT
Why isn’t the server returning the pending builds?
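In case someone wants to reproduce the poll without the runner, here is a minimal Python sketch of the request shown above. It only rebuilds what I captured: the path, the x-drone-token header and the JSON body. The function names (build_stage_request, poll_once, describe_status), the Content-Type header and the 60 second timeout are my own assumptions, and reading 204 as "no stage handed out" is my interpretation of the response above, not something I have confirmed in the Drone source.

```python
import json
import urllib.error
import urllib.request


def build_stage_request(server, token):
    """Rebuild the POST /rpc/v2/stage request captured from the runner."""
    body = json.dumps({
        "kind": "pipeline",
        "type": "docker",
        "os": "linux",
        "arch": "amd64",
        "variant": "",
        "kernel": "",
    }).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/rpc/v2/stage",
        data=body,
        method="POST",
        headers={
            # The token is the value of DRONE_RPC_SECRET.
            "X-Drone-Token": token,
            # Assumption: not present in the captured request.
            "Content-Type": "application/json",
        },
    )


def describe_status(status):
    """Map a status code to a short description (my interpretation)."""
    if status == 204:
        return "no stage handed out (same as the 204 above)"
    if status == 200:
        return "a stage was handed out"
    return "unexpected status %d" % status


def poll_once(server, token, timeout=60):
    """Send one poll and return (status, body). Performs a real request."""
    req = build_stage_request(server, token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

It would be used roughly like this, with the real secret in place of the placeholder. Note that a successful poll may hand a stage to this script instead of a real runner, so it is only meant for debugging:

```python
status, body = poll_once("https://pub.drone.amessage.eu", "<hidden>")
print(describe_status(status), body)
```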
Update: After canceling the 4 pending builds and restarting them, 3 of them got built while one remains in the pending state. Three is exactly the sum of the configured capacities of the runners. So it seems that after running a job, the runners don't get a new one.
Thanks in advance
Matthias