Hello,
As drone-runner-kube has been deprecated, we switched to drone-runner-docker, which we are running in an EKS cluster. For the deployment we use your official Helm charts:
helm upgrade drone drone/drone -f ./drone/server.values.yaml --namespace drone --version 0.6.4 --install
helm upgrade drone-runner-docker drone/drone-runner-docker -f ./drone/runner.values.yaml --namespace drone --version 0.6.1 --install
helm upgrade drone-kubernetes-secrets drone/drone-kubernetes-secrets -f ./drone/kubernetes-secrets.values.yaml --namespace drone --version 0.1.4 --install
The first thing we noticed is that it uses far more resources than the kube runner: the dind container sometimes uses more than 2 GB of RAM, and its CPU usage goes above 3500 millicores. We are running 3 runner pods, each on a t3a.xlarge instance (4 vCPUs and 16 GB of RAM).
Here is the problem:
The drone-runner-docker container crashes frequently, and the logs show:
drone-runner-docker time="2023-01-15T02:40:20Z" level=info msg="starting the server" addr=":3000"
drone-runner-docker "cannot ping the remote server" error="Post \"http://drone.drone.svc.cluster.local:8080/rpc/v2/ping\": dial tcp: i/o timeout"
received signal, terminating process
In the Kubernetes events we also see:
Readiness probe failed: dial tcp 172.19.4.183:3000: connect: connection refused
The IP belongs to the drone-runner-docker container. So it runs fine for a while, then crashes around 50 times in a row, and then starts working again. Please find some Grafana graphs attached. The crashes seem to occur randomly; they do not correlate with high workload.
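One way to narrow down whether the "dial tcp: i/o timeout" comes from intermittent DNS or network trouble inside the cluster, rather than from the runner itself, is to poll the same endpoint independently and log when it becomes unreachable. Below is a minimal sketch, assuming a debug pod with Python 3 in the drone namespace; the URL is copied from the log above, and probe/watch are hypothetical helper names. Any HTTP status counts as success, because the real runner authenticates with its RPC secret and this probe does not; it only checks that DNS resolution and the TCP connection work.

```python
import time
import urllib.error
import urllib.request

# Endpoint copied from the runner log; adjust if your service name or port differ.
PING_URL = "http://drone.drone.svc.cluster.local:8080/rpc/v2/ping"

# Bypass any HTTP(S)_PROXY environment settings so the probe tests the direct in-cluster path.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the server sends any HTTP response within `timeout` seconds.

    Error statuses such as 401 or 501 still count as reachable: the goal is only to
    detect DNS failures, refused connections, and timeouts, not to authenticate.
    """
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with _OPENER.open(req, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path works.
        return True
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts both land here.
        return False


def watch(url: str, interval: float = 10.0, rounds: int = 360) -> None:
    """Probe `url` every `interval` seconds and print a UTC timestamp for each failure."""
    for _ in range(rounds):
        if not probe(url):
            print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()), "unreachable:", url)
        time.sleep(interval)
```

Running watch(PING_URL) for an hour or so alongside the runners would show whether failures cluster in the same windows as the crash loops; if they do, that would point at cluster DNS or networking rather than at the runner.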
Does anybody know how to fix this?
Best wishes
Philipp