Getting the following error when attempting to build, but not every time. I haven’t figured out the pattern yet. Here is the Drone configuration I am using; maybe there is a misconfiguration in it?
I think this would be the next step. This error message implies that Drone was either unable to connect to Docker, or the connection was reset or broken. Either way, this needs to be debugged at the Docker daemon level.
For example, if Docker crashes and restarts due to an internal error, it breaks the connection, resulting in a connection error: https://github.com/moby/moby/issues/38735
Also make sure you are running the latest version of Docker and that you restart your Drone agent after you upgrade Docker.
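One way to debug at the daemon level is to check, from inside the agent container, whether anything answers on the socket the error names. A minimal sketch in Python, assuming the default socket path /var/run/docker.sock and the Docker Engine API’s GET /_ping endpoint (which replies OK when the daemon is up); the function name docker_ping is hypothetical:

```python
import socket

# Default socket path taken from the error message in this thread.
DOCKER_SOCK = "/var/run/docker.sock"


def docker_ping(sock_path: str = DOCKER_SOCK, timeout: float = 2.0) -> bool:
    """Return True if a daemon on sock_path answers GET /_ping with 200 and body OK."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sock_path)
            # Plain HTTP/1.0 request; the server is expected to close the
            # connection after replying, which ends the read loop below.
            s.sendall(b"GET /_ping HTTP/1.0\r\nHost: docker\r\n\r\n")
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Missing socket file, refused or reset connection, or timeout.
        return False
    response = b"".join(chunks)
    head, _, body = response.partition(b"\r\n\r\n")
    status = head.split(b"\r\n", 1)[0].split()
    return len(status) >= 2 and status[1] == b"200" and body.strip() == b"OK"
```

Calling this periodically while builds fail intermittently can help tell apart a socket that is missing entirely from a daemon that stops answering (for example after a crash and restart).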
Just a quick side note, in case this ever happens to anyone else: I switched the Drone configuration back to the Kubernetes configuration, and that seems to work.
I am still digging into what the problem might be with the normal drone-agent configuration.
I upgraded Docker to 18.09.3 on the hosts and upgraded the docker/dind image to 18.06-dind, but I am still seeing the same error, "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", even though the dind container says it is connected to the Docker socket and I don’t see any other errors in the dind logs.
If you are going to use agents, I recommend installing them on standalone virtual machines (e.g. an AWS instance). It will probably take just a few minutes to set up, compared to hours or days of debugging Kubernetes.
At this point the debugging is pretty much just to scratch my own itch of getting it working. The Kubernetes integration works well, so I can run with that for now and look at setting up some standalone agents later.
On a side note, do you know if there is a way to set a TTL on the Kubernetes jobs so that they get cleaned up automatically after finishing?
I do remember reports of a race condition where, if the agent starts before Docker, the connection is never established. This happens if you use a Docker-in-Docker container in the same pod, so that is something worth looking into …
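If that race is the cause, one workaround is to hold off starting the agent until the Docker socket accepts connections. A minimal sketch in Python, assuming the agent container’s entrypoint can run a small script before launching the agent; wait_for_docker_socket and its parameters are hypothetical and not part of Drone:

```python
import socket
import time


def wait_for_docker_socket(sock_path: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll sock_path until a connection succeeds (True) or timeout seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(interval)
                s.connect(sock_path)
                return True
        except OSError:
            pass  # socket file not created yet, or daemon not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A successful connect only shows that something is listening on the socket; following it with a request to the Docker Engine API’s GET /_ping endpoint before starting the agent would be a stricter readiness check.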
For more context, the reason I am not a huge fan of running agents on Kubernetes is that the Drone resource scheduler and the Kubernetes resource scheduler conflict with each other. Kubernetes also does weird things with networking and does not always work well with user-defined Docker networks. I have seen people waste a lot of time dealing with these issues.
The native Kubernetes runtime avoids many of these issues, but it is still very experimental. It runs quite well on some Kubernetes distros (DigitalOcean, Minikube) and quite poorly on others (EKS, OpenShift), and in some cases (GKE) it varies depending on version and container runtime (containerd, docker, etc.). Eventually this will be the recommended way to run pipelines on Kubernetes, and one day it may be the only way. I could definitely see disabling agents from running on Kubernetes entirely in favor of the native Kubernetes runtime.
[docker:53] time="2019-06-07T12:02:20.398818846Z" level=warning msg="insecure registry https://index.docker.io/v1/ should not contain 'https://' and 'https://' has been removed from the insecure registry config"
[docker:54] time="2019-06-07T12:02:20.399036736Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
[docker:55] Error starting daemon: insecure registry index.docker.io/v1/ is not valid: invalid host "index.docker.io/v1/"
So apparently, setting insecure: true strips https:// from the default registry and then fails, saying index.docker.io/v1/ is an invalid registry.
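The log lines above suggest a two-step check: the scheme is stripped, and what remains must be a bare host[:port] (or a CIDR range). A simplified Python model of that behavior, written only to illustrate why index.docker.io/v1/ is rejected (it keeps its /v1/ path after the scheme is removed); normalize_insecure_registry is hypothetical and is not dockerd’s actual code:

```python
import ipaddress


def normalize_insecure_registry(entry: str) -> str:
    """Simplified model of the insecure-registry check seen in the dind log.

    Illustrative only; not dockerd's actual implementation.
    """
    # Step 1: drop a leading scheme (the log's warning says 'https://' is removed).
    for scheme in ("http://", "https://"):
        if entry.startswith(scheme):
            entry = entry[len(scheme):]
            break
    # CIDR ranges such as 10.0.0.0/8 are accepted as they are.
    try:
        ipaddress.ip_network(entry, strict=False)
        return entry
    except ValueError:
        pass
    # Step 2: anything else must be a bare host[:port]; a path such as /v1/ fails.
    if not entry or "/" in entry:
        raise ValueError(f'insecure registry {entry} is not valid: invalid host "{entry}"')
    return entry
```

With this model, https://docker-registry.default:5000 normalizes cleanly to docker-registry.default:5000, while https://index.docker.io/v1/ raises an error with the same wording as the one in the log.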
I assumed that repo: docker-registry.default:5000/backend would override the registry to docker-registry.default:5000, so I updated my .drone.yml as follows: