I have a fairly simple setup: 1 Drone server, 4 agents, and 1 dind (Docker-in-Docker) service. All of them run as pods in a Kubernetes cluster (1 pod for the server, and 1 pod that holds the 4 agents and dind).
In general the 4 agents can handle concurrent builds, but not when at least 2 of them use the ECR plugin (plugins/ecr) to build Docker images. In that case only one agent builds its image, and the others report “Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?”.
My assumption is that this happens because dind is ultimately the one building the images, and it cannot serve the same Docker socket to several builds at the same time. Could you please help me resolve this issue?
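For context, the failing steps are ordinary plugins/ecr steps. A minimal sketch of such a step is below; the step name, region, registry, repo, and secret names are placeholders for illustration, not my real values:

```yaml
# Hypothetical .drone.yml; all names and values below are placeholders.
kind: pipeline
name: default

steps:
- name: publish
  image: plugins/ecr
  settings:
    access_key:
      from_secret: aws_access_key_id
    secret_key:
      from_secret: aws_secret_access_key
    region: us-east-1
    registry: 123456789012.dkr.ecr.us-east-1.amazonaws.com
    repo: my-app
    tags:
    - latest
```

The error appears when two or more steps of this kind run at the same time on different agents.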
Here is my YAML config for the agents and dind:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-agent
  namespace: drone
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: drone-agent
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      containers:
      - image: drone/agent:1.2.1
        imagePullPolicy: Always
        name: drone-agent
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        env:
        - name: DRONE_RPC_SERVER
          value: http://drone.drone.svc.cluster.local.
        - name: DRONE_RPC_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: DRONE_RPC_SECRET
        - name: DRONE_KEEPALIVE_MIN_TIME
          value: "5s"
        - name: DRONE_MAX_PROCS
          value: "3"
        - name: DRONE_AGENT_CONCURRENCY
          value: "3"
        - name: DRONE_LOGS_DEBUG
          value: "false"
        - name: DRONE_LOGS_TRACE
          value: "true"
        - name: DRONE_LOGS_PRETTY
          value: "true"
        - name: DOCKER_HOST
          value: tcp://localhost:2375
      - image: drone/agent:1.2.1
        imagePullPolicy: Always
        name: drone-agent-2
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        env:
        - name: DRONE_RPC_SERVER
          value: http://drone.drone.svc.cluster.local.
        - name: DRONE_RPC_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: DRONE_RPC_SECRET
        - name: DRONE_KEEPALIVE_MIN_TIME
          value: "5s"
        - name: DRONE_MAX_PROCS
          value: "3"
        - name: DRONE_AGENT_CONCURRENCY
          value: "3"
        - name: DRONE_LOGS_DEBUG
          value: "false"
        - name: DRONE_LOGS_TRACE
          value: "false"
        - name: DRONE_LOGS_PRETTY
          value: "true"
        - name: DOCKER_HOST
          value: tcp://localhost:2375
      - image: drone/agent:1.2.1
        imagePullPolicy: Always
        name: drone-agent-3
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        env:
        - name: DRONE_RPC_SERVER
          value: http://drone.drone.svc.cluster.local.
        - name: DRONE_RPC_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: DRONE_RPC_SECRET
        - name: DRONE_KEEPALIVE_MIN_TIME
          value: "5s"
        - name: DRONE_MAX_PROCS
          value: "3"
        - name: DRONE_AGENT_CONCURRENCY
          value: "3"
        - name: DRONE_LOGS_DEBUG
          value: "false"
        - name: DRONE_LOGS_TRACE
          value: "false"
        - name: DRONE_LOGS_PRETTY
          value: "true"
        - name: DOCKER_HOST
          value: tcp://localhost:2375
      - image: drone/agent:1.2.1
        imagePullPolicy: Always
        name: drone-agent-4
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
        env:
        - name: DRONE_RPC_SERVER
          value: http://drone.drone.svc.cluster.local.
        - name: DRONE_RPC_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: DRONE_RPC_SECRET
        - name: DRONE_KEEPALIVE_MIN_TIME
          value: "5s"
        - name: DRONE_MAX_PROCS
          value: "3"
        - name: DRONE_AGENT_CONCURRENCY
          value: "3"
        - name: DRONE_LOGS_DEBUG
          value: "false"
        - name: DRONE_LOGS_TRACE
          value: "false"
        - name: DRONE_LOGS_PRETTY
          value: "true"
        - name: DOCKER_HOST
          value: tcp://localhost:2375
      - name: dind
        image: "docker.io/library/docker:18.06-dind"
        imagePullPolicy: IfNotPresent
        env:
        - name: DOCKER_DRIVER
          value: overlay2
        securityContext:
          privileged: true
        volumeMounts:
        - name: docker-volumes-cache
          mountPath: /cache
      tolerations:
      - key: "purpose"
        operator: "Equal"
        value: "system"
        effect: "NoSchedule"
      volumes:
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: docker-volumes-cache
        persistentVolumeClaim:
          claimName: drone-volumes-cache
By the way, I saw here that dind is no longer required, but if I remove dind and change DOCKER_HOST to tcp://172.17.0.1:2375, all builds end up in the Pending state because the agents cannot connect to that host.
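A minimal sketch of that variant, assuming the dind container is deleted and nothing else in the Deployment changes; this fragment replaces the DOCKER_HOST entry in each agent container's env list:

```yaml
# Variant without dind: agents pointed at the Docker bridge address.
# With this setting the agents cannot connect and builds stay Pending.
- name: DOCKER_HOST
  value: tcp://172.17.0.1:2375
```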