<buildname> Error response from daemon: network <network_id> not found

Does anyone have a better solution than terminating the agent that showed this error? In our experience, once you see it, it starts happening more and more frequently until that agent is useless and must be replaced. (Our agents are in an ASG, so it's not a huge deal, but it's a pain.)

This time it took 13 days from boot to first incident of error.

Back-of-the-envelope math from daily build-number averages (admittedly inaccurate) suggests it happened after around 200 builds on that agent.
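The arithmetic, spelled out (the ~15 builds/day figure is an assumed average for illustration, not a measurement):

```shell
# Rough estimate of builds before the first "network not found" error.
# BUILDS_PER_DAY is an assumed daily average; DAYS is from boot to first error.
BUILDS_PER_DAY=15
DAYS=13
echo $((BUILDS_PER_DAY * DAYS))   # ~200 builds
```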

For reference, here's the Drone server docker-compose YAML:

version: '2'

services:
  drone-server:
    image: ${DRONE_DOCKER_IMAGE}
    ports:
      - 443:443
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /home/drone/drone_certs/drone-cert-cert.pem:/etc/certs/drone.sillyu.io/drone-cert-cert.pem
      - /home/drone/drone_certs/drone-cert-key.pem:/etc/certs/drone.sillyu.io/drone-cert-key.pem
    restart: always
    environment:
      - AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}
      - AWS_REGION=${AWS_DEFAULT_REGION}
      - DRONE_DATABASE_DRIVER=${DRONE_RDS_ENGINE_NAME}
      - DRONE_DATABASE_DATASOURCE=root:${DRONE_RDS_PASSWORD}@tcp(${DRONE_RDS_HOSTNAME})/drone?parseTime=true
      - DRONE_DATABASE_SECRET=${DRONE_DATABASE_SECRET}
      - DRONE_GITHUB_SERVER=https://github.atl.sillyu.net
      - DRONE_GITHUB_SKIP_VERIFY=true
      - DRONE_GITHUB_MERGE_REF=false                          # https://github.com/drone-plugins/drone-git/issues/41
      - DRONE_GITHUB_CLIENT_ID=${DRONE_GITHUB_CLIENT}
      - DRONE_GITHUB_CLIENT_SECRET=${DRONE_GITHUB_SECRET}
      - DRONE_GIT_ALWAYS_AUTH=true
      - DRONE_LOGS_DEBUG=true
      - DRONE_RPC_SECRET=${DRONE_RPC_SECRET}
      - DRONE_RUNNER_CAPACITY=0
      - DRONE_SERVER_HOST=${DRONE_ALB_HOSTNAME}
      - DRONE_SERVER_PROTO=https
      - DRONE_S3_BUCKET=${DRONE_S3_BUCKET}
      - DRONE_TLS_CERT=/etc/certs/drone.sillyu.io/drone-cert-cert.pem
      - DRONE_TLS_KEY=/etc/certs/drone.sillyu.io/drone-cert-key.pem
      - DRONE_USER_CREATE=username:drone,machine:true,admin:true,token:${DRONE_TOKEN}
      - DRONE_USER_FILTER=group
  watchtower:
    image: v2tec/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

And the Drone agent docker-compose file:

version: '2'

services:
  drone-agent-alpha:
    image: ${DRONE_DOCKER_AGENT_IMAGE}
    command: agent
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DRONE_LOGS_DEBUG=true
      - DRONE_RPC_SERVER=https://${DRONE_ALB_HOSTNAME}:443
      - DRONE_RPC_SECRET=${DRONE_RPC_SECRET}
      - DRONE_RUNNER_CAPACITY=1
      - DRONE_RUNNER_NAME=${ETH0IP}

FYI, this error originates in the Docker daemon, not in Drone, and could indicate a problem with Docker. We have limited ability to troubleshoot Docker issues; however, these are some things I would research:

  • Check your Docker daemon logs for errors.
  • Check the Docker resources on the host. Is there a buildup of networks, containers, or volumes?
  • Do you need to run a prune?
  • Do you need to upgrade to a newer version of Docker?
  • Are you allowing developers to mount the host machine socket and interact with Docker directly? Is it possible they are not cleaning up after themselves?
  • Are you using a tool that periodically cleans up Docker resources? These tools are often too aggressive and remove networks before Drone is done using them.
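A sketch of the commands I'd start with for the checklist above (assumes a systemd host for the log command; adjust for your distro):

```shell
# 1. Docker daemon logs, filtered for network-related errors (systemd hosts)
journalctl -u docker --since "1 hour ago" | grep -i network

# 2. Resource buildup: count networks, then summarize disk usage
docker network ls | wc -l
docker system df

# 3. Manual prune -- CAUTION: only run this when no builds are in flight,
#    since it removes unused networks out from under running pipelines
docker system prune -f

# 4. Docker server version, to compare against current releases
docker version --format '{{.Server.Version}}'
```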

FWIW this has not been a problem for us at cloud.drone.io.

Ok. Next time it occurs, I’ll do that research.

Dumb question: how can I determine which agent a given drone pipeline was running on?
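Not a dumb question. If I recall correctly, the Drone 1.x build API reports the runner's name in the machine field of each stage, and since the agents here set DRONE_RUNNER_NAME=${ETH0IP}, that should be the agent's IP. A sketch (the server URL, token, repo, and build number are placeholders):

```shell
# Query a build and print which machine each stage ran on.
# DRONE_SERVER and DRONE_TOKEN are placeholders for your server URL and API token.
curl -s -H "Authorization: Bearer ${DRONE_TOKEN}" \
  "${DRONE_SERVER}/api/repos/octocat/hello-world/builds/42" \
  | jq -r '.stages[].machine'
```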

Follow-up: it seems that a docker system prune -f which runs right as a build stage step is starting can trigger this behavior.

It triggered (by cron) at 7:00:01 am.

Build stage failed at 7:00:04 am.

We’re going to modify the cron job to be a little safer.
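The "safer" cron job could be sketched like this: skip the prune entirely while any Drone build containers are running. The io.drone=true label is an assumption about how the agent labels the containers it creates; verify it against docker ps output on your agents first.

```shell
#!/bin/sh
# Skip pruning while any Drone build containers are running.
# Assumption: Drone-created containers carry the label io.drone=true.
if docker ps -q --filter "label=io.drone=true" | grep -q .; then
  echo "Drone builds in progress; skipping prune"
  exit 0
fi
docker system prune -f
```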

You might also consider using drone-gc.

It handles pruning images, containers, and volumes created by Drone. The only caveat is that it does not prune objects your users create out-of-band by mounting the host machine's Docker socket in their pipelines.
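A hedged sketch of running drone-gc as another service alongside the agent; the image name and GC_* variables are my recollection of the drone-gc README, so verify them against the project before relying on this:

```yaml
  drone-gc:
    image: drone/gc
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - GC_DEBUG=true      # verbose logging while tuning
      - GC_CACHE=5gb       # image cache to retain (assumed default-style value)
      - GC_INTERVAL=5m     # how often the collector runs
```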