"stopped" agents remain in drone server ls

I see multiple agents stuck in the stopped state, and I cannot delete them even with drone server destroy --force.

👉 bin/drone server ls -a -H -l
Name              Address        State      Created
agent-DnFRUDrY    10.0.12.148    stopped    10 hours ago
agent-CQBQ10mi    10.0.12.208    stopped    10 hours ago
agent-Q9rxIhb3    10.0.12.250    stopped    9 hours ago
agent-iNblnE3h    10.0.12.132    stopped    8 hours ago
agent-jbwphpaw    10.0.12.51     stopped    7 hours ago
agent-pxJXYnsp    10.0.12.206    stopped    7 hours ago
agent-Zj0FPMeC    10.0.12.108    stopped    6 hours ago
agent-SNpn3AAk    10.0.12.122    stopped    6 hours ago
agent-DDuEO8V7    10.0.12.174    stopped    6 hours ago
agent-sjaBShDY    10.0.12.76     stopped    6 hours ago
agent-2Pccjm7k    10.0.12.210    stopped    5 hours ago
agent-fA8Jf6QC    10.0.12.32     stopped    5 hours ago
agent-LjqZQjFL    10.0.12.33     stopped    5 hours ago
agent-MmBZUozk    10.0.12.110    running    2 hours ago
agent-qZ1oW4kw    10.0.12.152    stopped    2 hours ago
agent-QO6awaAd    10.0.12.237    stopped    2 hours ago
agent-ILClFdZI    10.0.12.108    stopped    2 hours ago
agent-JLPdPr5A    10.0.12.249    stopped    2 hours ago
agent-TvZhvFyu    10.0.12.75     running    About an hour ago
agent-F8ChaoYK    10.0.12.226    stopped    About an hour ago

I grepped the autoscaler log for agent-DnFRUDrY. At the end of the log, I tried to destroy the agent, but it failed with the message “server no longer exists. nothing to destroy”.

Why do they remain in the stopped state? Is there a way to clean them up?

Here is the autoscaler configuration. Note that I added DRONE_ENABLE_REAPER=true, following the advice in the thread “Drone-agent in error status”.

sudo docker run -d \
  -e DRONE_POOL_MIN=0 \
  -e DRONE_POOL_MAX=20 \
  -e DRONE_CAPACITY_BUFFER=0 \
  -e DRONE_POOL_MIN_AGE=30m \
  -e DRONE_INTERVAL=10s \
  -e DRONE_SERVER_PROTO=https \
  -e DRONE_INSTALL_CHECK_INTERVAL=60s \
  -e DRONE_ENABLE_REAPER=true \
  -e DRONE_AMAZON_IMAGE=ami-***** \
  -e DRONE_SERVER_HOST=drone-ci.******* \
  -e DRONE_SERVER_TOKEN=****** \
  -e DRONE_AGENT_CONCURRENCY=12 \
  -e DRONE_AGENT_TOKEN=******** \
  -e DRONE_AMAZON_REGION=us-west-2 \
  -e DRONE_AMAZON_SUBNET_ID=subnet-***** \
  -e DRONE_AMAZON_SECURITY_GROUP=sg-****** \
  -e DRONE_AMAZON_INSTANCE=c5.9xlarge \
  -e DRONE_AMAZON_SSHKEY=drone \
  -e DRONE_AMAZON_TAGS=type:drone-agent \
  -e AWS_ACCESS_KEY_ID=***** \
  -e AWS_SECRET_ACCESS_KEY=****** \
  -e DRONE_AMAZON_PRIVATE_IP=true \
  -p 8080:8080 \
  --restart=always \
  --name=autoscaler \
  drone/autoscaler
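To confirm which reaper setting the running container actually received, its environment can be read back with docker inspect. The sketch below is a minimal helper under stated assumptions: reaper_enabled is a hypothetical function name, and it only looks for the exact line DRONE_ENABLE_REAPER=true among the NAME=value lines passed on stdin.

```shell
# Hypothetical helper (not part of drone or docker): reads NAME=value
# lines on stdin and reports whether the reaper variable is set to true.
reaper_enabled() {
  # -x: match the whole line only; -q: no output, exit status only
  if grep -qx 'DRONE_ENABLE_REAPER=true'; then
    echo "reaper enabled"
  else
    echo "reaper NOT enabled"
  fi
}

# Usage against the container started above (prints one env var per line):
# docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' autoscaler | reaper_enabled
```

A misspelled variable name (for example a similar-looking one) would simply not match, so the helper reports it as not enabled.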

Here is an excerpt of the log from docker logs autoscaler. I collapsed runs of consecutive similar lines into “...”.

{"level":"debug","region":"us-west-2","image":"ami-**********","size":"c5.9xlarge","name":"agent-DnFRUDrY","time":"2019-10-07T13:16:54Z","message":"instance create"}
{"level":"info","region":"us-west-2","image":"ami-**********","size":"c5.9xlarge","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","time":"2019-10-07T13:16:54Z","message":"instance create success"}
{"level":"debug","region":"us-west-2","image":"ami-**********","size":"c5.9xlarge","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","time":"2019-10-07T13:16:54Z","message":"check instance network"}
{"level":"debug","region":"us-west-2","image":"ami-**********","size":"c5.9xlarge","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","ip":"10.0.12.148","time":"2019-10-07T13:16:54Z","message":"instance network ready"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T13:16:54Z","message":"provisioned server"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","time":"2019-10-07T13:17:02Z","message":"check docker connectivity"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","time":"2019-10-07T13:17:02Z","message":"connecting to docker"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","error":"Cannot connect to the Docker daemon at https://10.0.12.148:2376. Is the docker daemon running?","name":"agent-DnFRUDrY","time":"2019-10-07T13:17:34Z","message":"cannot connect, retry in 1m0s"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","name":"agent-DnFRUDrY","time":"2019-10-07T13:18:34Z","message":"connecting to docker"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","image":"drone/agent:1","time":"2019-10-07T13:18:34Z","message":"pull docker image"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","image":"drone/agent:1","time":"2019-10-07T13:18:36Z","message":"create agent container"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","image":"drone/agent:1","time":"2019-10-07T13:18:36Z","message":"start the agent container"}
{"level":"debug","ip":"10.0.12.148","name":"agent-DnFRUDrY","image":"drone/agent:1","time":"2019-10-07T13:18:37Z","message":"agent container started"}
{"level":"debug","id":"M5IYIZ9F4Edq5bsq","server":"agent-DnFRUDrY","time":"2019-10-07T13:23:05Z","message":"server is busy"}
...
{"level":"debug","id":"UbYimZqr1mTZvtNG","server":"agent-DnFRUDrY","time":"2019-10-07T13:28:05Z","message":"server is busy"}
{"level":"debug","id":"wXo1GZnRLtqxUzbc","server":"agent-DnFRUDrY","age":691833.715306,"min-age":1800000,"time":"2019-10-07T13:28:15Z","message":"server min-age not reached"}
...
{"level":"debug","id":"EvIYmYrRFWm3Htez","server":"agent-DnFRUDrY","age":1392959.755046,"min-age":1800000,"time":"2019-10-07T13:39:56Z","message":"server min-age not reached"}
{"level":"debug","id":"uTuixOnbu64rOSaG","server":"agent-DnFRUDrY","time":"2019-10-07T13:43:37Z","message":"server is busy"}
...
{"level":"debug","id":"mcUApEgSXmA2gKhb","server":"agent-DnFRUDrY","time":"2019-10-07T13:44:27Z","message":"server is busy"}
...
{"level":"debug","id":"n1webanNSJcljaY3","server":"agent-DnFRUDrY","time":"2019-10-07T13:46:47Z","message":"server is idle"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T13:46:52Z","message":"destroying server"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T13:46:53Z","message":"terminate instance"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T13:46:53Z","message":"terminated"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T13:46:53Z","message":"destroyed server"}
{"level":"debug","ip":"10.0.2.218","path":"/api/servers/agent-DnFRUDrY","method":"DELETE","request_id":"bmdrrabc0uvqfbtt7u40","username":"k2n","username":"k2n","time":"2019-10-07T22:35:21Z","message":"user authorized"}
{"level":"debug","ip":"10.0.1.142","path":"/api/servers/agent-DnFRUDrY","method":"GET","request_id":"bmdrrajc0uvqfbtt7u4g","username":"k2n","username":"k2n","time":"2019-10-07T22:35:22Z","message":"user authorized"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T22:35:24Z","message":"destroying server"}
{"level":"debug","ip":"10.0.1.142","path":"/api/servers/agent-DnFRUDrY?force=true","method":"DELETE","request_id":"bmdrs7rc0uvqfbtt7uu0","username":"k2n","username":"k2n","time":"2019-10-07T22:37:19Z","message":"user authorized"}
{"level":"debug","ip":"10.0.2.218","path":"/api/servers/agent-DnFRUDrY","method":"GET","request_id":"bmdrs7rc0uvqfbtt7uug","username":"k2n","username":"k2n","time":"2019-10-07T22:37:19Z","message":"user authorized"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T22:37:25Z","message":"destroying server"}
{"level":"warn","error":"Cannot connect to the Docker daemon at https://10.0.12.148:2376. Is the docker daemon running?","server":"agent-DnFRUDrY","time":"2019-10-07T22:37:36Z","message":"cannot stop the agent"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:37:36Z","message":"terminate instance"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:37:36Z","message":"instance does not exist"}
{"level":"info","state":"error","server":"agent-DnFRUDrY","time":"2019-10-07T22:37:36Z","message":"server no longer exists. nothing to destroy"}
{"level":"warn","error":"Cannot connect to the Docker daemon at https://10.0.12.148:2376. Is the docker daemon running?","server":"agent-DnFRUDrY","time":"2019-10-07T22:39:34Z","message":"cannot stop the agent"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:39:34Z","message":"terminate instance"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:39:34Z","message":"instance does not exist"}
{"level":"info","state":"error","server":"agent-DnFRUDrY","time":"2019-10-07T22:39:34Z","message":"server no longer exists. nothing to destroy"}
{"level":"debug","ip":"10.0.2.218","path":"/api/servers/agent-DnFRUDrY","method":"DELETE","request_id":"bmdrtubc0uvqfbtt7vbg","username":"k2n","username":"k2n","time":"2019-10-07T22:40:57Z","message":"user authorized"}
{"level":"debug","ip":"10.0.1.142","path":"/api/servers/agent-DnFRUDrY","method":"GET","request_id":"bmdrtubc0uvqfbtt7vc0","username":"k2n","username":"k2n","time":"2019-10-07T22:40:57Z","message":"user authorized"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T22:41:05Z","message":"destroying server"}
{"level":"debug","ip":"10.0.2.218","path":"/api/servers/agent-DnFRUDrY","method":"DELETE","request_id":"bmdruk3c0uvqfbtt7vgg","username":"k2n","username":"k2n","time":"2019-10-07T22:42:24Z","message":"user authorized"}
{"level":"debug","ip":"10.0.1.142","path":"/api/servers/agent-DnFRUDrY","method":"GET","request_id":"bmdruk3c0uvqfbtt7vh0","username":"k2n","username":"k2n","time":"2019-10-07T22:42:24Z","message":"user authorized"}
{"level":"debug","server":"agent-DnFRUDrY","time":"2019-10-07T22:42:25Z","message":"destroying server"}
{"level":"warn","error":"Cannot connect to the Docker daemon at https://10.0.12.148:2376. Is the docker daemon running?","server":"agent-DnFRUDrY","time":"2019-10-07T22:43:16Z","message":"cannot stop the agent"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:43:16Z","message":"terminate instance"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:43:16Z","message":"instance does not exist"}
{"level":"info","state":"error","server":"agent-DnFRUDrY","time":"2019-10-07T22:43:16Z","message":"server no longer exists. nothing to destroy"}
{"level":"warn","error":"Cannot connect to the Docker daemon at https://10.0.12.148:2376. Is the docker daemon running?","server":"agent-DnFRUDrY","time":"2019-10-07T22:44:35Z","message":"cannot stop the agent"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:44:35Z","message":"terminate instance"}
{"level":"debug","id":"i-00aec8d18ed1422f8","ip":"10.0.12.148","name":"agent-DnFRUDrY","zone":"us-west-2a","time":"2019-10-07T22:44:35Z","message":"instance does not exist"}
{"level":"info","state":"error","server":"agent-DnFRUDrY","time":"2019-10-07T22:44:35Z","message":"server no longer exists. nothing to destroy"}
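A per-agent excerpt like the one above can be pulled out of the full log with a plain substring filter. This is a sketch, not a drone tool: agent_log and agent_timeline are hypothetical names, and the sed expression assumes each log line is a single JSON object with a "time" field before the "message" field and no escaped quotes inside the message, as in the lines shown here.

```shell
# Hypothetical helper: keep only log lines that mention the given server
# name anywhere (plain fixed-string match, like a manual grep).
agent_log() {
  grep -F -- "$1"
}

# Hypothetical helper: same filter, then reduce each JSON line to
# "<time> <message>" for a compact lifecycle view.
agent_timeline() {
  agent_log "$1" |
    sed -n 's/.*"time":"\([^"]*\)".*"message":"\([^"]*\)".*/\1 \2/p'
}

# Usage (2>&1 in case the autoscaler writes its log to stderr):
# docker logs autoscaler 2>&1 | agent_timeline agent-DnFRUDrY
```

Read this way, each destroy attempt between 22:37 and 22:44 shows the same sequence: the Docker daemon cannot be reached, the EC2 instance (terminated at 13:46:53) does not exist, and the autoscaler reports “server no longer exists. nothing to destroy”.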

“I observe multiple agents are stuck in stopped status,”

The agents are not stuck. Stopped agents are purged from the database every 24 hours [1]; until then, the records are kept for audit-trail purposes. Stopped instances are also excluded from capacity calculations [2].
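To make the retention rule concrete, here is a hypothetical shell sketch of a 24-hour purge check. It is not the autoscaler's code (that lives in engine.go [1]); should_purge is an illustrative name, and measuring age from the moment the server stopped is an assumption.

```shell
# Retention window: 24 hours, in seconds.
RETENTION_SECONDS=$((24 * 60 * 60))

# Hypothetical sketch: should_purge STATE STOPPED_AT_EPOCH NOW_EPOCH
# Prints "yes" only for stopped servers older than the retention window.
# Assumption: age is measured from when the server was stopped.
should_purge() {
  local state=$1 stopped_at=$2 now=$3
  if [ "$state" = "stopped" ] && [ $((now - stopped_at)) -ge "$RETENTION_SECONDS" ]; then
    echo yes
  else
    echo no
  fi
}
```

Under that assumption, agent-DnFRUDrY, which the log shows being destroyed at 2019-10-07T13:46:53Z, would become eligible for purging around 2019-10-08T13:46:53Z, while running servers are never purged.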

If you do not want to see them in the output, omit the -a flag.
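If you already have -a output (for example, saved to a file) and want to hide the stopped rows or count servers per state, a small awk filter is one option. hide_stopped and count_states are hypothetical helpers, not drone CLI features, and they assume the column layout shown in the listing above (Name, Address, State, Created) with a header on the first line.

```shell
# Hypothetical helper: keep the header row and every server whose
# third column (State) is not "stopped".
hide_stopped() {
  awk 'NR == 1 || $3 != "stopped"'
}

# Hypothetical helper: skip the header, tally servers per State value,
# and print "<state> <count>" lines in sorted order.
count_states() {
  awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
# bin/drone server ls -a -H -l | count_states
```

For the listing in the question, count_states would print running 2 and stopped 18.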

[1] https://github.com/drone/autoscaler/blob/master/engine/engine.go#L242:L258
[2] https://github.com/drone/autoscaler/blob/master/engine/planner.go#L243
