I’m testing out drone-autoscaler and running into a cert error. The autoscaler is able to provision an EC2 instance, but it always fails to connect because of a mismatch between agent names. Every new build has the same issue, whether it is in a new repo or a new job in an existing repo.
drone-autoscaler {"error":"error during connect: Get \"https://10.176.4.74:2376/v1.24/containers/json?limit=0\": x509: certificate is valid for agent-P428IeKe, not agent-cbneAY52","ip":"10.176.4.74","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-cbneAY52","time":"2021-07-16T15:17:48Z"}
I believe I got into this state because I terminated an EC2 instance from the AWS console before drone-autoscaler tried to clean it up.
drone server ls shows the new server and not the old one.
drone server ls
agent-cbneAY52
How can I remove the old agent so that Drone expects the agent it actually launches for the build?
I’ve deleted drone and drone-autoscaler from my cluster along with the PVC/PV, then reinstalled them with a new PVC and PV. I’m still seeing the same error, and it still references the same old agent name in the certificate.
drone-autoscaler-5b9d8b98d6-blhnk drone-autoscaler {"error":"error during connect: Get \"https://10.176.4.123:2376/v1.24/containers/json?limit=0\": x509: certificate is valid for agent-P428IeKe, not agent-84Uf8uAK","ip":"10.176.4.123","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-84Uf8uAK","time":"2021-07-19T15:43:28Z"}
I figured this one out. I had copied the userdata from an already running agent to use as a template and did not think about the Docker cert stuff. Those certs are generated for each new agent as it comes online, but my userdata hardcoded the values for the original agent, so every new instance presented the old agent’s certificate. I then found the example in the docs and things are working as expected now.
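For anyone who hits the same symptom, here is a rough Python sketch of how to spot it from the autoscaler logs. It is not part of drone-autoscaler. The function names are made up for illustration, and it assumes the log entries look like the two pasted above (JSON with an "error" field containing the x509 message, optionally prefixed by the pod or container name). If the same certificate name is rejected by more than one differently named agent, the certs are most likely hardcoded in the userdata rather than generated per instance.

```python
import json
import re

# Matches the x509 name-mismatch message seen in the log lines in this thread.
MISMATCH = re.compile(
    r"x509: certificate is valid for (?P<cert>[^,]+), not (?P<agent>\S+)"
)


def cert_mismatch(log_line):
    """Return (cert_name, agent_name) for a mismatch entry, else None."""
    start = log_line.find("{")  # skip any pod/container prefix before the JSON
    if start < 0:
        return None
    try:
        entry = json.loads(log_line[start:])
    except json.JSONDecodeError:
        return None
    match = MISMATCH.search(entry.get("error", ""))
    if match is None:
        return None
    return match.group("cert"), match.group("agent")


def reused_cert_names(log_lines):
    """Map each cert name to the agents that rejected it, if more than one did."""
    agents_by_cert = {}
    for line in log_lines:
        result = cert_mismatch(line)
        if result is not None:
            cert, agent = result
            agents_by_cert.setdefault(cert, set()).add(agent)
    return {
        cert: sorted(agents)
        for cert, agents in agents_by_cert.items()
        if len(agents) > 1
    }


# The two log lines from this thread (raw strings keep the JSON escapes intact).
SAMPLE_LINES = [
    r'drone-autoscaler {"error":"error during connect: Get \"https://10.176.4.74:2376/v1.24/containers/json?limit=0\": x509: certificate is valid for agent-P428IeKe, not agent-cbneAY52","ip":"10.176.4.74","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-cbneAY52","time":"2021-07-16T15:17:48Z"}',
    r'drone-autoscaler-5b9d8b98d6-blhnk drone-autoscaler {"error":"error during connect: Get \"https://10.176.4.123:2376/v1.24/containers/json?limit=0\": x509: certificate is valid for agent-P428IeKe, not agent-84Uf8uAK","ip":"10.176.4.123","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-84Uf8uAK","time":"2021-07-19T15:43:28Z"}',
]

if __name__ == "__main__":
    print(reused_cert_names(SAMPLE_LINES))
```

Run against the two entries above, this reports agent-P428IeKe as the certificate name presented to both agent-cbneAY52 and agent-84Uf8uAK, which is exactly the pattern that a copied, hardcoded cert in the userdata produces.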