So I’ve started running drone-gc together with my drone agent containers (1 agent/1 gc per machine) and it was working mostly fine for a couple of weeks. Until suddenly one of them stopped pruning images.
It had been running for a few days already when, on April 27th, it hit an error deleting a Node image (it had logged several similar errors in the past):
"error": "Error response from daemon: conflict: unable to delete 406f227b21f5 (cannot be forced) - image has dependent child images",
"message": "cannot remove image"
After that, it kept updating the image cache (imaged used, updated cache), but never again logged a prune operation. I only noticed a week later (on May 1st), when my instance ran out of disk space.
Any ideas what could cause this? Anyone had the same issue? I’ll take a look at the code later to see if I can find anything.
All of our functions use context, which means we could definitely integrate a timeout for each cycle of garbage collection. We would probably want to set a reasonably high timeout, since the primary goal would be to avoid blocking indefinitely.