We run Drone agents on AWS spot instances and scale the number of instances based on the day and time of day.
We’ve noticed that if a build is executing on a node that gets terminated when we scale in, the build stalls and has to be canceled manually. Because builds so often succeed, the people who started them sometimes don’t notice; they check back hours later, only to find that the build is still shown as “executing” when it has really stalled.
Is there a way for Drone to cancel these stalled builds automatically?
We do not recommend using Drone with spot instances: they are volatile and can be terminated unexpectedly while pipelines are running, which leaves those pipelines in a suspended state. That said, the latest version of Drone (1.8.1) automatically clears out pipelines that are in a suspended state, every 24 hours, I believe.
Thanks @ashwilliams1. Is that period of time configurable?
The cleanup routine runs every 24 hours by default, and that interval can be configured.
The default timeout for a stuck pending job is 24 hours and can be configured.
The default timeout for a stuck running job is 24 hours and can be configured.
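For reference, here is a rough sketch of how those three settings might look in the Drone server's environment. The variable names are as I recall them from the server configuration reference, and writing the values as Go-style duration strings is an assumption on my part, so please verify both against the docs for your version. The values shown are just the 24-hour defaults mentioned above:

```
# Drone server environment (names recalled from the docs; verify for your version)

# how often the cleanup routine runs
DRONE_CLEANUP_INTERVAL=24h

# timeout for a stuck pending job
DRONE_CLEANUP_DEADLINE_PENDING=24h

# timeout for a stuck running job
DRONE_CLEANUP_DEADLINE_RUNNING=24h
```

With spot instances, lowering the running deadline should clear stranded builds sooner, but presumably it needs to stay above the duration of your longest legitimate build so that healthy builds are not cleaned up too.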
hope that helps!
Thanks @bradrydzewski, it certainly does!