I got some good answers to my question about High Availability, and it sounds like this is not on the horizon in the near future.
The next best thing to HA for me would be setting an expectation about recovery time if our Drone instance totally kicked the bucket. I've already got Chef scripts to automate standing up a fresh Drone instance, but I'm not clear whether there's a notion of backup and restore for Drone, either built in or something I could hand-roll.
Drone stores all state in the database, so you would set up regular database backups from which you could restore. Consult your database vendor's documentation for the available backup and restore options and tools.
If you are running Drone on AWS or Google Cloud, you can use their managed MySQL instances, which include automated backups and the ability to restore via the web interface.
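If you run your own MySQL server instead, a hand-rolled backup could look roughly like the sketch below. It is only an illustration under stated assumptions: the database name `drone`, the host, the user, and the backup directory are all hypothetical, and credentials are assumed to come from a MySQL option file such as `~/.my.cnf` rather than the command line. The helper names are made up for this example; only `mysqldump` and its flags are real.

```python
import datetime
import subprocess
from pathlib import Path


def build_dump_command(host, user, database, out_file):
    """Return the mysqldump argument list for one logical backup."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        # Take a consistent snapshot of InnoDB tables without locking them.
        "--single-transaction",
        f"--result-file={out_file}",
        database,  # assumed name of the Drone database
    ]


def backup_path(backup_dir, database, now):
    """Build a timestamped dump file path inside backup_dir."""
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return Path(backup_dir) / f"{database}-{stamp}.sql"


def run_backup(host, user, database, backup_dir):
    """Dump the database to a new timestamped file and return its path."""
    out = backup_path(backup_dir, database, datetime.datetime.now())
    out.parent.mkdir(parents=True, exist_ok=True)
    # The password is expected from an option file, not from argv.
    subprocess.run(build_dump_command(host, user, database, out), check=True)
    return out
```

You would call something like `run_backup("localhost", "drone", "drone", "/var/backups/drone")` from cron or a Chef-managed scheduled job, and restore onto a fresh instance by feeding the dump file back to the `mysql` client. Your recovery-time expectation is then roughly the time to stand up the instance plus the time to load the latest dump, and the data you can lose is bounded by the backup interval.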