I’ve got a build that’s hitting the 1 hour time limit on the open source cloud instance but I only have guesses as to why it’s running slowly:
It’s actually that big (testing on another VM to work this out)
Running out of RAM and swapping
IO issues because I’m running with many threads (will try new builds later with fewer threads)
How would I go about troubleshooting this? The process is just killed at the 1 hour limit with no info. AppVeyor allows you to remote into the running job, but I understand that’s a pretty special case.
Drone automatically kills any build that exceeds a 1 hour time limit. In terms of hardware, there should be plenty of resources available on our machines, which are bare-metal with 64 GB of RAM and 24 physical cores @ 2.2 GHz (AMD EPYC 7401P). Full specs here.
How would I go about troubleshooting this?
With your permission I can re-run the build and then monitor resource usage on the host machine and let you know if I see anything interesting. I would likely take a look at this some time early this work week (it is Sunday night where I am).
AppVeyor allows you to remote into the running job, but I understand that’s a pretty special case.
Historically Drone was designed to run on-premise in which case you typically have direct access to the host machine instance. The cloud offering is new, and the tools for remote debugging are still immature. Hopefully this improves over time.
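Without remote access, one possible workaround is to have the build log its own resource usage, so that something is left in the output when the job is killed. The following is only a rough sketch, assuming a Linux build container where /proc/meminfo and /proc/loadavg are readable; the function names (parse_meminfo, swap_used_kb, sample, monitor) are made up for illustration and are not part of Drone.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Most fields look like "MemTotal:  65536000 kB"; a few have no unit.
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB (0 if the swap fields are missing)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def sample():
    """Return one line summarising available memory, swap use and 1-minute load."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load1 = f.read().split()[0]
    return "mem_available_kb=%d swap_used_kb=%d load1=%s" % (
        info.get("MemAvailable", 0), swap_used_kb(info), load1)


def monitor(interval=30, count=None):
    """Print a sample every `interval` seconds; run forever if count is None."""
    taken = 0
    while count is None or taken < count:
        # flush=True so the line reaches the build log before a hard kill.
        print(time.strftime("%H:%M:%S"), sample(), flush=True)
        taken += 1
        time.sleep(interval)


if __name__ == "__main__":
    monitor()
```

Started in the background before the real build command, this would interleave a sample line into the log every 30 seconds. A steadily growing swap_used_kb would point at the swapping guess, while a load average far above the thread count with little CPU progress would hint at IO contention. Note that inside a container these files may report host-wide figures rather than per-container ones.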
I found it - a command was triggering a full rebuild halfway through the build…
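For anyone who lands here with a similar problem, timing each build command separately is one way to spot a step that takes far longer than expected. This is a minimal sketch, not what I actually ran; run_timed is a hypothetical helper and the commands passed to it are placeholders.

```python
import subprocess
import time


def run_timed(commands):
    """Run shell commands in order, printing elapsed time and exit code for each.

    Stops at the first failing command. Returns a list of
    (command, returncode, elapsed_seconds) tuples.
    """
    results = []
    for cmd in commands:
        start = time.monotonic()
        proc = subprocess.run(cmd, shell=True)
        elapsed = time.monotonic() - start
        # flush=True so the timing survives if the job is killed later.
        print("[%7.1fs] exit=%d  %s" % (elapsed, proc.returncode, cmd), flush=True)
        results.append((cmd, proc.returncode, elapsed))
        if proc.returncode != 0:
            break
    return results


if __name__ == "__main__":
    # Placeholder commands; substitute the real build steps.
    run_timed(["echo configure", "echo build", "echo package"])
```

A step whose elapsed time is roughly the same as the full build from scratch is a strong hint that it is rebuilding everything.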
That’s an unexpected mismatch - drone.io is work for you, but I only get a chance to use it for side projects in my spare time!
I’m pretty new to Linux performance monitoring (htop is basically my limit) but I’ll be watching this with interest.
Thanks again for your detailed response. FWIW, you have my permission to use my builds for any testing in the future, as I’m doing open source stuff that is probably on the more resource-intensive side of things (qemu, buildroot and so on).