Drone reaper failing to clean up pods

Our Drone server runs on Kubernetes, with Kubernetes pipelines. We have configured the reaper to clean up pipelines, but our settings are not being respected: pods, and the associated runs in the UI, remain running for days.

Server configuration:

          - name: DRONE_CLEANUP_DEADLINE_RUNNING
            value: 1h
          - name: DRONE_CLEANUP_DEADLINE_PENDING
            value: 30m
          - name: DRONE_CLEANUP_INTERVAL
            value: 30m

Example:

drone-6xm8a2ldmtdh1p2uz3w2                                    5/7     NotReady                     2          4d23h
drone-97a7q7ttptfrbbfngug9                                    5/7     Error                        2          6d20h
drone-ef2k1bjurrk3z95o901b                                    5/7     Error                        2          6d18h
drone-n3gvzhtbid73zvjcn6ml                                    5/7     Error                        2          10d

The reaper is a server-side thread that detects “zombie” builds that are stuck in a pending or running state. The most common cause of a zombie build is someone killing a runner while pipelines are running. This prevents the runner from sending status updates to the server, so the build's entry in the database keeps a pending or running status indefinitely.

The reaper is therefore responsible for finding zombie builds and updating the database status from “pending” or “running” to “killed”. The reaper is not responsible for any cleanup. The runner is responsible for cleaning up pods when the pipeline exits.

Keep in mind that runners must be gracefully terminated. If a runner is killed abruptly while pipelines are running, it will not be able to perform cleanup.

Thanks for the insight into how this works. Appreciate it.