I’m trying to come up with a good way to set up cpu/memory options for our Kubernetes-based Drone setup.
Here’s the problem:
All runner pods run on dedicated runner nodes (via taints/tolerations), and we want these to scale up and down on demand. In order to do that, each pod needs some settings telling Kubernetes how much memory and cpu it is likely to consume. Unlike a lot of other (non-k8s) build systems, Drone CI creates a single pod containing all the containers (steps) the build pipeline needs. In Kubernetes, cpu/memory is a per-container thing, not a pod thing.
This means that if I set “1 cpu, 1GB” for a pipeline, every single container in the pipeline gets those settings. If the pipeline has 10 steps, that means 10 cores and 10 GB of RAM. If there happens to be an available node capable of running that, fine, but if not, the build pod will just stay pending indefinitely.
On the other hand, if I don’t set any cpu/memory requests or limits, the Kubernetes cluster-autoscaler won’t know when it’s time to add more nodes to the autoscaling group, since it doesn’t have enough information about the requirements of the “upcoming” pod.
All in all, I would conclude that, because of the way Drone CI uses runner pods, Kubernetes’ built-in scaling mechanisms are not adequate to build a robust autoscaling build environment for Drone CI. I would be interested in learning what other users are thinking about this.
Drone takes the same approach as Tekton, where every pipeline step runs as a container in the same Pod. This means you may need to define resources for each pipeline step, and if you have too many steps you may need to split your pipeline into multiple pipelines, as shown here.
I do not see any reason that Drone (or Tekton) would have issues with autoscaling. However, as with any system that builds on Kubernetes, there are limitations and design tradeoffs, and you will have to design your pipelines with these in mind.
Also, even if everyone were to add per-container cpu/memory limits, we’d still have a bit of “waste”, because Kubernetes assumes that the sum of a pod’s cpu/memory settings must be available on a node before it can schedule the pod, while in fact most pipelines are serial so the real capacity needed at any time is simply the most “expensive” container.
If, for example, a pipeline consists of 4 cpu-intensive compile steps run serially, a developer might set “cpu.limit=1” on each of them. That pod couldn’t be scheduled on a node with 2 cpu cores, since 4 are required, even though only 1 is actually needed at any given time.
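To make the arithmetic concrete, here is a minimal sketch of the sum-versus-peak gap. The function names are made up for illustration, and it assumes each step's request equals its limit (which is what Kubernetes does when only a limit is set) and ignores any system overhead on the node.

```python
def reserved_cpu(step_cpu_requests):
    """CPU the scheduler reserves for the pod: the sum over all containers."""
    return sum(step_cpu_requests)


def peak_cpu_if_serial(step_cpu_requests):
    """CPU in use at any moment if the steps run strictly one after another."""
    return max(step_cpu_requests, default=0)


def fits(node_cpu, needed_cpu):
    """True if a node with node_cpu free cores can take needed_cpu cores."""
    return needed_cpu <= node_cpu


steps = [1, 1, 1, 1]  # four serial compile steps, 1 CPU each
node_cpu = 2          # a 2-core node (assumed fully available)

# What the scheduler sees: 4 cores needed, does not fit.
print(reserved_cpu(steps), fits(node_cpu, reserved_cpu(steps)))
# What the pipeline really uses at once: 1 core, would fit.
print(peak_cpu_if_serial(steps), fits(node_cpu, peak_cpu_if_serial(steps)))
```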
As far as I can see, we’re forced to spin up much larger nodes than are actually needed to run the build job.
An alternative would be (I guess) to have a “blank” container as a pipeline step, and that container would be the only one with cpu requests/limits. This would mean that the pod cpu/memory settings would be equal to that single “blank” container, which would allow Kubernetes to more correctly schedule the pod - one would simply use the settings for the “most expensive” step in the pipeline and apply that.
It would be awesome if Drone could handle adding such a “blank” container to the pipeline behind the scenes based on some settings.
The Tekton project basically ended up doing the same thing, based on this comment and this comment. If I understand correctly, Tekton basically takes the highest resource request across all steps and discards the rest, but does still allow for per-step resource limits.
This creates a “fake service” with a resource lock on 3000 millicpu (i.e. 3 cpu cores). Our build runners run on 4-core instances, so this ensures that a single node will never run more than one build at any given time. It works well enough.
We decided to use a “service” container for this, just to separate it from regular “step containers”.
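As a rough check of why a lock of 3 cpu cores (3000 in millicpu) gives one build per 4-core node, here is a small sketch. The helper name is hypothetical, and the 3900 figure is only an assumed example of a node whose allocatable cpu is a bit below its nominal 4000 millicpu.

```python
def builds_per_node(allocatable_millicpu, lock_millicpu):
    """How many build pods fit on a node if each reserves lock_millicpu."""
    if lock_millicpu <= 0:
        raise ValueError("lock_millicpu must be positive")
    return allocatable_millicpu // lock_millicpu


print(builds_per_node(4000, 3000))  # nominal 4-core node
print(builds_per_node(3900, 3000))  # same node with some cpu reserved for the system
```

Both cases leave room for exactly one build, which is the point of the lock.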
Drone will be using stage-level memory & cpu requests. The stage-level memory & cpu requests provided in the pipeline will be used to ensure that Kubernetes reserves only that much memory & cpu when it schedules the pod. Since Kubernetes reserves the sum of the requests of all containers when scheduling a pod, the stage-level memory & cpu need to be distributed amongst all the steps.
Tekton takes an approach where it sets the stage-level memory & cpu on one container, while the other containers get zero values for their resource requests. There is one caveat with this approach: if a LimitRange is set for the namespace with a minimum per-container cpu/memory request greater than 0, pod creation fails. To handle that scenario, Tekton allows specifying the LimitRange name in the task yaml. At runtime, Tekton queries the LimitRange to fetch the min request values for cpu & memory, and uses these min values instead of zero.
Proposed solution for using stage level requests
Instead of taking a LimitRange name, Drone will allow users to set environment variables:
DRONE_RESOURCE_MIN_REQUEST_CPU for min cpu request
DRONE_RESOURCE_MIN_REQUEST_MEMORY for min memory request
If these environment variables are set, their values are used. Otherwise, zero is used for the min cpu/memory request. The rest of the approach remains the same as Tekton’s.
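The idea can be sketched roughly as follows. This is not Drone's actual implementation: the helper names are hypothetical, the values are assumed to be plain integers in millicpu, and which container receives the stage request is an arbitrary choice here (the first one). Only the environment variable name comes from the proposal above.

```python
import os

CPU_VAR = "DRONE_RESOURCE_MIN_REQUEST_CPU"  # name taken from the proposal


def read_min_request(name, env=None):
    """Return the minimum request from the environment, or 0 when unset."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    return int(raw) if raw else 0


def distribute(stage_request, containers, min_request=0):
    """Give the stage request to the first container, the minimum to the rest."""
    if not containers:
        return {}
    requests = {name: min_request for name in containers}
    requests[containers[0]] = max(stage_request, min_request)
    return requests


containers = ["clone", "build", "test"]

# No minimum configured: only one container carries a request.
print(distribute(1000, containers))
# Minimum of 10 configured, e.g. to satisfy a namespace LimitRange.
print(distribute(1000, containers, read_min_request(CPU_VAR, {CPU_VAR: "10"})))
```

With a zero minimum the pod's total request equals the stage request; with a non-zero minimum it ends up slightly higher, since every other container still reserves the minimum.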
while in fact most pipelines are serial so the real capacity needed at any time is simply the most “expensive” container.
Don’t “service” containers sort of throw a wrench in this thought? Say I have a postgresql service with a memory limit of 1GB and a step that also has a limit of 1GB. This new logic would assume the most a build pipeline needs is 1GB. But that is not true: since both run at the same time, the max would be 2GB. In this case we would expect 1GB to be available to the service and 1GB to the step.
I know that Drone will not be taking this into consideration since it seems we are defining the resource requests value, but this should be a scenario that is thought about and probably documented as I don’t think it is an odd use-case.
@bradrydzewski Curious your thoughts on the above as I see a solution has been implemented in Drone latest.
We dynamically build our YAML. So, in our dynamic service we have taken this “math” into consideration and ensured that the resource requests value includes the most expensive (non-service) container in addition to the “service” containers.
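For anyone wanting to do the same, the “math” can be sketched like this. The function name is hypothetical, and it assumes steps run serially while every service runs for the whole build; the numbers are in MiB and mirror the 1GB postgresql example from earlier in the thread.

```python
def stage_memory_request(step_requests_mib, service_requests_mib):
    """Most expensive step plus all services, which run alongside every step."""
    return max(step_requests_mib, default=0) + sum(service_requests_mib)


steps_mib = [1024, 512, 256]  # serial steps; only the largest counts
services_mib = [1024]         # e.g. a postgresql service

print(stage_memory_request(steps_mib, services_mib))  # 2048
```

If some steps run in parallel, the max over steps would underestimate the need, so the step side would have to be computed per group of concurrent steps instead.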