Autoscaler allocation config propagation from the build

For the Prometheus benchmarking we need to be able to run jobs on dedicated machines that match specific parameters.

Currently the pipeline allows specifying agent filters, but the autoscaler doesn't allow provisioning machines with different configs.

My idea is that if an agent with the given filters doesn't exist, the autoscaler will provision a new one that matches these requirements.

I will start looking at where this fits best in the code and will open a PR, but some initial notes and opinions would also be useful.

This is a simplified version of the pipeline.
The agent is filtered by the machine tag and AGENT_MAX_PROCS, so I want to run this pipeline only on an agent that is tagged machine1 and has AGENT_MAX_PROCS=1, and if such an agent doesn't exist the autoscaler would provision a new one.

pipeline:
    benchK8S1:
        group: bench
        image: k8s+fake_webserver
        agent:
            tags:
                - machine1
            AGENT_MAX_PROCS: 1

Since each provider has different names for the instance types, we would need to use some aliases in the loaded config:

machine1:
    - t2.medium      # aws
    - m2.xlarge.x86  # packet.net
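
To make the alias idea a bit more concrete, here is a rough sketch (the names and types are made up for illustration, nothing like this exists in the autoscaler yet) of how a machine alias could be resolved into a provider-specific instance type at provisioning time:

package main

import "fmt"

// alias -> provider -> instance type, loaded from the autoscaler config
type aliases map[string]map[string]string

// instanceType resolves an alias such as "machine1" for the provider
// this autoscaler instance is configured to use.
func (a aliases) instanceType(alias, provider string) (string, error) {
	byProvider, ok := a[alias]
	if !ok {
		return "", fmt.Errorf("unknown machine alias %q", alias)
	}
	size, ok := byProvider[provider]
	if !ok {
		return "", fmt.Errorf("alias %q is not mapped for provider %q", alias, provider)
	}
	return size, nil
}

func main() {
	a := aliases{"machine1": {"aws": "t2.medium", "packet": "m2.xlarge.x86"}}
	size, _ := a.instanceType("machine1", "aws")
	fmt.Println(size) // t2.medium
}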

We are planning on deprecating filters and replacing them with something called named queues [1]. It is not a trivial change, but I am happy to discuss further.

[1] Roadmap for the 1.0 release (see agent routing section)

Yep, happy to discuss as well and give more details about our use case.

I am also interested to hear your opinion on how the autoscaler would fit into / make use of the agent routing/tagging.

The plan for routing builds to specific agents will depend on first creating a new (breaking) feature that I am calling triggers. When you enable a repository, you will also have to set up one or more triggers. When a hook is received by the system (e.g. from GitHub), a build will be executed for each matching trigger.

I briefly discuss the target use cases for triggers in the roadmap post and have done some rough concepts of what the screens might look like.

Right now every build is placed on the same default queue. Once triggers are in place, we would like to introduce the concept of named queues. When you create a trigger you can use the default queue, or you can specify a named queue.

Agents would be configured to pull jobs from specific queues (named, default, etc). This would enable routing builds to specific servers. In some cases you may want to route builds to different servers based on event type (push, pull request, etc) or based on some other criteria. You could do this by having multiple triggers, each with different matching logic.
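
To illustrate the matching step described above, the dispatch logic might look roughly like this (purely a sketch of the concept; none of these types or fields exist yet, and the branch filter is just one guess at "other criteria"):

package triggers

// Trigger is an illustrative sketch of the trigger concept; the real
// design is still to be written up.
type Trigger struct {
	Event  string // "push", "pull_request", ... (empty matches any event)
	Branch string // optional branch filter, one example of "other criteria"
	Queue  string // named queue; empty means the default queue
}

type Hook struct {
	Event  string
	Branch string
}

// queuesFor returns one queue per matching trigger; a build would be
// created and placed on each of them.
func queuesFor(hook Hook, triggers []Trigger) []string {
	var queues []string
	for _, t := range triggers {
		if t.Event != "" && t.Event != hook.Event {
			continue
		}
		if t.Branch != "" && t.Branch != hook.Branch {
			continue
		}
		q := t.Queue
		if q == "" {
			q = "default"
		}
		queues = append(queues, q)
	}
	return queues
}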

Right now triggers are conceptual. The next step is to work on the technical design and create a more formal proposal for how this feature will work so that we can gather community input.

Have you thought about the autoscaler in this direction?

For example, we might want to provision workers with settings that match the requirements of a given group (e.g. m2.xlarge.x86) and tear these down after the group is complete.

An example for a bench job: run k8s on one machine with 16 GB of RAM, run Prometheus on another with 8 GB, run some benchmarks, and tear down both machines when they are done.

Once triggers are implemented, I would advocate for breaking the autoscaler development into two different stages. In the first stage, we could make a simple change to the autoscaler to filter builds by queue name and architecture. For example, you would pass the following variables to the autoscaler:

DRONE_QUEUE=m2.xlarge.x86
DRONE_ARCH=linux/arm

This means when the first stage is complete, you would have to run multiple instances of the autoscaler daemon, one for each named queue and architecture combination. While slightly less convenient than running a single instance, it allows us to support multiple machine types and architectures with minimal changes to the codebase. It is an important first step because it gives us a chance to put all the necessary filtering in place.
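
Roughly, each autoscaler instance would only count the pending builds that match its own queue and architecture, something like this (sketch only; the build fields and the exact env variable handling are assumptions, not existing code):

package autoscaler

import "os"

// Build carries just the fields relevant to the stage-one filtering.
type Build struct {
	Queue string // e.g. "m2.xlarge.x86"
	Arch  string // e.g. "linux/arm"
}

// matches reports whether a pending build belongs to this autoscaler
// instance, based on the DRONE_QUEUE and DRONE_ARCH it was started with.
func matches(b Build) bool {
	return b.Queue == os.Getenv("DRONE_QUEUE") &&
		b.Arch == os.Getenv("DRONE_ARCH")
}

// pendingCount would feed the existing scale-up calculation unchanged.
func pendingCount(builds []Build) int {
	n := 0
	for _, b := range builds {
		if matches(b) {
			n++
		}
	}
	return n
}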

For the second stage of development, I would create a new drone-autoscaler-multi binary that accepts configuration for multiple architectures/queues/servers/providers. For example, you could do something like this:

drone-autoscaler-multi \
  --config=/etc/digitalocean/amd64.json \
  --config=/etc/packet/aarch64.json \
  --config=/etc/aws/amd64_highmem.json
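
Each configuration file would describe a single provider/queue/architecture combination, along these lines (the field names are only a guess at this point, not an actual schema):

package autoscaler

// Config is a guess at what each per-file configuration might carry.
type Config struct {
	Provider string `json:"provider"` // e.g. "digitalocean", "packet", "amazon"
	Queue    string `json:"queue"`    // named queue this engine watches, e.g. "m2.xlarge.x86"
	Arch     string `json:"arch"`     // e.g. "linux/amd64", "linux/arm64"
	Size     string `json:"size"`     // provider-specific instance type, e.g. "t2.medium"
	PoolMin  int    `json:"pool_min"` // scale-down floor
	PoolMax  int    `json:"pool_max"` // scale-up ceiling
}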

The drone-autoscaler-multi program would loop through each configuration file and create and start a new engine for each:

db, _ := store.Connect(..., ...)
servers := store.NewServerStore(db)
client := setupClient(conf)

// start one engine per configuration file, all sharing the same server store
g, ctx := errgroup.WithContext(context.Background())
for _, config := range configs {
	provider, _ := setupProvider(config)
	eng := engine.New(
		client,
		config,
		servers,
		provider,
	)

	g.Go(func() error {
		eng.Start(ctx)
		return nil
	})
}

This represents the high-level design that I have in my head. There are additional design details that we will need to address; however, I think this strikes a good balance between solving the target use case and minimizing development effort and disruption to the codebase (e.g. we are not re-writing or overhauling anything).

Thanks! That looks doable with minimal effort.