Kubernetes runner - Kubernetes pipelines being assigned to runners of the wrong architecture

I have a k3s cluster with two nodes: one linux amd64, one linux arm64.

Some of my git repos have multiple pipelines in the .drone.yml - one for each architecture (amd64, arm64). Example of how they start off:

kind: pipeline
type: kubernetes
name: linux-amd64

platform:
  os: linux
  arch: amd64

# steps etc here


kind: pipeline
type: kubernetes
name: linux-arm64

platform:
  os: linux
  arch: arm64

# steps etc here

I have one kubernetes runner assigned to the amd64 node, and another kubernetes runner assigned to the arm64 node.

The problem I am seeing is that the runners receive the stage intended for the other architecture: in the debug logs for the arm64 runner, I see that it received stage.name=linux-amd64, and the amd64 runner receives the arm64 stage.

I verified that the two runners are running on the correct nodes.

Is there something I might be doing wrong?

This is expected, as Kubernetes runners process all operating systems and architectures and launch pods using the kubernetes.io/arch label [1]. It is therefore up to Kubernetes to ensure pods are created on the correct architecture, using the well-known arch label that the runner sets.

[1] https://kubernetes.io/docs/reference/labels-annotations-taints/#kubernetes-io-arch
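For illustration, when that label is used as a node selector, the pod spec the Kubernetes scheduler acts on looks roughly like this (a sketch; the pod name, step name, and image are placeholders, not what the runner actually generates):

apiVersion: v1
kind: Pod
metadata:
  name: drone-example          # placeholder name
  labels:
    io.drone: "true"
spec:
  nodeSelector:
    kubernetes.io/arch: arm64  # scheduler only considers nodes carrying this label
  containers:
  - name: step                 # placeholder step container
    image: alpine:3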

Thanks, Brad.

I checked one of the pods that the Kubernetes runner created for a build; these were the labels listed:

$ kubectl -n drone describe pods/drone-vf8pl5qwtqw4ct6ds7kc
Name:         drone-vf8pl5qwtqw4ct6ds7kc
Namespace:    drone
Priority:     0
Node:         arm64-node-name/
Start Time:   Fri, 28 Jan 2022 23:52:35 +0000
Labels:       io.drone=true

(Edit 3: after reading the post I found and mentioned in edit 2, I believe the kubernetes.io/arch label was not applied because this was a pod for the amd64 pipeline.)

I added this section to my pipelines, which applied the label to the pods the Kubernetes runner created, but the pods were not being scheduled onto the correct node based on it (maybe there is some Kubernetes feature that does this that I’m not aware of):

metadata:
  labels:
    kubernetes.io/arch: <amd64/arm64>

Removing that section and defining a node_selector section instead did the trick, since recent Kubernetes versions apply the kubernetes.io/arch label to nodes automatically:

node_selector:
  kubernetes.io/arch: <amd64/arm64>
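Put together, a minimal sketch of what one of my pipelines looks like with the working node_selector (the step name, image, and command are placeholders):

kind: pipeline
type: kubernetes
name: linux-arm64

platform:
  os: linux
  arch: arm64

node_selector:
  kubernetes.io/arch: arm64

steps:
- name: build        # placeholder step
  image: alpine:3
  commands:
  - uname -m         # should print aarch64 on the arm64 node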

This isn’t really something I wanted to add to all of my pipelines, though, so I looked into Kubernetes runner policies to see whether a node_selector section with the arch label could be applied by default. node_selector can be set in a policy, but the only match option appears to be matching by repo, which wouldn’t work for this.
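For reference, a policy entry with node_selector looks roughly like this (a sketch only; the policy name and repository glob are made up, and the exact schema may differ from the runner docs). It illustrates the limitation: the match is per repository, not per pipeline architecture:

kind: policy
name: arm64-nodes        # hypothetical policy name
match:
  repo:
  - octocat/*            # repo matching is the only option, hence the limitation
node_selector:
  kubernetes.io/arch: arm64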

I feel there should be a better solution for this, but I can’t think of anything.

Edit: I slept on it; maybe setting a node_selector section like the one above as the default for a Kubernetes pipeline, with the arch taken from the pipeline’s platform section, would work.

Edit 2: I started to suspect that the platform section for Kubernetes runners did not actually do anything… I then found this post where this was mentioned. So, should setting the platform section in a Kubernetes pipeline actually implicitly set nodeSelector for kubernetes.io/arch?

You can find the relevant code where the runner sets the architecture here:

Drone is responsible for adding the architecture label (above); however, you may still need to configure your cluster and/or nodes to ensure workloads are routed appropriately.

I am also running into issues with how the runner-kube platform logic is currently configured.

I have opened the following PR which should address it: