Drone on K8S - FailedScheduling

Hi,

I’m setting up Drone for the first time in K8S.

This is the deployment I’m using:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-server
  namespace: sre
  labels:
    app: drone
    component: server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: drone
        component: server
    spec:
      serviceAccountName: drone-service-account
      containers:
      - name: server
        image: "docker.io/drone/drone:1.0.0-rc.5"
        imagePullPolicy: IfNotPresent
        env:
          - name: DRONE_KUBERNETES_ENABLED
            value: "true"
          - name: DRONE_KUBERNETES_NAMESPACE
            value: "sre"
          - name: DRONE_KUBERNETES_SERVICE_ACCOUNT
            value: "drone-pipeline-service-account"
          - name: DRONE_ALWAYS_AUTH
            value: "false"
          - name: DRONE_SERVER_HOST
            value: "<>"
          - name: DRONE_SERVER_PROTO
            value: "https"
          - name: DRONE_RPC_HOST
            value: "drone.sre.svc.cluster.local"
          - name: DRONE_RPC_PROTO
            value: "http"
          - name: DRONE_USER_CREATE
            value: username:roccodonnarummaef,machine:false,admin:true
          - name: DRONE_RPC_SECRET
            value: "<>"
          - name: DRONE_GITHUB_CLIENT_ID
            value: "<>"
          - name: DRONE_GITHUB_SERVER
            value: "https://github.com"
          - name: DRONE_GITHUB_CLIENT_SECRET
            value: "<>"
          - name: DRONE_LOGS_TRACE
            value: "true"
          # - name: DRONE_DEBUG_DUMP_HOOK
          #   value: "true"
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
        - name: https
          containerPort: 443
          protocol: TCP
        - name: grpc
          containerPort: 9000
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        volumeMounts:
          - name: data
            mountPath: /var/lib/drone
      volumes:
      - name: data
        emptyDir: {}

The pod starts fine and a pipeline job is triggered by this .drone.yml:

kind: pipeline
name: build-docker

steps:
- name: build
  image: docker:dind
  volumes:
  - name: dockersock
    path: /var/run/docker.sock
  commands:
  - docker ps -a

volumes:
- name: dockersock
  host:
    path: /var/run/docker.sock

The pod created by the pipeline fails to be scheduled. I’ve tried setting a node selector on the deployment but got the same result.

Any ideas on what I am missing?

The error says “0/7 nodes are available: 7 node(s) didn’t match node selector”. Drone does not set node affinity by default, so my first thought would be that your yaml defines node affinity that cannot be fulfilled?
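
If it helps, one way to check is to dump the full spec of the pending pod; kubectl describe does not print the affinity stanza, so you need the raw yaml (the namespace and pod name below are placeholders to fill in from your cluster):

kubectl -n <build-namespace> get pod <pipeline-pod> -o yaml

Look for a nodeSelector or affinity block in the output and compare it against the labels on your nodes (kubectl get nodes --show-labels).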

This is the complete deployment.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: sre
---

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-server
  namespace: sre
  labels:
    app: drone
    component: server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: drone
        component: server
    spec:
      serviceAccountName: drone-service-account
      nodeSelector:
        role: worker
      containers:
      - name: server
        image: "docker.io/drone/drone:1.0.0-rc.5"
        imagePullPolicy: IfNotPresent
        env:
          - name: DRONE_KUBERNETES_ENABLED
            value: "true"
          - name: DRONE_KUBERNETES_NAMESPACE
            value: "sre"
          - name: DRONE_KUBERNETES_SERVICE_ACCOUNT
            value: "drone-pipeline-service-account"
          - name: DRONE_ALWAYS_AUTH
            value: "false"
          - name: DRONE_SERVER_HOST
            value: "<example.com>"
          - name: DRONE_SERVER_PROTO
            value: "https"
          - name: DRONE_RPC_HOST
            value: "drone.sre.svc.cluster.local"
          - name: DRONE_RPC_PROTO
            value: "http"
          - name: DRONE_USER_CREATE
            value: username:roccodonnarummaef,machine:false,admin:true
          - name: DRONE_RPC_SECRET
            value: "<>"
          - name: DRONE_GITHUB_CLIENT_ID
            value: "<>"
          - name: DRONE_GITHUB_SERVER
            value: "https://github.com"
          - name: DRONE_GITHUB_CLIENT_SECRET
            value: "<>"
          - name: DRONE_LOGS_TRACE
            value: "true"
          # - name: DRONE_DEBUG_DUMP_HOOK
          #   value: "true"
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
        - name: https
          containerPort: 443
          protocol: TCP
        - name: grpc
          containerPort: 9000
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /
            port: http
        volumeMounts:
          - name: data
            mountPath: /var/lib/drone
      volumes:
      - name: data
        emptyDir: {}
---

apiVersion: v1
kind: Service
metadata:
  name: drone
  namespace: sre
  labels:
    app: drone
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: grpc
    port: 9000
    targetPort: 9000
  selector:
    app: drone
    component: server
---

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: drone-public-rules
  namespace: sre
  annotations:
    kubernetes.io/ingress.class: traefik-public
spec:
  rules:
  - host: <example.com>
    http:
      paths:
      - path: /
        backend:
          serviceName: drone
          servicePort: 80
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: drone-service-account
  namespace: sre
  labels:
    app: drone
---

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drone
  namespace: sre
  labels:
    app: drone
rules:
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - "*"
  - apiGroups:
      - extensions
    resources:
      - deployments
    verbs:
      - get
      - list
      - patch
      - update
---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drone
  namespace: sre
  labels:
    app: drone
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: drone
subjects:
- kind: ServiceAccount
  name: drone-service-account
  namespace: sre
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: drone-pipeline-service-account
  namespace: sre
  labels:
    app: drone
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: drone-pipeline
  namespace: sre
  labels:
    app: drone
rules:
  - apiGroups:
      - extensions
    resources:
      - deployments
    verbs:
      - get
      - list
      - watch
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - namespaces
      - configmaps
      - secrets
      - pods
    verbs:
      - create
      - delete
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods/log
    verbs:
      - get
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: drone-pipeline
  namespace: sre
  labels:
    app: drone
subjects:
  - kind: ServiceAccount
    name: drone-pipeline-service-account
    namespace: sre
roleRef:
  kind: ClusterRole
  name: drone-pipeline
  apiGroup: rbac.authorization.k8s.io
---

Oh, perhaps I am misunderstanding … I assumed you were using Drone for Kubernetes, had a running instance, and could not get Drone to execute a build on Kubernetes. Is the issue that you are having trouble getting the Drone server deployed?

To clarify, I was previously asking about your .drone.yml configuration.

Sorry, my bad - let me explain better.

I’ve deployed the Drone server in K8S using the above yaml (deployment.yaml). I can access the UI, sync with GitHub, and activate a repo; upon pushing a commit, a job is created in K8S. That job creates a pod, which then stays stuck in Pending.

I can provide logs or more info if needed. Below is the pending pod description:

root@af4ec1e7e484:/work/kube-apps# kubectl -n hnh5f4t4fl16znfsxz016fu0op9zv2vd describe pod wun0i9trl1wyd9x2q9kfbyq8iusuihl7
Name:               wun0i9trl1wyd9x2q9kfbyq8iusuihl7
Namespace:          hnh5f4t4fl16znfsxz016fu0op9zv2vd
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             io.drone=true
                    io.drone.build.number=1
                    io.drone.created=1550855521
                    io.drone.expires=1550862721
                    io.drone.protected=false
                    io.drone.repo.name=docker-image-helloworld
                    io.drone.repo.namespace=example
                    io.drone.stage.name=build-docker
                    io.drone.stage.number=1
                    io.drone.step.name=clone
                    io.drone.ttl=1h0m0s
Annotations:        <none>
Status:             Pending
IP:
Containers:
  wun0i9trl1wyd9x2q9kfbyq8iusuihl7:
    Image:      docker.io/drone/git:latest
    Port:       <none>
    Host Port:  <none>
    Environment:
      DRONE_NETRC_PASSWORD:        x-oauth-basic
      DRONE_COMMIT_REF:            refs/heads/drone
      DRONE:                       true
      DRONE_STEP_NUMBER:           1
      DRONE_REPO_PRIVATE:          true
      CI_JOB_STATUS:               success
      DRONE_TARGET_BRANCH:         drone
      DRONE_DEPLOY_TO:
      CI_NETRC_PASSWORD:           x-oauth-basic
      CI_BUILD_STARTED:            1550855521
      CI_BUILD_FINISHED:           1550855521
      DRONE_WORKSPACE_PATH:
      DRONE_SYSTEM_VERSION:        7a510d79c88f93ee6a0d24bd48cf8797941a3509
      DRONE_BUILD_NUMBER:          1
      DRONE_REPO_VISIBILITY:       private
      DRONE_COMMIT_BRANCH:         drone
      CI_JOB_STARTED:              1550855521
      CI_WORKSPACE_PATH:
      DRONE_GIT_HTTP_URL:          https://github.com/example/docker-image-helloworld.git
      DRONE_RUNNER_HOSTNAME:       ip-10-0-3-52.eu-west-1.compute.internal
      DRONE_BUILD_CREATED:         1550855518
      DRONE_REPO_NAMESPACE:        example
      DRONE_STEP_NAME:             clone
      DRONE_SYSTEM_HOSTNAME:       drone.example.com
      DRONE_COMMIT_AFTER:          bd9618f512c528bc6491368529c309589fe45a85
      DRONE_NETRC_USERNAME:        b94102da57f552387c2613d8d686329232c53752
      CI_BUILD_STATUS:             success
      DRONE_BUILD_STARTED:         1550855521
      DRONE_COMMIT_AUTHOR:         roccodonnarummaef
      DRONE_RUNNER_PLATFORM:       linux/amd64
      DRONE_WORKSPACE_BASE:        /drone/src
      DRONE_COMMIT_AUTHOR_EMAIL:   rocco.donnarumma@example
      DRONE_REPO:                  example/docker-image-helloworld
      DRONE_REPO_NAME:             docker-image-helloworld
      DRONE_COMMIT_AUTHOR_AVATAR:  https://avatars1.githubusercontent.com/u/44259300?v=4
      DRONE_BRANCH:                drone
      DRONE_RUNNER_HOST:           ip-10-0-3-52.eu-west-1.compute.internal
      DRONE_SYSTEM_PROTO:          https
      DRONE_JOB_STATUS:            success
      DRONE_JOB_FINISHED:          1550855521
      DRONE_WORKSPACE:             /drone/src
      DRONE_REPO_OWNER:            example
      DRONE_COMMIT_AUTHOR_NAME:    Rocco Donnarumma
      CI_NETRC_MACHINE:            github.com
      DRONE_NETRC_MACHINE:         github.com
      DRONE_SOURCE_BRANCH:         drone
      DRONE_BUILD_EVENT:           push
      DRONE_REPO_SCM:
      DRONE_MACHINE:               ip-10-0-3-52.eu-west-1.compute.internal
      DRONE_COMMIT_MESSAGE:        drone
      DRONE_COMMIT:                bd9618f512c528bc6491368529c309589fe45a85
      DRONE_BUILD_FINISHED:        1550855521
      DRONE_REMOTE_URL:            https://github.com/example/docker-image-helloworld.git
      DRONE_BUILD_STATUS:          success
      DRONE_JOB_STARTED:           1550855521
      CI_WORKSPACE_BASE:           /drone/src
      CI_WORKSPACE:                /drone/src
      DRONE_COMMIT_SHA:            bd9618f512c528bc6491368529c309589fe45a85
      DRONE_BUILD_LINK:            https://drone.example.com/example/docker-image-helloworld/1
      DRONE_GIT_SSH_URL:           git@github.com:example/docker-image-helloworld.git
      CI_JOB_FINISHED:             1550855521
      DRONE_COMMIT_LINK:           https://github.com/example/docker-image-helloworld/compare/8a6999ab1916...bd9618f512c5
      DRONE_BUILD_ACTION:
      DRONE_COMMIT_BEFORE:         8a6999ab19161144577671e879bfa1c062b51609
      CI:                          true
      DRONE_SYSTEM_HOST:           drone.example.com
      CI_NETRC_USERNAME:           b94102da57f552387c2613d8d686329232c53752
      DRONE_REPO_LINK:             https://github.com/example/docker-image-helloworld
      DRONE_REPO_BRANCH:           master
      KUBERNETES_NODE:              (v1:spec.nodeName)
    Mounts:
      /drone/src from ogbpsy38qodi1pktv954j2edhypds5ll (rw)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  ogbpsy38qodi1pktv954j2edhypds5ll:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/drone/hnh5f4t4fl16znfsxz016fu0op9zv2vd/ogbpsy38qodi1pktv954j2edhypds5ll
    HostPathType:  DirectoryOrCreate
QoS Class:         BestEffort
Node-Selectors:    <none>
Tolerations:       node.kubernetes.io/not-ready:NoExecute for 300s
                   node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  2m57s (x25 over 3m31s)  default-scheduler  0/7 nodes are available: 7 node(s) didn't match node selector.

root@af4ec1e7e484:/work/kube-apps#

Providing a sample of your .drone.yml is also helpful.

.drone.yml:

kind: pipeline
name: build-docker

steps:
- name: build
  image: docker:dind
  volumes:
  - name: dockersock
    path: /var/run/docker.sock
  commands:
  - docker ps -a

volumes:
- name: dockersock
  host:
    path: /var/run/docker.sock

I’ve also tried with a simpler step, something like:

steps:
- name: build
  image: golang
  commands:
  - go build
  - go test

I think this is the crux of the issue. Can you see why it is not being scheduled? There must be some information provided by Kubernetes to better understand why something was not scheduled.

Drone uses node affinity to schedule your pipeline steps on the same machine as the pipeline controller (e.g. drone-job-…). The pipeline controller is configured to receive the host node information via spec.nodeName, as described in the kubernetes docs [1]. This is used to schedule all pipeline steps on the same node as the controller: pipeline steps are created with node affinity that matches spec.nodeName to the kubernetes.io/hostname label [2]. That is one thing I did not see in the pod spec you posted, where the node selectors looked empty …
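
To make that concrete, the affinity block generated for pipeline steps should look roughly like the following; this is a sketch based on [2] rather than the literal generated output, with the hostname value taken from the controller's spec.nodeName (the ip-10-0-3-52… value in your describe output):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-3-52.eu-west-1.compute.internal   # controller's spec.nodeName

Note that kubectl describe does not print affinity, which would explain why Node-Selectors shows <none> in your output even though the scheduler complains about a node selector: required node affinity failures are reported with the same "didn't match node selector" message.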

Unfortunately I do not use Kubernetes and have limited working knowledge of it, which means the best I can do is point you to the source code to get a better understanding of how things work. I have personally tested Drone on Kubernetes with Digital Ocean and Minikube and have not run into any issues; however, I recognize that no two clusters are the same.

[1] https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/#use-pod-fields-as-values-for-environment-variables
[2] https://github.com/drone/drone-runtime/blob/master/engine/kube/kube.go#L136

I added some more detailed notes to my previous reply.
You might also find this post helpful for debugging at a lower level: Contributing to Drone for Kubernetes

I ran into this same problem and did a little debugging. With the way my cluster is set up (kops on AWS), the nodeName does not correspond to the hostname. You can verify that you’re encountering the problem by looking at the spec of the pod generated by the runner/job and checking the node affinity requirements it contains.
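
A quick way to spot the mismatch (the -L flag simply adds a column showing that label's value, and the namespace/pod names are placeholders):

kubectl get nodes -L kubernetes.io/hostname
kubectl -n <build-namespace> get pod <pipeline-pod> -o yaml

If the node names in the NAME column differ from the values in the hostname label column, the affinity generated from spec.nodeName can never match, and the second command will show the impossible requirement under spec.affinity.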

I think using the kubernetes.io/hostname label in the node affinity might not be entirely correct, since it assumes the nodeName and hostname are always identical.

Instead, why don’t we consider using https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodename, since that selects nodes by name and accomplishes the same thing the affinity/hostname matching is trying to do?
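
For reference, under that proposal the runner would emit pipeline pods pinned directly by name, which bypasses scheduler label matching entirely. A minimal sketch, with the node name taken from the describe output above:

apiVersion: v1
kind: Pod
metadata:
  name: example-pipeline-step
spec:
  # assigned directly by name; the kubelet on this node admits the pod
  # without the scheduler evaluating selectors or affinity
  nodeName: ip-10-0-3-52.eu-west-1.compute.internal
  containers:
  - name: step
    image: docker.io/drone/git:latest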

Alternatively, we could change the fieldRef from passing the nodeName to instead passing the kubernetes.io/hostname label.

Thoughts?


Sure, please consider sending a pull request. I am not actively working on the Kubernetes implementation right now, as I have other priorities I need to address, but I am happy to accept pull requests.

Hi. I had the same issue: the pod could not be scheduled because the kubernetes.io/hostname label was not the same as the node name. So I made a PR that uses the node name instead. Thanks!