Hi Marko,
We just ran another build, but this time we removed the delete namespace/pod permissions from the service account's cluster role so that the pod is not cleaned up.
time="2021-09-01T15:14:31Z" level=debug msg="Launched containers. Duration=0.03s" count=1 failed=0 success=1
time="2021-09-01T15:14:31Z" level=debug msg="destroying the pipeline environment" build.id=290 build.number=25 repo.id=1801 repo.name=test-2 repo.namespace=willem.veerman stage.id=290 stage.name=default stage.number=1 thread=95
As can be seen here, the logs do not show the failure even though the UI does.
The runner moves from the "Launched containers" step straight to the pipeline cleanup on the next log line.
The job fails instantly, and these two responses show up in the k8s audit logs:
500 (comes first):
{
"_index": "kubernetes-audit-acp-test-2021.09.01",
"_type": "doc",
"_id": "Nuz3oXsBqwUb303WCRKC",
"_score": 1,
"_source": {
"tags": [
"kubernetes_audit_filtered"
],
"@version": "1",
"index_prefix": "kubernetes-audit-acp-test",
"audit_json": {
"stage": "ResponseComplete",
"userAgent": "drone-runner-kube/v0.0.0 (linux/amd64) kubernetes/$Format",
"stageTimestamp": "2021-09-01T15:25:25.830153Z",
"kind": "Event",
"requestReceivedTimestamp": "2021-09-01T15:25:25.820873Z",
"responseStatus": {
"status": "Failure",
"code": 500,
"reason": "ServerTimeout",
"metadata": {}
},
"auditID": "7159b97d-5bff-4823-80b4-ad579d5bd35d",
"user": {
"uid": "399f3b36-7df6-47f6-a35a-eb4f20792ec8",
"username": "system:serviceaccount:drone-ci:drone-gl-runner",
"groups": [
"system:serviceaccounts",
"system:serviceaccounts:drone-ci",
"system:authenticated"
]
},
"apiVersion": "audit.k8s.io/v1",
"verb": "create",
"sourceIPs": [
"10.250.7.193"
],
"annotations": {
"authorization.k8s.io/decision": "allow",
"authentication.k8s.io/legacy-token": "system:serviceaccount:drone-ci:drone-gl-runner",
"authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"drone-gl-runner\" of ClusterRole \"drone-gl-runner\" to ServiceAccount \"drone-gl-runner/drone-ci\""
},
"level": "Metadata",
"objectRef": {
"resource": "pods",
"apiVersion": "v1",
"namespace": "drone-bleqfkaay7k3rhbqwz8i",
"name": "drone-zx079dmtb3jo13rl4vwz"
},
"requestURI": "/api/v1/namespaces/drone-bleqfkaay7k3rhbqwz8i/pods"
},
"s3Key": "drone-bleqfkaay7k3rhbqwz8i",
"@timestamp": "2021-09-01T15:25:25.830Z"
},
"fields": {
"audit_json.stageTimestamp": [
"2021-09-01T15:25:25.830Z"
],
"audit_json.requestReceivedTimestamp": [
"2021-09-01T15:25:25.820Z"
],
"@timestamp": [
"2021-09-01T15:25:25.830Z"
]
}
}
422 afterwards:
{
"_index": "kubernetes-audit-acp-test-2021.09.01",
"_type": "doc",
"_id": "uuz3oXsBqwUb303WEBKh",
"_score": 1,
"_source": {
"audit_json": {
"userAgent": "drone-runner-kube/v0.0.0 (linux/amd64) kubernetes/$Format",
"responseStatus": {
"metadata": {},
"code": 422,
"reason": "Invalid",
"status": "Failure"
},
"apiVersion": "audit.k8s.io/v1",
"level": "Metadata",
"verb": "update",
"objectRef": {
"uid": "6defb1b6-b2d2-4d39-8a0e-786958e36a69",
"apiVersion": "v1",
"resourceVersion": "513423363",
"name": "drone-zx079dmtb3jo13rl4vwz",
"resource": "pods",
"namespace": "drone-bleqfkaay7k3rhbqwz8i"
},
"user": {
"username": "system:serviceaccount:drone-ci:drone-gl-runner",
"uid": "399f3b36-7df6-47f6-a35a-eb4f20792ec8",
"groups": [
"system:serviceaccounts",
"system:serviceaccounts:drone-ci",
"system:authenticated"
]
},
"stageTimestamp": "2021-09-01T15:25:27.334165Z",
"requestURI": "/api/v1/namespaces/drone-bleqfkaay7k3rhbqwz8i/pods/drone-zx079dmtb3jo13rl4vwz",
"annotations": {
"authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"drone-gl-runner\" of ClusterRole \"drone-gl-runner\" to ServiceAccount \"drone-gl-runner/drone-ci\"",
"authorization.k8s.io/decision": "allow",
"authentication.k8s.io/legacy-token": "system:serviceaccount:drone-ci:drone-gl-runner"
},
"requestReceivedTimestamp": "2021-09-01T15:25:27.312148Z",
"stage": "ResponseComplete",
"auditID": "7153b7b2-a74a-43b9-90f3-2f89d713ca3b",
"kind": "Event",
"sourceIPs": [
"10.250.7.193"
]
},
"index_prefix": "kubernetes-audit-acp-test",
"@timestamp": "2021-09-01T15:25:27.334Z",
"s3Key": "drone-bleqfkaay7k3rhbqwz8i",
"tags": [
"kubernetes_audit_filtered"
],
"@version": "1"
},
"fields": {
"audit_json.stageTimestamp": [
"2021-09-01T15:25:27.334Z"
],
"audit_json.requestReceivedTimestamp": [
"2021-09-01T15:25:27.312Z"
],
"@timestamp": [
"2021-09-01T15:25:27.334Z"
]
}
}
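In case it helps when comparing the two events, here is a minimal Python sketch that reduces each audit document to the fields that matter (verb, response code, reason, pod) and orders them by `stageTimestamp`. It is not part of Drone or Kubernetes; the function names `summarize_audit_event` and `summarize_all` are made up for illustration, and the `EVENTS` list is a trimmed copy of the two documents above.

```python
def summarize_audit_event(doc: dict) -> dict:
    """Reduce one audit document (as exported from Elasticsearch) to a few fields."""
    event = doc["_source"]["audit_json"]
    status = event.get("responseStatus", {})
    ref = event.get("objectRef", {})
    return {
        "time": event["stageTimestamp"],
        "verb": event["verb"],
        "code": status.get("code"),
        "reason": status.get("reason"),
        "namespace": ref.get("namespace"),
        "name": ref.get("name"),
    }


def summarize_all(docs: list) -> list:
    """Summarize every document and sort by timestamp.

    The timestamps are ISO 8601 strings in the same format and zone (UTC),
    so a plain string sort puts them in chronological order.
    """
    return sorted((summarize_audit_event(d) for d in docs), key=lambda s: s["time"])


# Trimmed copies of the two audit documents above; only the fields used here are kept.
# They are deliberately listed out of order to show the sort.
EVENTS = [
    {"_source": {"audit_json": {
        "stageTimestamp": "2021-09-01T15:25:27.334165Z",
        "verb": "update",
        "responseStatus": {"status": "Failure", "code": 422, "reason": "Invalid"},
        "objectRef": {"resource": "pods",
                      "namespace": "drone-bleqfkaay7k3rhbqwz8i",
                      "name": "drone-zx079dmtb3jo13rl4vwz"},
    }}},
    {"_source": {"audit_json": {
        "stageTimestamp": "2021-09-01T15:25:25.830153Z",
        "verb": "create",
        "responseStatus": {"status": "Failure", "code": 500, "reason": "ServerTimeout"},
        "objectRef": {"resource": "pods",
                      "namespace": "drone-bleqfkaay7k3rhbqwz8i",
                      "name": "drone-zx079dmtb3jo13rl4vwz"},
    }}},
]

for s in summarize_all(EVENTS):
    print(s["time"], s["verb"], s["code"], s["reason"], f'{s["namespace"]}/{s["name"]}')
```

Run against the full export, this lists the `create` that returned 500 first and the `update` that returned 422 about 1.5 seconds later, both for the same pod.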
These two log entries from the Drone server relate to the job failure:
{"error":"stream: not found","level":"warning","msg":"manager: cannot teardown log stream","step.id":4908,"step.name":"step_2","step.status":"skipped","time":"2021-09-01T14:28:13Z"}
{"build.id":288,"build.number":157,"error":"Cannot transition status via :enqueue from :pending (Reason(s): Status cannot transition via \"enqueue\")","level":"warning","msg":"manager: cannot publish status","repo.id":1802,"stage.id":288,"time":"2021-09-01T14:57:34Z"}
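If more of the server log needs to be searched for entries like these, a rough Python sketch along the following lines could pull out every warning that carries an `error` field. It only assumes that each server log line is a single JSON object, as in the two lines above; the helper name `warnings_with_errors` is hypothetical.

```python
import json

# The two server log lines quoted above, as raw strings so the escaped quotes survive.
LOG_LINES = [
    r'{"error":"stream: not found","level":"warning","msg":"manager: cannot teardown log stream","step.id":4908,"step.name":"step_2","step.status":"skipped","time":"2021-09-01T14:28:13Z"}',
    r'{"build.id":288,"build.number":157,"error":"Cannot transition status via :enqueue from :pending (Reason(s): Status cannot transition via \"enqueue\")","level":"warning","msg":"manager: cannot publish status","repo.id":1802,"stage.id":288,"time":"2021-09-01T14:57:34Z"}',
]


def warnings_with_errors(lines):
    """Return the parsed entries that are warnings and include an "error" field."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if entry.get("level") == "warning" and "error" in entry:
            found.append(entry)
    return found


for entry in warnings_with_errors(LOG_LINES):
    print(entry["time"], entry["msg"], "->", entry["error"])
```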