Hey folks,
I’ve begun testing the new 1.0 Drone Kubernetes features, and there’s a lot to like. Right now, however, I’m having difficulty reproducing a pattern we had in 0.8 for running our test suite. To avoid the XY problem, here’s what I’m trying to achieve, and then I’ll describe the approaches I’ve tried.
Ultimately what we want is to run our test suite. Doing this requires a fair bit of dependency compilation, which I don’t want to do every time.
In 0.8 we solved this by defining a Dockerfile that compiled our dependencies. That Dockerfile was built by plugins/docker, and we mounted the host Docker socket and /var/lib/docker into the step so that the result would persist on the node as a foo-tests Docker image. Future step invocations would then have the image layers in cache.
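Roughly, that 0.8 build step looked like the following (simplified from memory; the exact plugins/docker settings may have differed, and dry_run is only there to indicate the image was never pushed anywhere):

pipeline:
  build-test:
    image: plugins/docker
    repo: foo-tests
    dockerfile: production/Dockerfile.test
    dry_run: true  # build only, never push
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker:/var/lib/docker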
Then we had:
image: foo-tests
commands:
  - test
This worked because the agent would always run this on the same node that ran the previous build-test step. In Kubernetes, however, this doesn’t seem to work: the step hangs, and I see an image pull error if I do kubectl get po --all-namespaces.
My planned workaround was to just replace all of this with a dind
step that did the building and running explicitly:
- name: test
  image: docker:stable-dind
  privileged: true
  commands:
    - >
      docker build
      -t myapp-test:${DRONE_COMMIT_SHA}
      -f production/Dockerfile.test .
    - >
      docker run
      --network=bridge
      -e DATABASE_HOST=database
      myapp-test:${DRONE_COMMIT_SHA} mix test
  volumes:
    - name: docker_run
      path: /var/run/
    - name: docker_lib
      path: /var/lib/docker
The problem with THAT is that I can’t get services to be visible from the spawned container. We have a database service that’s required for the tests, and I keep getting nxdomain from within the spawned container. I’ve tried every combination of --network=host I can think of. This is why we originally did the two-step “build the test image, then run the test image” approach: connecting to the service just magically worked.
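For context, the service in question is declared more or less like this (image and settings simplified); database is the hostname the test container needs to resolve:

services:
  - name: database
    image: postgres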
What am I missing here?
Thanks!
Unfortunately this is the drawback of the native Kubernetes runtime. At this time there is no good way to build a Docker image and then test that Docker image, and trying to work around this can be difficult because of Kubernetes networking. For this reason, I still recommend using the standard Docker runtime (not Kubernetes).
Thanks @bradrydzewski. For our specific use case we only need the service for a single step, so I’ll probably try just starting that image explicitly via DIND and wiring up the network myself. Does that seem plausible?
We have an example in our docs. I have verified that this works with our Docker runtime, but I have not tested this with the Kubernetes runtime. https://docs.drone.io/examples/service/docker_dind/
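From memory, that example declares docker:dind as a service and shares its socket with the build step through a temporary volume, roughly like this (see the linked page for the authoritative version):

kind: pipeline
name: default

steps:
  - name: test
    image: docker:dind
    volumes:
      - name: dockersock
        path: /var/run
    commands:
      - sleep 5  # give the daemon time to start
      - docker ps -a

services:
  - name: docker
    image: docker:dind
    privileged: true
    volumes:
      - name: dockersock
        path: /var/run

volumes:
  - name: dockersock
    temp: {}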
I suppose what I had in mind was something along the lines of:
- name: test
  image: docker:stable-dind
  privileged: true
  commands:
    - >
      docker build
      -t myapp-test:${DRONE_COMMIT_SHA}
      -f production/Dockerfile.test .
    - docker network create myapp
    - docker run -d --network=myapp postgres
    - >
      docker run
      --network=myapp
      -e DATABASE_HOST=database
      myapp-test:${DRONE_COMMIT_SHA} mix test
  volumes:
    - name: docker_run
      host:
        path: /var/run/
    - name: docker_lib
      host:
        path: /var/lib/docker
I’m not entirely sure what the difference is between doing this and running dind as a service. Most of our steps (other than this one) do Docker image builds, so we tend to mount /var/run and /var/lib/docker into most steps to maximize image caching.
It works!
- name: test
  image: docker:stable-dind
  privileged: true
  commands:
    - >
      docker build
      -t myapp-test:${DRONE_COMMIT_SHA}
      -f production/Dockerfile.test .
    - docker network create myapp || true
    - docker run -d --network=myapp --name=postgres-${DRONE_COMMIT_SHA} -p 5432 postgres
    - >
      docker run
      -e DATABASE_HOST=postgres-${DRONE_COMMIT_SHA}
      --network=myapp
      myapp-test:${DRONE_COMMIT_SHA} mix test
  volumes:
    - name: docker_run
      path: /var/run/
    - name: docker_lib
      path: /var/lib/docker
The whole ${DRONE_COMMIT_SHA} bit is there because the container names conflict otherwise. I should probably be doing some cleanup as well, since this generates dead containers pretty quickly.
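A minimal cleanup sketch (untested) would be to append something like the following to the commands list, though if mix test fails the step aborts before reaching these lines, so reliable cleanup probably needs a wrapper script or a scheduled prune on the node:

    - docker rm -f postgres-${DRONE_COMMIT_SHA} || true
    - docker image prune -f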
My declaration may have been premature. This works sometimes. Most of the time it actually errors with
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
This happens even when restarting a job that previously succeeded. What is the canonical way to get access to the node-level Docker socket and storage path when using the Kubernetes runtime? Access to both is, as far as I can tell, the only way to get reliable image caching.
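Is the intended shape something like the following, with host volumes declared at the pipeline level and referenced by name from the step? (The volume names here are just what I’ve been using, and I’ve only seen this documented for the Docker runtime, so it may not carry over to Kubernetes.)

steps:
  - name: build
    image: docker:stable-dind
    privileged: true
    volumes:
      - name: docker_sock
        path: /var/run
      - name: docker_lib
        path: /var/lib/docker
    commands:
      - docker build -t myapp-test -f production/Dockerfile.test .

volumes:
  - name: docker_sock
    host:
      path: /var/run
  - name: docker_lib
    host:
      path: /var/lib/docker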
I suppose I could push all images (including ones built for tests) up to a repository, and then re-pull them every time. As long as there can be a shared volume between steps that would work fine.
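If I go that route, I’m picturing something along these lines, assuming a private registry at registry.example.com (hypothetical) and the 1.0 plugins/docker settings syntax, with the registry credentials pulled from secrets:

- name: publish-test-image
  image: plugins/docker
  settings:
    repo: registry.example.com/myapp-test  # hypothetical registry
    dockerfile: production/Dockerfile.test
    tags:
      - ${DRONE_COMMIT_SHA}
    username:
      from_secret: docker_username  # illustrative secret names
    password:
      from_secret: docker_password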
Fundamentally this gets at: How are steps run on K8s? I see a job created for the actual build job, and I also see other pods that seem to be created for each step. Are those pods scheduled with the normal K8s scheduler? Are they on possibly different nodes or are they on the same node? How does that interact with volumes? Services?
EDIT: Found the answer to some of these. All steps do run on the same node per a comment here: https://github.com/drone/drone-runtime/issues/19
This makes it all the more confusing why one step can work fine with dind but a subsequent step using plugins/docker can’t find the socket.
It is difficult for us to triage questions like this without seeing an example that can be used to reproduce the issue.
Understood, sorry. I’ll try to get a self-contained example going. I first wanted to make sure I wasn’t missing obvious docs on the right way to do this.