First off, thanks for the awesome product! Currently I'm trying to set up my self-hosted pipeline, but I am running into minor issues with configuring layer caching based on a private registry. Based on the article laszlocph posted on his site (and mentioned somewhere on this forum), I came up with the pipeline attached below.
My intention is to reuse layers from existing images to minimize turnaround time when working on features, especially since building wheels on a Pi is very time-consuming. However, my build keeps recompiling and rebuilding everything from the first step, even though I have not changed anything in the requirements, code, or build files for the image.
My expectation was that, since both the steps and the resulting layers are the same, Docker would reuse those layers. Can anybody help me understand what is happening, and why it does not use the layers from my personal registry?
I suspect I might have misunderstood the concept of build cache layers and need to adjust either my Dockerfile or drone.yml.
Edit:
I can push to and pull from the registry using the CLI, and the images built during the first step can be found there, so that part of the pipeline works as expected.
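For reference, the build step in my drone.yml boils down to roughly the following (simplified sketch; the registry host, repo name, and secret names are placeholders, and I am relying on the cache_from setting of the plugins/docker image):

steps:
  - name: build
    image: plugins/docker
    settings:
      registry: registry.example.com                    # placeholder for my private registry
      repo: registry.example.com/myapp                  # placeholder repo name
      tags: latest
      cache_from: registry.example.com/myapp:latest     # image whose layers should be reused
      username:
        from_secret: docker_username
      password:
        from_secret: docker_password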
DOCKERFILE
FROM python:3.6-alpine3.9 as base
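# builder stage: install build tools and compile the Python wheels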
FROM base as builder
RUN mkdir /install
WORKDIR /install
COPY ./requirements.txt /requirements.txt
RUN apk update
RUN apk add --update --no-cache make automake gcc libc-dev subversion python3-dev libxslt-dev
RUN pip wheel --wheel-dir=/install/wheelhouse -r /requirements.txt
RUN pip install --no-index --find-links=/install/wheelhouse -r /requirements.txt
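# runtime stage: copy the compiled artifacts from the builder, leaving the build tools behind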
FROM base
COPY --from=builder /install /usr/local
COPY ./run.py .
COPY ./app /app
WORKDIR /
ENTRYPOINT ["python", "./run.py"]
I am not sure there is enough information in this thread to help triage. Layer caching is about order of operations and uniqueness of the files (content, file metadata, timestamps, etc) in each layer. For example, if the first layer in your Dockerfile is a binary that has a different timestamp every time it is compiled, caching is not going to be effective.
I would also point out that the docker build command provides output indicating whether or not each layer was cached. This information is vital and has not been provided, which makes it difficult to help troubleshoot.
If you need them, I can provide the logs. When I look into them, I see that some layers are being pulled in and cached, but this does not work for the layer running apt-get update && apt-get install. Maybe this has something to do with the package list being updated, which might happen every few minutes? That seems to align with your comment on ‘content, file metadata, timestamps, etc’.
In the meantime I've narrowed the problem down to two main issues: the time it takes to compile Python packages on a Pi, and the fact that previously performed compilations are discarded between pipeline runs. To solve this, I chose an approach that combines a ‘wheelhouse’ in a baseline image with a multi-stage build, so that the commonly used packages are precompiled and stored in an image on my registry.
The baseline-image strategy reduced my build time from >45 minutes to <12 minutes on a Pi 3B+. The repository is public here: https://github.com/bitbeckers/python3.6debian_pi. Regarding caching, this seems like the easiest route for me. Please feel free to comment on the repo, as it is still a work in progress that I just wanted to share.
As Brad suggested, the Docker build output can reveal which steps were cached and which weren't. It will make this thread more focused too; we can talk about specific line numbers, etc.
The sample output below has a few lines that are crucial to see, like line “docker-builder:70” and line “docker-builder:89”.
As a suggestion, can you try refactoring your Dockerfile into a single stage first?
My hunch is that for a multi-stage Dockerfile to work with cache_from, you have to push each intermediate stage to the registry as well and pull it before each build, which makes the image pull step longer and the config more complicated. But first things first: logs, and a single-stage build. If those work, you can weigh the multi-stage alternative.
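If you do go down the multi-stage route later, my guess is the pipeline ends up looking roughly like the sketch below — untested, with placeholder registry/repo/secret names, and assuming your version of plugins/docker supports the target and cache_from settings:

steps:
  # build and push the builder stage so its layers are available on the next run
  - name: publish-builder-stage
    image: plugins/docker
    settings:
      registry: registry.example.com
      repo: registry.example.com/myapp-builder
      tags: latest
      target: builder                                   # the "FROM base as builder" stage
      cache_from: registry.example.com/myapp-builder:latest
      username:
        from_secret: docker_username
      password:
        from_secret: docker_password

  # build and push the final image, reusing layers from both cached images
  - name: publish-final-image
    image: plugins/docker
    settings:
      registry: registry.example.com
      repo: registry.example.com/myapp
      tags: latest
      cache_from:
        - registry.example.com/myapp-builder:latest
        - registry.example.com/myapp:latest
      username:
        from_secret: docker_username
      password:
        from_secret: docker_password

I have not verified whether the plugin pulls the cache_from images for you or whether you need an extra pull step; that may depend on the plugin version.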
Hi Laszlo, thank you for your input! And sorry for the late response, life got in the way.
Currently, I am indeed building in the first stage, pushing to the registry, and pulling in the subsequent stage. As you discussed in your posts, this does come with some overhead because you keep pulling and pushing images, but in the case of compiling Python on a Pi this might be the preferred solution.
A deployment with a single-stage Dockerfile is running as we speak, so I can produce the logs later today!
Hi there, I finally got around to producing the logs. You might be onto something, as the single-stage build does pull from the registry. The files and log are added at the bottom.
So maybe I'm getting ahead of things, but your suggestion to push each intermediate stage sounds like a good solution! This will indeed make the pull step longer… If that's the case, is there any advantage compared to using a custom base Python image? Or is this the weighing part?
This works, i.e. everything is pulled from the cache. As @laszlocph mentioned, the image pull/push steps add more time and the config is slightly more complex. I may combine some stages that don't need to be separate.