I was recently asked how one could go about caching a large git repository …
This thread will explore some different strategies for caching your git repository to reduce build time. We only recommend this for larger repositories (e.g. if you are only shaving a few seconds off your build time, the added complexity is not worth it).
I recommend creating a custom git plugin that clones the repository to the volume, and then clones or copies from the volume into your workspace. First, let's configure our YAML file to use a custom plugin and a volume, where the volume path is /tmp/git:
```diff
+clone:
+  git:
+    image: my-custom-plugin
+    volumes: [ /tmp/git:/tmp/git ]

 pipeline:
   build:
     image: golang
     commands:
       - go build
       - go test
```
We could then create a plugin that invokes a simple shell script that clones the repository to the cache first, and then clones from the cache into the workspace. (Disclaimer: the example below is for demonstration purposes only and probably doesn't work.)
```sh
#!/bin/sh

# go to the shared volume and initialize the repository if necessary
cd /tmp/git
if [ ! -d .git ]; then
    git init
    git remote add origin ${DRONE_REMOTE_URL}
fi

# pull everything. normally this would be slow, but since the repository
# is cached it should be relatively quick.
git pull --all

# return to our working directory (pushd/popd are bash builtins and not
# available in plain sh, so use cd with $OLDPWD instead)
cd "$OLDPWD"

# use the cached git repository (on disk) as the remote
git init
git remote add origin /tmp/git

# fetch and check out the specific branch and commit into the current directory
git fetch origin +refs/heads/${DRONE_BRANCH}:
git checkout ${DRONE_COMMIT} -b ${DRONE_BRANCH}
```
We have a prototype git plugin based on bash (instead of Go) that you can probably use as a baseline. Take a look at https://github.com/drone-plugins/drone-git/tree/next
Some additional considerations:
Cleanup
One thing you may need to consider is cleanup: you will need to prune old branches and unreachable objects from the cached repository, or it will grow without bound over time. I recommend executing git prune on the repository cache as part of the script.
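As a rough sketch of that cleanup pass (the guard around the cache directory and the extra gc step are my assumptions, not part of the plugin above), you could append something like this to the script:

```shell
#!/bin/sh
# hypothetical cleanup pass over the cached repository; the path
# matches the /tmp/git volume from the pipeline configuration above
CACHE=/tmp/git

if [ -d "$CACHE/.git" ]; then
    cd "$CACHE"
    git remote prune origin   # drop remote-tracking refs for branches deleted upstream
    git prune                 # delete loose objects no longer reachable from any ref
    git gc --auto --quiet     # repack only if the object store has grown enough
fi
```

Running this after every git pull --all keeps the cache roughly the size of the upstream repository rather than the size of its entire history of deleted branches.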
Locking
You also need to consider what happens if two builds run at the same time and both try to update the cache, which could corrupt it. I would recommend some sort of system-level lock, but be careful: choose a lock that is automatically released when the holding process is destroyed, so a killed build cannot leave the cache locked indefinitely.
I feel like you could use a unix domain socket as a lock, but I'm not entirely sure. I would have to think through this a bit more and run some tests.
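One concrete option with exactly that auto-release property is flock(1): the kernel drops an advisory file lock the moment the holding process exits, even on a hard kill. A minimal sketch, assuming a hypothetical lock file at /tmp/git.lock (the echo stands in for the cache-updating portion of the script):

```shell
#!/bin/sh
# serialize cache updates with an advisory file lock; /tmp/git.lock is
# an assumed path, chosen to sit next to the /tmp/git cache volume
LOCK=/tmp/git.lock

(
    # block until we hold an exclusive lock on file descriptor 9;
    # the kernel releases it when this subshell exits, even on kill -9
    flock -x 9 || exit 1
    echo "lock held: safe to update the cache"
    # ... git pull --all, cleanup, etc. would go here ...
) 9>"$LOCK"
```

Because the lock lives on the shared volume's host, two concurrent builds on the same machine would serialize their git pull steps instead of racing.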