Caching your Git Clone

I was recently asked how one could go about caching a large git repository …

This thread will explore some different strategies for caching your git repository to reduce build time. We only recommend this for larger repositories (e.g. if you are only shaving a few seconds off your build time, the added complexity is not worth it).

I recommend creating a custom git plugin that clones the repository to the volume, and then clones or copies from the volume into your workspace. First, let's configure our YAML file to use a custom plugin and a volume, where the volume path is /tmp/git:

clone:
  git:
    image: my-custom-plugin
    volumes: [ /tmp/git:/tmp/git ]

pipeline:
  build:
    image: golang
    commands:
      - go build
      - go test

We could then create a plugin that invokes a simple shell script that first fetches the repository into a cache, and then clones from the cache into the workspace. (Disclaimer: the example below is for demonstration purposes only and probably doesn't work as-is.)

#!/bin/bash
# note: bash, not sh, because pushd/popd are bash builtins

# go to the shared volume and initialize a bare cache repository if necessary
pushd /tmp/git
if [ ! -f HEAD ]; then
	git init --bare .
	git remote add origin "${DRONE_REMOTE_URL}"
fi

# fetch all branches into the cache. normally this would be slow, but since
# the repository is cached it should be relatively quick.
git fetch origin "+refs/heads/*:refs/heads/*"

# return to our working directory
popd

# use the cached git repository (on disk) as the remote
git init
git remote add origin /tmp/git

# fetch the specific branch and check out the commit into the current directory
git fetch origin "+refs/heads/${DRONE_BRANCH}:"
git checkout "${DRONE_COMMIT}" -b "${DRONE_BRANCH}"

We have a prototype git plugin based on bash (instead of Go) that you can probably use as a baseline. Take a look at https://github.com/drone-plugins/drone-git/tree/next

Some additional considerations:

Cleanup

One thing you may need to consider is cleanup. You will need to prune and remove old branches and objects from your cached repository. I recommend executing git prune on the repository cache as part of the script.
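A hedged sketch of such a maintenance pass, assuming the cache lives at /tmp/git as above (the exact commands and thresholds are up to you):

```shell
#!/bin/bash
# periodic maintenance for the git cache; path assumed from the example above
CACHE=/tmp/git
mkdir -p "$CACHE"
cd "$CACHE"
[ -d .git ] || [ -f HEAD ] || git init

# drop remote-tracking refs for branches deleted upstream
# (ignored if the cache has no remote configured yet)
git remote prune origin 2>/dev/null || true

# remove unreachable loose objects from the object database
git prune

# repack when the object count grows large, keeping the cache compact
git gc --auto
```

Running this at the start or end of the cache-update script keeps the cache from growing without bound.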

Locking

You also need to consider what happens if two builds run at the same time and both try to update the cache, which could be problematic. I would recommend some sort of system-wide lock, but you need to be careful: you want a lock that is released when the holding process is destroyed, so a crashed build cannot leave the cache locked indefinitely.

I feel like you could use a unix file socket as a lock, but I’m not entirely sure. I would have to think through this a bit more and run some tests.
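One mechanism with exactly that property is `flock(1)` (an alternative to the socket idea): the lock is tied to an open file descriptor, so the kernel releases it when the process exits for any reason. A minimal sketch, with an illustrative lock-file path:

```shell
#!/bin/bash
CACHE=/tmp/git
mkdir -p "$CACHE"

(
    # block until we hold an exclusive lock on fd 9
    flock -x 9 || exit 1

    # --- critical section: only one build updates the cache at a time ---
    # e.g. git -C "$CACHE" fetch origin "+refs/heads/*:refs/heads/*"
    echo "cache update finished"

) 9>"$CACHE.lock"
```

If the build is killed mid-update, fd 9 closes and the next build acquires the lock immediately, so nothing stays locked forever.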


Seems like the syntax doesn’t work for 1.0?

No, the final 1.0 syntax changed significantly from previous versions. You can still achieve something similar with 1.0 once you have a basic understanding of how it works:
https://docs.drone.io/user-guide/pipeline/cloning/
https://docs.drone.io/user-guide/pipeline/volumes/
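To sketch roughly what that could look like in 1.0 syntax (the step and volume names below are illustrative, and host volumes require the repository to be marked trusted):

```yaml
kind: pipeline
name: default

clone:
  disable: true

steps:
- name: clone
  image: my-custom-plugin
  volumes:
  - name: gitcache
    path: /tmp/git

- name: build
  image: golang
  commands:
  - go build
  - go test

volumes:
- name: gitcache
  host:
    path: /tmp/git
```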

Thank you so much. Is there a way to inject git credentials to the custom clone step?

I realize the git credentials seem to be injected already. Sorry.

It would be super useful to have this built in as the default. I’m working on a large repository that takes about 60 seconds to clone :frowning: