Monorepo taxing git clones

Hello,
I’m looking for some ideas. I have a monorepo made up of 15+ subfolders, with build steps for each one and some git-based logic that skips a folder’s build if none of its files have changed. Drone runs on Kubernetes.
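For readers who want something concrete, here is a minimal sketch of the kind of per-folder skip logic described above. It is not the actual pipeline code: the function names and the folder names in the usage note are hypothetical, and it assumes the list of changed files comes from `git diff --name-only` between two commits that are both present in the clone.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files_between(base, head):
    """Ask git for the paths that differ between two commits (both must exist locally)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def folders_to_build(changed_files, subfolders):
    """Return the subfolders, in the order given, that contain at least one changed file."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts:
            # Only the top-level directory matters for the skip decision.
            touched.add(parts[0])
    return [folder for folder in subfolders if folder in touched]
```

With hypothetical folders `api`, `web` and `worker`, a change set of `api/main.go` and `README.md` would select only `api`; files at the repository root do not trigger any folder build in this sketch.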

The problem is that each job git-clones the original repo, and this is becoming very taxing on the build system. I’ve reached the point where each parallel job takes a full 10 minutes just to clone the repo, so I’m looking for solutions.

  1. clone with depth

This is not bad, but our logic for skipping subfolder steps sometimes did not have enough history to work with.

  2. sparse checkout

I still have to try this, but I’m having trouble understanding whether it fits. Recent versions of git have sparse checkout built in and there’s a Drone plugin, but I’m still not sure.

  3. keep a git mirror around

So I was thinking that maybe I should keep an up-to-date git mirror somewhere in the cluster near Drone, but I’m not finding much info. Plus, I think that once Drone receives the webhook it will want to go and clone the original GitHub repo. I think I saw a git-sync utility that can be used as a sidecar to some pods, but I’m having trouble fitting a solution into this picture.
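Regarding option 1, one possible way around the "not enough history" problem is to start from a shallow clone and deepen it only until the commit the skip logic compares against is available. The sketch below assumes a remote named `origin` and uses `git fetch --deepen` and `git cat-file -e`; the function names, the step size and the round limit are made up for illustration, and the base commit would have to come from wherever the skip logic gets it (possibly Drone's `DRONE_COMMIT_BEFORE` variable on push events, though that is an assumption about this setup).

```python
import subprocess


def commit_is_present(sha, run=subprocess.run):
    """True if the commit object exists in the local (possibly shallow) clone."""
    # `git cat-file -e` exits with status 0 only when the object exists locally.
    result = run(["git", "cat-file", "-e", sha + "^{commit}"])
    return result.returncode == 0


def deepen_until_present(sha, step=50, max_rounds=10, run=subprocess.run):
    """Fetch `step` more commits of history per round until `sha` shows up.

    Returns True once the commit is available, False if it is still
    missing after `max_rounds` rounds of deepening.
    """
    for _ in range(max_rounds):
        if commit_is_present(sha, run):
            return True
        run(["git", "fetch", "--deepen=%d" % step, "origin"])
    return commit_is_present(sha, run)
```

The `run` parameter exists only so the loop can be exercised without a real repository. If the function returns False, falling back to building everything is probably the safer default.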

So I have not found a good solution to this. I’m really trying to cut down build times: the actual build + image push for some folders takes about 2 minutes, and the remaining 18 are just setup around the build (clone, cache pull, cache push, etc.).

I’m looking for some ideas. I have a monorepo made up of 15+ subfolders, with build steps for each one and some git-based logic that skips a folder’s build if none of its files have changed. Drone runs on Kubernetes.

Have you considered using an extension to automatically skip pipeline stages entirely based on the files changed? There are a number of extensions you can set up that have been designed for this purpose. These are the most popular:

https://github.com/bitsbeats/drone-tree-config
https://github.com/meltwater/drone-convert-pathschanged

So I was thinking that maybe I should keep an up-to-date git mirror somewhere in the cluster near Drone, but I’m not finding much info. Plus, I think that once Drone receives the webhook it will want to go and clone the original GitHub repo.

Cloning is optional and can be disabled (and optionally replaced with custom logic). Drone is not going to solve this problem for you natively; however, it should expose all the necessary primitives that would allow you to solve the problem. For example, you could disable the default clone step and then add a step that retrieves your code from another location (you could use simple shell commands or even create a plugin). You could even look into using volumes, keeping in mind that you would need to handle potential race conditions from multiple pipelines attempting to access the same volume at the same time.
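To make the suggestion above a bit more concrete, here is a rough sketch of what a custom clone step could do once the default clone is disabled (as far as I know, via `clone: disable: true` in the pipeline). It keeps a bare mirror on a shared volume, refreshes it under an exclusive file lock so parallel pipelines do not race, and then clones from the mirror into the workspace. Everything here is an assumption made for illustration: the function names, the `.lock` file convention, and the idea that the volume supports `flock`, which may not hold on some network filesystems.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other holders release it
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def mirror_commands(remote_url, mirror_dir, mirror_exists):
    """Commands that create the mirror on first use, or refresh it afterwards."""
    if mirror_exists:
        return [["git", "-C", mirror_dir, "remote", "update", "--prune"]]
    return [["git", "clone", "--mirror", remote_url, mirror_dir]]


def clone_via_mirror(remote_url, mirror_dir, workdir, commit, run=subprocess.run):
    """Refresh the shared mirror, then clone from it and check out `commit`.

    `workdir` must be empty or absent. The clone happens inside the lock,
    which serializes jobs but keeps the mirror stable while it is read.
    """
    with exclusive_lock(mirror_dir + ".lock"):
        for cmd in mirror_commands(remote_url, mirror_dir, os.path.isdir(mirror_dir)):
            run(cmd, check=True)
        run(["git", "clone", mirror_dir, workdir], check=True)
    run(["git", "-C", workdir, "checkout", commit], check=True)
```

Only the first job after a push pays for the network fetch; the others wait on the lock and then do a local clone. Whether that beats cloning from GitHub in practice depends on the volume's performance, so treat this as a starting point rather than a tested solution.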

Thanks Brad. Based on what you wrote I might have an additional idea, but first I need to write a step that does the cloning, and I would like to follow the current Drone logic closely.

I tried to check the source code to understand where and how the exact commands are issued, but I’m not seeing it. Can someone please point me to this area of the code?

Oh, never mind, I think I’ve found it: the posix directory on the master branch of the drone/drone-git repository on GitHub.

Dang, that’s complicated. Maybe it’s easier to add a path option to decide where to clone.

I mean, if I can clone/fetch into a host volume folder instead of the default /drone/src/.git/, maybe I have a chance.

Update: just found out about DRONE_WORKSPACE, will play with it.