A possible workaround is using sleep as mentioned in the ticket below, but that requires committing code to ensure we sleep before a failure. You also have to know the failing step in advance to add the sleep before it.
the drawbacks you listed are valid points, although I’m not sure about this one specifically. what sort of caching are you referring to? or is the drawback that you need to execute every step? If the latter, you can skip directly to the step you want to debug using the --include or --exclude flags, shown here https://docs.drone.io/quickstart/cli/#run-specific-steps
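For instance, per the docs linked above, running only a single step might look like this (the step name `test` is just an example, substitute a step name from your own yaml):

```shell
# Run only the step named "test" from the local pipeline
drone exec --include=test

# Or run everything except a slow dependency-install step
drone exec --exclude=deps
```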
we could add a flag to the CLI that freezes the pipeline when a step fails and then drops you directly into the container with a command prompt. This would eliminate both concerns – you would not need to sleep and you would not need to know which yaml step fails ahead of time.
the drawbacks you listed are valid points, although I’m not sure about this one specifically. what sort of caching are you referring to?
Imagine the following use case:
I’m working on a large python project.
For whatever reason my local tests succeed, but the build fails on the CI server. I want to debug the failure.
Using drone exec I can run the build locally, but that would require installing all my dependencies (which are already cached by a drone plugin on the remote server). For large projects this can take a very long time. Ideally I want to drop into the container and quickly debug what has gone wrong.
we could add a flag to the CLI that freezes the pipeline when a step fails and then drops you directly into the container with a command prompt.
This is a cool idea, but I think an even better idea is to:
Configure the repository so that it freezes failing builds (e.g. leaves containers running for the Docker executor).
Add a drone cli command to drop into remote containers on demand. This would remove the need to set up secrets locally and sidesteps the caching issue, so I don’t need to reinstall a large project’s dependencies on my laptop.
The challenge is we don’t know if the container fails until it exits (using the exit code), and once the container exits we can no longer exec into it. We wouldn’t want to restart it because that would run all the commands again. We would therefore have to redesign some aspects of our system to work around these issues. We may also require different solutions for different runners. For example, the docker runner may need a different solution than the kubernetes runner or the macstadium runner.
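For the docker runner specifically, the stopped container does remain on disk, so the exit code can at least be read back after the fact. A sketch (the container id is a placeholder):

```shell
# Inspect a stopped step container to see whether it failed;
# a non-zero exit code indicates failure
docker inspect --format '{{.State.ExitCode}}' <container-id>
```

The harder part, as noted above, is that by this point the container can no longer be exec’d into.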
I agree this would definitely be a cool feature to have. I just wanted to suggest some workarounds that you could use right away for debugging. This is a complex feature and could take some time to introduce to Drone. We can certainly add this feature request to our backlog.
You could mount the dependencies from your host into the container to avoid having to re-download the dependencies. For example:
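A sketch of what this could look like using Drone host volumes (the step name, image, and cache path are assumptions for a python project; host volumes also require the repository to be trusted when run on a server, though they work directly with drone exec):

```yaml
kind: pipeline
type: docker
name: default

steps:
- name: test
  image: python:3
  volumes:
  - name: pip-cache
    path: /root/.cache/pip     # pip reuses this instead of re-downloading
  commands:
  - pip install -r requirements.txt
  - pytest

volumes:
- name: pip-cache
  host:
    path: /tmp/drone-pip-cache # directory on the host machine
```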
Clever workaround. I’ll give it a try.
The challenge is we don’t know if the container fails until it exits (using the exit code), and once the container exits we can no longer exec into it. We wouldn’t want to restart it because that would run all the commands again.
For the docker runner in particular the dead container should still exist on the filesystem. Can’t you run it again with the same parameters it was originally started with, but override the entrypoint to be /bin/sh?
This is a complex feature and could take some time to introduce to Drone. We can certainly add this feature request to our backlog.
Understood. If some of my own time clears up at the end of the year I’ll try to put together a PoC for the docker runner.
you cannot change the entrypoint of an existing container. You have to create a new container. I believe Concourse uses docker commit to snapshot a new image from the container, so that it can create a new container with a new entrypoint. This approach would be specific to docker pipelines, and would not work with our other pipeline types (kubernetes, macstadium, etc).
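For reference, the Concourse-style approach described above might look roughly like this (container id and image tag are placeholders):

```shell
# Snapshot the stopped container's filesystem as a throwaway image
docker commit <container-id> drone-debug-snapshot

# Start a fresh container from the snapshot with a shell as the
# entrypoint; the filesystem state at the time of failure is preserved
docker run --rm -it --entrypoint /bin/sh drone-debug-snapshot
```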
Travis uses tmate.io which seems like a more generic solution. Travis primarily uses VMs and they have tmate pre-installed in these VMs. This is something that would have to be worked around in Drone, but overall seems like a more promising and generic approach that would work across runtimes (docker, kubernetes, macstadium, etc).
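For anyone unfamiliar with tmate, the rough idea is that the build environment starts a tmate session and prints an ssh address that you connect to from your own machine. A sketch, assuming tmate is installed in the build environment:

```shell
# Start a detached tmate session bound to a local socket
tmate -S /tmp/tmate.sock new-session -d

# Block until the session has connected to the tmate relay
tmate -S /tmp/tmate.sock wait tmate-ready

# Print the ssh address a user can connect to for a live shell
tmate -S /tmp/tmate.sock display -p '#{tmate_ssh}'
```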
Travis uses tmate.io which seems like a more generic solution. Travis primarily uses VMs and they have tmate pre-installed in these VMs. This is something that would have to be worked around in Drone, but overall seems like a more promising and generic approach that would work across runtimes (docker, kubernetes, macstadium, etc).
Agreed
On another note I’ve used Drone for a few days now. Caching and debugging were painful to get working (especially for Python, which normally uses system-installed dependencies). There is a lack of code examples for caching in various languages. I’ll document my findings to simplify the adoption for python users.
I appreciate how simple and well thought out Drone is - there is very little magic under the hood. With a good understanding of containers it becomes easy to understand what Drone is doing. The UI is slick and simple without adding too much cruft that other CI products have.
@bradrydzewski is it possible to make drone exec enter an interactive container mode on a failed run?
If I have a series of commands like the ones below, I would like to preserve the environment variables they export for debugging:
- export COMPOSE_PROJECT_NAME=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)
- docker-compose up -d db
- sleep 1000000
Running docker exec to enter the container after the commands above will not have the COMPOSE_PROJECT_NAME variable in the environment, so you have to re-export it when debugging.
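One workaround for this in the meantime (a sketch; the file path /tmp/debug.env is an arbitrary choice) is to have the pipeline step write its variables to a file inside the container, then re-source that file from the shell you open with docker exec:

```shell
# In the pipeline step: capture the variable so a later docker exec
# session can restore it
export COMPOSE_PROJECT_NAME=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 8 | head -n 1)
env | grep '^COMPOSE_PROJECT_NAME=' > /tmp/debug.env

# Later, from the debugging shell started with docker exec:
. /tmp/debug.env
echo "$COMPOSE_PROJECT_NAME"
```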
Additionally - when running in docker-in-docker and starting sibling containers - is it possible to clean them up at the end of a pipeline regardless of success or failure?
Not at this time, however, I started playing around with tmate yesterday and have a working proof-of-concept. The ability to leverage tmate is going to make this feature much easier to implement than we originally estimated.
This should probably be split out to a separate topic, so we can keep this thread focused on interactive debug.