I have the autoscaler process running on an EC2 instance, launched with the following command:
sudo docker run --rm \
-v /home/ubuntu/autoscaler-data:/data \
-e DRONE_POOL_MIN=2 \
-e DRONE_POOL_MAX=15 \
-e DRONE_SERVER_PROTO=https \
-e DRONE_SERVER_HOST=<drone.domain> \
-e DRONE_SERVER_TOKEN=<SECRET> \
-e DRONE_AGENT_TOKEN=<SECRET> \
-e DRONE_AMAZON_INSTANCE=m4.4xlarge \
-e DRONE_AMAZON_REGION=us-west-1 \
-e DRONE_AMAZON_SUBNET_ID=<SECRET> \
-e DRONE_AMAZON_SECURITY_GROUP=<SECRET> \
-e DRONE_AMAZON_SSHKEY=<KEY_FOR_SSH> \
-e AWS_ACCESS_KEY_ID=<SECRET> \
-e AWS_SECRET_ACCESS_KEY=<SECRET> \
-e DRONE_AGENT_CONCURRENCY=15 \
-e DRONE_LOGS_DEBUG=true \
-e DRONE_INTERVAL=30s \
-e DRONE_AMAZON_VOLUME_SIZE=150 \
-e DRONE_POOL_MIN_AGE=60m \
-e DRONE_ENABLE_REAPER=true \
-e DRONE_REAPER_INTERVAL=30s \
-e DRONE_AGENT_IMAGE=drone/drone-runner-docker:latest \
-p 8080:8080 \
--name=autoscaler \
drone/autoscaler
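The autoscaler container itself comes up fine; for reference, this is roughly how I keep an eye on it (using the container name from the command above):

sudo docker ps --filter name=autoscaler   # confirm the autoscaler container is up
sudo docker logs -f autoscaler            # tail its logs (debug logging is enabled above)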
My pipeline has a “test” step that executes a script to run the test cases. This step uses a docker-compose.yml file to create 11 containers (main application, DB, mail, etc.).
This step finishes fine most of the time, but every 4-5 builds it fails with exit code 137.
When the pipeline tries to access the container logs, it gives the following error:
Error response from daemon: can not get logs from container which is dead or marked for removal
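As far as I understand, exit code 137 is 128 + 9, i.e. the process received SIGKILL, which usually means either the kernel OOM killer or an explicit docker kill. This is roughly what I run on the build instance to try to confirm that while the failed containers still exist (<container_id> is a placeholder):

dmesg -T | grep -i 'killed process'   # any kernel OOM-killer entries?
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_id>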
The pipeline is defined something like this in my .drone.jsonnet file:
local test(depends_on=[]) = {
  "name": "test",
  "image": "private_image",
  "volumes": [
    {
      "name": "docker_socket",
      "path": "/var/run/docker.sock"
    }
  ],
  "environment": {
    ...
    ...
  },
  "commands": [
    "export DRONE_COMMIT_SHA=${DRONE_COMMIT_SHA}",
    "export COMPOSE_ID=`shuf -i 1000-9999 -n 1`",
    "export RESULTS_DIR=/tmp/$${COMPOSE_ID}/test_results",
    "echo $${COMPOSE_ID}",
    "$(aws ecr get-login --region us-west-1 --no-include-email)",
    "docker-compose --log-level DEBUG -p $${COMPOSE_ID} -f docker-compose-test.yml up -d --quiet-pull",
    "echo \"AFTER_UP \" $(( SECONDS - start ))",
    "docker-compose -p $${COMPOSE_ID} -f docker-compose-test.yml exec -T webapp mkdir /tmp/test_results",
    "docker-compose -p $${COMPOSE_ID} -f docker-compose-test.yml exec -T webapp touch /tmp/test_results/test_results.xml",
    "docker-compose -p $${COMPOSE_ID} -f docker-compose-test.yml exec -T webapp touch /tmp/test_results/e2e_test_results.xml",
    "echo STARTING TEST EXECUTION",
    "./drone_run_tests.sh /code/run_tests.sh",
    "echo Destroy everything",
    "docker-compose -p $${COMPOSE_ID} --log-level ERROR -f docker-compose-test.yml down -v --rmi local > /dev/null 2>&1"
  ],
  "depends_on": depends_on
};
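// Note: each run gets its own compose project name via COMPOSE_ID, so even after
// a build dies I can list that run's containers on the host daemon, because
// docker-compose labels them with com.docker.compose.project, e.g.:
//   docker ps -a --filter "label=com.docker.compose.project=<COMPOSE_ID>" --format 'table {{.Names}}\t{{.Status}}'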
[
  {
    "kind": "pipeline",
    "type": "docker",
    "name": "pipeline_name",
    "trigger": {
      "event": [
        "push"
      ],
      "branch": [
        "github_branch_name"
      ]
    },
    "workspace": {
      "path": "/app/src"
    },
    "volumes": [
      {
        "name": "docker_socket",
        "host": {
          "path": "/var/run/docker.sock"
        }
      },
      {
        "name": "node_modules",
        "host": {
          "path": "/drone/cache/node_modules"
        }
      }
    ],
    "environment": {
      ....
      ....
    },
    "steps": [
      test()
    ]
  }
]
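When a build fails, this is roughly what I check on the instance to rule out resource exhaustion, and what I watch on the Docker daemon during a build to see what kills the containers:

free -h && df -h                       # memory and disk headroom
docker stats --no-stream               # point-in-time per-container CPU/memory
docker events --filter event=kill --filter event=oom --filter event=die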
Whenever it fails, there are no logs that tell me what killed the containers. I have checked the EC2 instance that's running the autoscaler agent, and everything (in terms of memory, CPU) looks fine there. Any pointers to fix this?