This is happening on highly parallel builds where we will see Error response from daemon: unauthorized when the build is executing. Other build steps will run on this agent in the pipeline.
This error indicates the image did not exist in the local Docker cache, so the runner made an API request to the Docker daemon on the host to pull the image. The Docker daemon responded to this API call with an unauthorized error, which means the registry rejected the pull request with an Unauthorized status. I recommend checking the Docker daemon debug logs to see whether Docker provides more detail on why the registry rejected the request to pull the image.
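If it helps, daemon debug logging can usually be enabled via /etc/docker/daemon.json. A minimal sketch, assuming a systemd-based Linux host (merge with any existing settings in that file rather than replacing them):

```json
{
  "debug": true
}
```

After restarting the daemon (`sudo systemctl restart docker`), the verbose logs should appear in `journalctl -u docker.service`, including the registry's response to the failed pull.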
The reason the step was skipped, I presume, is that the pipeline is in a failed state. The pipeline would be in a failed state because one of the steps failed due to an inability to pull the Docker image. When the pipeline is in a failed state, the remaining steps are skipped by default unless they were explicitly configured to execute on failure, using when: { status: [ failure ] }
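For reference, a minimal sketch of a step configured to run on failure (the step name, image, and command are placeholders, not from your pipeline):

```yaml
steps:
  - name: notify-failure          # hypothetical step name
    image: alpine                 # placeholder image
    commands:
      - echo "pipeline failed"
    when:
      status:
        - failure                 # run this step only when the pipeline has failed
```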
So based on this screenshot, the root cause seems to be a problem pulling a private image from a registry. I hope that helps.
This is certainly possible. One way to mitigate such issues is to use pull: if-not-exists [1], which instructs the system to use the Docker image in the local cache whenever it exists. If a private registry is rate limited or becomes unstable under heavy load, this can reduce the number of requests and the overall load.
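For reference, a sketch of what that might look like in the pipeline YAML (the step name and image are placeholders):

```yaml
steps:
  - name: build                                      # hypothetical step name
    image: registry.example.com/team/builder:1.0     # placeholder private image
    pull: if-not-exists      # reuse the locally cached image when present,
                             # pulling from the registry only on a cache miss
    commands:
      - make build
```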
We know for sure that Error response from daemon: unauthorized comes from Docker trying to pull the image from Artifactory. Unfortunately we only know what Docker tells us, which is that the pull failed with an Unauthorized error. The only root causes I am aware of that would result in an Unauthorized response are missing credentials or insufficient permissions for the credentials provided. I recommend contacting the Artifactory support folks, who should be able to help triage image pull issues, including how to enable more verbose logging to understand where this error comes from at the daemon or registry layer.
As an aside, I recall another team having issues with Artifactory, and they were able to trace the issue back to the Artifactory server being overwhelmed:
It seems that setting proxy_max_temp_file_size 0; in our nginx conf that sits in front of Artifactory has solved the problem. For very large Docker images, nginx was buffering all the layers to disk before sending them in the response, which would cause timeouts and errors.
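For anyone searching later, the relevant directive would sit in the proxy configuration roughly like this (a sketch; the location path and upstream name are placeholders):

```nginx
# Inside the server block that proxies to Artifactory (names are placeholders)
location /v2/ {
    proxy_pass http://artifactory_upstream;
    # Disable buffering of large responses to temp files on disk,
    # so big image layers stream straight through instead of timing out.
    proxy_max_temp_file_size 0;
}
```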
Not sure if this is related but may be worth looking into.
Absolutely, we have that set in Artifactory already. I just wish we could see an error in the Docker events or in Artifactory. This has to be coming from the daemon, though. I wonder what is causing it.