Docker-build in Helm drone-runner-docker fails to install requirements from pip, possibly due to cgroups

I’ve migrated my Drone setup (previously running as manually managed Docker containers) to a Helm-configured setup on an arm64 cluster, following the instructions here. Builds of a package that previously succeeded now consistently fail when installing Python dependencies with pip, with error messages indicating that pip cannot connect to pypi.org.

The package’s .drone.yml is here, and the Helm deployment configuration (Chart.yaml and values.yaml) is here.

Final relevant lines of the console logs for step push-built-image (full output here):

...
Step 4/13 : RUN pip3 install -r requirements.txt
 ---> Running in ec8fb30290c9
time="2023-01-10T03:09:02.154309917Z" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
time="2023-01-10T03:09:02.154374953Z" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
time="2023-01-10T03:09:02.437113869Z" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/ec8fb30290c93a84e562e6143980d4decd50db1e44a096afcdf9c1f2ba2ab86c pid=343
time="2023-01-10T03:09:03.625828253Z" level=error msg="failed to enable controllers ([cpuset cpu io memory pids])" error="failed to write subtree controllers [cpuset cpu io memory pids] to \"/sys/fs/cgroup/docker/cgroup.subtree_control\": write /sys/fs/cgroup/docker/cgroup.subtree_control: no such file or directory"
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/flask/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/flask/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/flask/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/flask/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/flask/
ERROR: Could not find a version that satisfies the requirement flask (from versions: none)
ERROR: No matching distribution found for flask
WARNING: There was an error checking the latest version of pip.
time="2023-01-10T03:10:52.871600710Z" level=info msg="ignoring event" container=ec8fb30290c93a84e562e6143980d4decd50db1e44a096afcdf9c1f2ba2ab86c module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2023-01-10T03:10:52.872122536Z" level=info msg="shim disconnected" id=ec8fb30290c93a84e562e6143980d4decd50db1e44a096afcdf9c1f2ba2ab86c
time="2023-01-10T03:10:52.872242516Z" level=warning msg="cleaning up after shim disconnected" id=ec8fb30290c93a84e562e6143980d4decd50db1e44a096afcdf9c1f2ba2ab86c namespace=moby
time="2023-01-10T03:10:52.872458920Z" level=info msg="cleaning up dead shim"
time="2023-01-10T03:10:52.897693342Z" level=warning msg="cleanup warnings time=\"2023-01-10T03:10:52Z\" level=info msg=\"starting signal loop\" namespace=moby pid=402\n"
The command '/bin/sh -c pip3 install -r requirements.txt' returned a non-zero code: 1
exit status 1

I’m not sure whether the "failed to enable controllers" error is related to this inability to reach pypi.org. Some other Google results suggest that it might be caused by a mismatch in cgroups versions. This issue also attributes problems with /sys/fs/cgroup/cgroup.subtree_control in Docker-in-Docker to cgroups version mismatches (though note the path difference: my error logs have /docker/ in the path). I’ve found threads on this forum describing different cgroups issues (such as here), but no simple solution. This thread from 2021 advises rebuilding the Docker image manually, so I’m posting this question in the hope that someone has a more direct solution while I pursue that.

EDIT: I acknowledge that the Docker Runner documentation includes guidance on setting the MTU appropriately. I am reasonably sure that the default MTU for k3s is 1500 (as per here), and I would also have expected MTU-related failures to occur earlier than the attempt to contact pypi.org (for instance, during git clone).

EDIT2: I can also confirm that, from a shell in the dind container (opened with kubectl exec -it ...), I’m able to curl https://pypi.org/simple/flask (after running apk add curl). I wasn’t able to get a shell in the drone-runner container, because ...exec: "sh": executable file not found in $PATH...

@scubbo I would look more closely at the MTU.

Recently we saw similar problems when creating an environment for a workshop. You may have to set com.docker.network.driver.mtu as we did here: drag-stack/drone-runner-docker-app.yaml at 3c7a3e846e802add00cfb3bcbbeef79738e0f7df · harness-apps/drag-stack · GitHub

The Docker runner creates its own temporary Docker networks, and those networks must have an MTU smaller than or equal to that of the outer network layer(s).

We set it to 1410: drag-stack/values.yaml at 3c7a3e846e802add00cfb3bcbbeef79738e0f7df · harness-apps/drag-stack · GitHub
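For illustration, the two runner-side MTU settings might look roughly like this in the drone-runner-docker chart’s values.yaml. This is a sketch under assumptions, not the chart’s documented layout: it assumes the chart exposes a `dind.commandArgs` list for the dind sidecar and an `env` map for the runner’s environment variables, and the `--host tcp://localhost:2375` arguments are an assumed default that should be replaced by whatever your chart already passes. The MTU value of 1450 is illustrative.

```yaml
# Sketch of drone-runner-docker Helm values; key names are assumptions,
# verify them against your chart version's values.yaml.
dind:
  commandArgs:
    # Assumed default args: keep whatever your chart already passes here.
    - "--host"
    - "tcp://localhost:2375"
    # dockerd's --mtu flag sets the MTU of the default bridge network inside dind.
    - "--mtu=1450"

env:
  # Driver options for the temporary networks the runner creates per pipeline;
  # the value must not exceed the MTU of the outer network layer(s).
  DRONE_RUNNER_NETWORK_OPTS: "com.docker.network.driver.mtu:1450"
```

The important constraint is the ordering: each inner network’s MTU should be less than or equal to the MTU of the network it runs on top of, which is why drag-stack uses a reduced value such as 1410 rather than 1500.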

I attempted to document the MTU setting here: charts/install.md at master · drone/charts · GitHub

Let me know if you feel it can be improved.

No luck, unfortunately: I’ve set the MTU to 1450 in all three locations (dind.commandArgs, DRONE_RUNNER_NETWORK_OPTS, and settings.mtu in the .drone.yml), after confirming with ip addr | grep mtu that all networks on the host have an MTU of 1500, but I’m still getting the same error.

WHOOPS, my bad: it turns out I’d incorrectly set the MTU in my pipeline as:

steps:
  - name: step-name
    image: plugins/docker
    settings:
      registry: ...
      repo: ...
      ...
      settings:
        mtu: 1450

That is, I had added an extra (unnecessary) level of nesting named settings. Once I fixed that, all was well!
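For comparison, a corrected version of the step places mtu directly under the step’s own settings block, alongside registry and repo. The registry and repo values are elided in the original and remain elided here. Drone passes each key under settings to the plugin as a PLUGIN_* environment variable, so at this level mtu should reach plugins/docker as its MTU option; under a second nested settings key, the plugin does not pick it up as mtu.

```yaml
steps:
  - name: step-name
    image: plugins/docker
    settings:
      registry: ...
      repo: ...
      # ...other settings elided in the original
      # mtu sits at the same level as registry and repo,
      # not under a second nested settings: key
      mtu: 1450
```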