In a CentOS 7 VM hosted on vSphere, we are seeing this issue intermittently when GitLab kicks off a drone v0.5 job, and we don't know if it is an environment issue or something wrong in drone:
I've tried customizing the drone-git plugin that clones the repo (https://github.com/drone-plugins/drone-git), adding "example.dev" to the /etc/hosts of the container in the hope this would resolve the issue. The issue hasn't resurfaced yet, but I'm not holding out hope, because I question why I would need to do this.
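For context, the customization boils down to appending an entry to the container's /etc/hosts at startup; roughly the following (the IP is a placeholder for our internal GitLab address):

echo "10.0.0.5 example.dev" >> /etc/hosts   # placeholder IP for the internal git host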
Docker uses the /etc/resolv.conf of the host machine (where the docker daemon runs). While doing so, the daemon filters out all localhost nameserver entries from the host's original file.
DNS resolution is configured at the Docker daemon level. Docker uses the /etc/resolv.conf of the host machine, which means that if this file is configured correctly on the host, the configuration will be copied into the container.
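If you need containers to use specific DNS servers, a minimal sketch is to set them at the daemon level (the addresses below are placeholders for your real name servers):

# /etc/docker/daemon.json
{
  "dns": ["192.168.10.1", "8.8.8.8"]
}
# then restart the daemon
sudo systemctl restart docker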
If you are having issues you might want to reach out to Docker support. This is a Docker networking question and they will be best suited to help you troubleshoot.
I'm seeing exactly the same error with my Gogs/Drone setup on a QNAP device. I do have full DNS support for my Gogs endpoint, though, so I'm not playing any tricks with local custom names or docker networking.
I thought I might be able to replicate the problem (and then figure out a solution) by manually creating the same situation that the “clone” step does: a plugins/git container using the default docker network, but just with a simple shell. To that end, I did:
docker run --rm -t -i --entrypoint "" plugins/git:latest /bin/sh
… which gives me a shell inside a container that should have exactly the same configuration as the “clone” step. From there, I tried nslookup gogs.mydomain.com, expecting to see a name resolution error… but no, it properly resolved the name! Since that seemed to be working, I tried the same steps that clone typically does, and things seemed to work up until git needed my username/password for the fetch. (Which is further along than the failed host resolution… it got far enough to know it needed my username/password!)
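For reference, the manual steps I ran were roughly the following (branch name assumed to be master):

git init myrepo
cd myrepo
git remote add origin https://gogs.mydomain.com/JaredReisinger/myrepo.git
git fetch origin master        # this is where it prompted for my username/password
git checkout -qf FETCH_HEAD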
Based on the README for drone-plugins/drone-git, I tried the following:
docker run --rm \
-e DRONE_REMOTE_URL=https://gogs.mydomain.com/JaredReisinger/myrepo.git \
plugins/git
… and that also managed to properly resolve the Gogs host, and only failed because it needed my username/password.
So it appears that there’s something else going on with the clone step in the context of Drone, or that my manual repro steps aren’t actually replicating the same conditions as drone-server/drone-agent do. Any suggestions for next steps?
Are you using custom networking drivers with docker? Drone creates a per-build network using the default bridge driver (just like docker-compose), and certain custom networking drivers can apparently interfere with the defaults.
No custom networking drivers. The default bridge network that docker-compose used when I spun up drone-server and drone-agent worked just fine, and allowed me to authenticate with Gogs, list repositories, etc. If there’s any diagnostic/forensic information I can collect, please let me know. (I’m really excited to try using drone; it looks really cool!)
Hi, I have the same problem and it seems related to DNS resolution. How is DNS resolution set up for the build containers? Is the /etc/resolv.conf from the host used?
I tried to reproduce the problem with this .drone.yml:
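# sketch of the file (the step name and image are just examples; the ping target is our internal git host)
pipeline:
  test:
    image: alpine
    commands:
      - cat /etc/resolv.conf
      - ping -c 3 core-atlas-01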
Which DNS servers are used in this case? When I run docker run --rm alpine cat /etc/resolv.conf I get the correct one (the one copied from the host, it seems), including the internal DNS server:
So maybe it is related to how the correct /etc/resolv.conf is (or isn't) propagated. I hope this helps. Is there a way to give drone, via environment variables, a list of DNS servers that it should at least append to the /etc/resolv.conf it constructs, something like DRONE_DNS=<server1>,<server2>?
Hey, @lopsch… how does that work if the repo in which .drone.yml lives isn’t even being cloned? Or are you seeing a slightly different variation of this issue?
Nevermind! I just upgraded from drone 0.5 to 0.6, and I’ve seen how the clone pre-pipeline step gets configured. It’s amusing that the server/agent can access the repo (and thus .drone.yml), to bootstrap the pipeline, and yet the clone step itself can’t get anything! In any case, I’m making use of that to try to come up with some useful network diagnosis commands that I (and others) can use to understand what’s going on.
Reading up on Docker networks, I read the following about creating a user-defined bridge network (which is what I believe drone uses, both from comments here and from using docker network inspect [ID] while a build was running):
The containers you launch into this network must reside on the same Docker host. Each container in the network can immediately communicate with other containers in the network. Though, the network itself isolates the containers from external networks.
So, unlike the default bridge network, which has access to the outside world, user-defined bridge networks might not. But clearly, this works for some/most people, so there must be something else going on. I’ll keep reading to try to understand more.
Mhm, yes, maybe it's related to the network settings you mentioned, and maybe that's why the /etc/resolv.conf isn't being propagated correctly. Without the internal DNS server in it, you won't be able to resolve internal hostnames/IPs. I'll try out some ideas and come back later with the results.
Update 1:
I tried to ping the server directly via ping -c 3 192.168.10.176 in .drone.yml instead of ping -c 3 core-atlas-01 with the following result:
+ ping -c 3 192.168.10.176
PING 192.168.10.176 (192.168.10.176): 56 data bytes
64 bytes from 192.168.10.176: seq=0 ttl=64 time=0.324 ms
64 bytes from 192.168.10.176: seq=1 ttl=64 time=0.147 ms
64 bytes from 192.168.10.176: seq=2 ttl=64 time=0.101 ms
--- 192.168.10.176 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.101/0.190/0.324 ms
Yes, it seems network related. I assume the user-defined network is firewalled (iptables) from the rest of the internal network(s) but not from external (internet) networks. That's why you can resolve and reach internet-hosted git repositories like GitHub. A self-hosted repository seems to be uncommon, at least when it is not world-reachable; most people use GitHub anyway, and people using a self-hosted Bitbucket will presumably also be using a self-hosted Bamboo.
Ohohoh, I can't interpret my own findings. I CAN ping my internal networks from the user-defined bridge network of the build containers, so it only seems related to DNS resolution. I will try adding the correct name server. If I find some time later I'll rebuild the drone-git plugin to test this.
Update 2: Replacing the /etc/resolv.conf with one containing the correct name server now works. So this should be the problem. At this point we need assistance from the developers who know how the /etc/resolv.conf is constructed for the build containers.
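One way to try this outside of a build (the name server address, network name, and hostname below are from my setup):

printf 'nameserver 192.168.10.1\n' > /tmp/resolv.conf.test
docker network create --driver bridge dnstest
docker run --rm --network dnstest \
  -v /tmp/resolv.conf.test:/etc/resolv.conf:ro \
  alpine nslookup core-atlas-01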
Messing around with the /etc/resolv.conf is strictly discouraged.
Update 3:
The embedded DNS server for user-defined networks (in general, not specific to drone) seems to strip out not only localhost addresses, as stated in the docs, but all private IP addresses. To verify, I created my own user-defined network, added DNS entries to /etc/docker/daemon.json and via docker run --dns=..., and every time all private addresses were removed. I'm not sure if this is a bug or intended. My Docker version is 17.05.0-ce.
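For reference, the check was along these lines (the private DNS address is a placeholder for ours):

docker network create --driver bridge dnscheck
docker run --rm --network dnscheck --dns 192.168.10.1 --dns 8.8.8.8 \
  alpine cat /etc/resolv.conf
# on the user-defined network the container only lists the embedded resolver:
#   nameserver 127.0.0.11
#   options ndots:0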
Except I can't seem to. My Gogs instance has a publicly-reachable DNS entry, and yet my drone-created UDN doesn't seem to resolve it. I'm wondering if the host somehow knows that the public name is "itself", and Docker is treating it as a localhost name and thus removing it? I'll see if I can resolve other addresses, like www.google.com.
Update: Nope, nslookup www.google.com on a container attached to a UDN bridge fails. This is definitely outside of the scope of Drone, and appears to be entirely a Docker networking issue.
More interesting tidbits: I created a UDN bridge (docker network create --driver bridge foo) to replicate the network created by drone for the build steps. Then I compared results between a container on the default bridge network and the UDN:
                            | default bridge                           | UDN bridge
----------------------------+------------------------------------------+------------------------
/etc/resolv.conf            | nameserver 192.168.0.1                   | nameserver 127.0.0.11
                            | nameserver 8.8.8.8                       | options ndots:0
----------------------------+------------------------------------------+------------------------
nslookup www.google.com     | fails (the 192.168 lookup?) followed by  | fails
                            | a successful lookup (using 8.8.8.8?)     |
----------------------------+------------------------------------------+------------------------
nslookup [my gogs server]   | fails (the 192.168 lookup?) followed by  | fails
                            | a successful lookup (using 8.8.8.8?)     |
----------------------------+------------------------------------------+------------------------
ping 8.8.8.8                | succeeds                                 | fails
Based on the last test, ping 8.8.8.8, it’s not just a DNS problem for me… it’s a fundamental networking issue. (Right now, I’m betting it has something to do with the incredibly complicated way QNAP sets things up in docker (a.k.a. Container Station). I don’t think they could have made it any more complicated!)
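For anyone who wants to reproduce the comparison, the tests were essentially the following (the gogs hostname is mine):

docker network create --driver bridge foo
docker run --rm alpine nslookup www.google.com                    # default bridge
docker run --rm --network foo alpine nslookup www.google.com      # UDN bridge
docker run --rm alpine nslookup gogs.mydomain.com
docker run --rm --network foo alpine nslookup gogs.mydomain.com
docker run --rm alpine ping -c 1 8.8.8.8
docker run --rm --network foo alpine ping -c 1 8.8.8.8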
Update: I tried the same tests on the "drone_default" network created by docker-compose when I spun up drone-server and drone-agent. Its /etc/resolv.conf looks just like the UDN example above, but it behaves like the default bridge: nslookups and pings succeed. Looking at docker network inspect of "drone_default" and "foo", they appear identical. This is so weird.
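If it helps anyone else compare their networks, this is roughly how I diffed the two (the network names are from my setup):

docker network inspect drone_default > drone_default.json
docker network inspect foo > foo.json
diff drone_default.json foo.json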
Actually… I take that back. In the inspect details, the “drone_default” shows:
Strange, I can connect to internal networks but can't do hostname resolution. I will continue investigating on the weekend.
I was able to solve this by using the FQDN, so instead of e.g. drone I used drone.home.lan. I think this is related to my DNS server, maybe a misconfiguration.
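In other words (drone.home.lan is my internal name):

nslookup drone            # failed from the build containers
nslookup drone.home.lan   # works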
We experienced the issue too, while running drone in a Kubernetes cluster created with the "kops" tool. The cluster was using "calico" for networking. It turned out that in such a configuration Docker runs with the --iptables=false --ip-masq=false flags. In that mode Docker can't add the masquerading rule for the user-defined network to iptables, so packets from those networks are not NAT'ed. I added the masquerading iptables rule manually and this resolved the issue.
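The rule was along these lines, with the subnet taken from docker network inspect for the build network (the value below is just an example):

iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -d 172.18.0.0/16 -j MASQUERADE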
Hi, guys!
I have the same trouble, and I think it is not a Docker bug. I created a user-defined network (UDN) and configured docker-compose to use this network. Then I added Gogs and a simple Ubuntu container to this network.
Gogs and Ubuntu resolve all containers in the network normally. The drone server resolves Gogs (I think), because drone can connect to Gogs by hostname. But the drone agent can't resolve Gogs by hostname.
The UDN uses Docker's embedded DNS, which resolves all containers by name and network alias. It is available inside every container at 127.0.0.11 (see the official Docker documentation), and this IP is present in /etc/resolv.conf. This is normal, documented behavior.
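For example, from a shell in any container on that network you can query the embedded resolver directly (the container name "gogs" comes from my compose file):

nslookup gogs 127.0.0.11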
All my other containers work correctly. Only the Drone.io agent fails.
But I noticed some interesting information in the nslookup response. Maybe the drone agent thinks it's a bad response?
I can't show the response right now because I don't have any drone containers on my PC at the moment, but you can reproduce my case and see it for yourself. Or I can show it later.
P.S.: Sorry for my bad English. I hope you understand what I mean.