Wednesday, 4 January 2017

Weird new misbehavior on Zesty (schroot/sbuild DNS)

I've been chasing a new misbehavior that I first noticed just before the
holidays, but didn't get a chance to investigate until now. I'd file a bug,
but I'm really not sure where the problem is yet. I'm not even positive
whether it's a bug in the software stack or a problem with my LAN or other
local environment, although I strongly suspect the former.

I've been able to reproduce it twice now, so let's see if I can explain what's
going on; maybe someone else has seen it, or has ideas about where to look. I
couldn't find any related reports on the BTS or Launchpad, but of course it's
always possible I've missed something.

I use sbuild/schroot to build and test packages. On Ubuntu, these
instructions are an easy recipe for installing that environment:

https://wiki.ubuntu.com/SimpleSbuild

This has worked beautifully for years, and it even worked fine on Zesty until
recently, some time just before Christmas. I have a local apt-cacher-ng
instance on my LAN, and I've always manually added a file to
/etc/apt/apt.conf.d/ that tells apt/apt-get to use it as a proxy. I do this
both on my actual desktops and inside the chroots, and again this has all
worked beautifully until now.

One interesting new development is that mk-sbuild appears to copy any proxy
settings from the host's /etc/apt/apt.conf.d into the schroots, so adding this
file manually is no longer necessary. (I haven't updated the wiki page yet,
but will once this problem is solved.) mk-sbuild does rewrite the proxy
setting into a more canonical form, though. Where my host has this (in a file
called 02proxy):

Acquire::http { Proxy "http://apt.my.dom:3142" }

mk-sbuild writes this inside the schroot as 99mk-sbuild-proxy:

Acquire { HTTP { Proxy "http://apt.my.dom:3142"; }; };

That's fine, and it's not the problem, since either syntax works.
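To see why the two spellings are equivalent, it helps to flatten both into apt's `Scope::Option` key form. The Python below is a minimal sketch, not apt's own parser: it handles only the small subset of apt.conf syntax used here (nested `{ }` scopes, `::`-joined names, quoted values, optional semicolons, whole-line `//` comments), and it lower-cases option names on the assumption that apt matches them case-insensitively, which is why `http` and `HTTP` name the same option.

```python
import re

# Tokens: quoted strings, braces/semicolons, or bare words such as Acquire::http.
TOKEN_RE = re.compile(r'"[^"]*"|[{};]|[^\s{};"]+')


def flatten(text):
    """Flatten a tiny subset of apt.conf syntax into {"scope::option": value}.

    Sketch only: assumes well-formed input and treats option names
    case-insensitively by lower-casing them.
    """
    # Drop whole-line // comments; the // inside quoted URLs is untouched.
    text = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    tokens = TOKEN_RE.findall(text)
    result = {}
    pos = 0

    def parse_block(prefix):
        nonlocal pos
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == ";":
                continue  # semicolons are treated as optional separators
            if tok == "}":
                return  # end of the current scope
            name = prefix + [part.lower() for part in tok.split("::")]
            nxt = tokens[pos]
            pos += 1
            if nxt == "{":
                parse_block(name)  # nested scope, e.g. Acquire { HTTP { ... } }
            else:
                result["::".join(name)] = nxt.strip('"')

    parse_block([])
    return result


host = 'Acquire::http { Proxy "http://apt.my.dom:3142" }'
chroot = 'Acquire { HTTP { Proxy "http://apt.my.dom:3142"; }; };'
print(flatten(host))  # {'acquire::http::proxy': 'http://apt.my.dom:3142'}
print(flatten(host) == flatten(chroot))  # True
```

Both files reduce to the same single key, so switching from 02proxy to 99mk-sbuild-proxy shouldn't change what apt sees.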

The problem is that on my LAN, apt.my.dom is a CNAME for the machine running
apt-cacher-ng. Let's say that machine is 'counterparts', so the A record is
counterparts.my.dom.

Therein lies the problem. Inside the schroot, CNAME resolution doesn't work,
so I first noticed something was wrong when running `apt update` in the
schroots (via my chup script, linked from the SimpleSbuild wiki page):

# apt update
Err:1 http://archive.ubuntu.com/ubuntu zesty InRelease
Could not resolve 'apt.my.dom'
Err:2 http://security.ubuntu.com/ubuntu zesty-security InRelease
Could not resolve 'apt.my.dom'
Err:3 http://archive.ubuntu.com/ubuntu zesty-proposed InRelease
Could not resolve 'apt.my.dom'
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/zesty/InRelease Could not resolve 'apt.my.dom'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/zesty-proposed/InRelease Could not resolve 'apt.my.dom'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/zesty-security/InRelease Could not resolve 'apt.my.dom'
W: Some index files failed to download. They have been ignored, or old ones used instead.

Well, that's odd, and it has never happened before, so let's investigate a
little more. Still inside the schroot:

# host apt.my.dom
apt.my.dom is an alias for counterparts.my.dom.
# host counterparts.my.dom
counterparts.my.dom has address 192.168.1.12

So far so good. But then:

# ping apt.my.dom
ping: apt.my.dom: Name or service not known
# telnet apt.my.dom 3142
telnet: could not resolve apt.my.dom/3142: Name or service not known
# ping counterparts.my.dom
PING counterparts.my.dom (192.168.1.12) 56(84) bytes of data.
64 bytes from counterparts.my.dom (192.168.1.12): icmp_seq=1 ttl=64 time=0.481 ms
^C
--- counterparts.my.dom ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.481/0.481/0.481/0.000 ms
# telnet counterparts.my.dom 3142
Trying 192.168.1.12...
Connected to counterparts.my.dom.
Escape character is '^]'.
^]
telnet> Connection closed.

Ah, so some tools are having trouble resolving the CNAME, but not the A
record. Indeed, if I change the apt.conf.d proxy file to:

// proxy settings copied from mk-sbuild
Acquire { HTTP { Proxy "http://counterparts.my.dom:3142"; }; };

then `apt update` and friends are also happy.
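Until the underlying cause is found, that workaround could be scripted from the host, where the CNAME still resolves. The Python below is a hypothetical helper, not part of mk-sbuild: it asks the host's resolver for the canonical name using `getaddrinfo()` with `AI_CANONNAME`, then substitutes that name for the alias in any `Proxy "http://..."` setting in the chroot's proxy file. The script name and the chroot path in the usage comment are assumptions.

```python
import re
import socket
import sys


def canonical_name(host):
    """Ask the host's resolver (getaddrinfo/NSS) for host's canonical name."""
    infos = socket.getaddrinfo(host, None, flags=socket.AI_CANONNAME)
    # Only the first result is expected to carry the canonical name.
    return infos[0][3] or host


def rewrite_proxy_host(conf_text, old_host, new_host):
    """Replace old_host in Proxy "http://old_host[:port]" settings.

    The lookahead stops 'apt.my.dom' from matching inside 'apt.my.domain'.
    """
    pattern = re.compile(
        r'(Proxy\s+"https?://)' + re.escape(old_host) + r'(?=[:/"])',
        re.IGNORECASE,
    )
    # A function replacement avoids backreference surprises in new_host.
    return pattern.sub(lambda m: m.group(1) + new_host, conf_text)


def main(conf_path, alias):
    with open(conf_path) as f:
        text = f.read()
    new_text = rewrite_proxy_host(text, alias, canonical_name(alias))
    # Overwrites the file in place; chroot files usually need root to write.
    with open(conf_path, "w") as f:
        f.write(new_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 fix_proxy.py <chroot>/etc/apt/apt.conf.d/99mk-sbuild-proxy apt.my.dom
    main(sys.argv[1], sys.argv[2])
```

Resolving on the host rather than hard-coding counterparts.my.dom keeps the chroot in step if the CNAME is ever repointed, at the cost of having to rerun the script.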

Outside of the schroot (i.e. on the host), there's absolutely no problem
resolving the CNAME. Pinging and telnet'ing to apt.my.dom work fine, as do apt
and apt-get using the same proxy file. The /etc/resolv.conf files inside and
outside the chroot both point to 127.0.0.53, and `systemd-resolve --status`
looks fine outside the schroot (there's currently no such command inside the
schroot).
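Checking the two resolv.conf files by eye is easy, but a small script makes the comparison repeatable across several chroots. This Python sketch parses only the `nameserver` and `search`/`domain` keywords (the last `search` or `domain` line wins, per resolv.conf(5)) and reports whether two files agree. The script name and chroot path in the usage comment are hypothetical; if the chroot's resolv.conf is a symlink, it may be safer to save a copy from inside the schroot and compare against that.

```python
import sys


def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf text."""
    nameservers, search = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        key, args = fields[0], fields[1:]
        if key == "nameserver" and args:
            nameservers.append(args[0])
        elif key in ("search", "domain"):
            search = args  # the last search/domain line wins
    return nameservers, search


def compare(host_path, chroot_path):
    with open(host_path) as f:
        host = parse_resolv_conf(f.read())
    with open(chroot_path) as f:
        chroot = parse_resolv_conf(f.read())
    print(f"host:   nameservers={host[0]} search={host[1]}")
    print(f"chroot: nameservers={chroot[0]} search={chroot[1]}")
    return host == chroot


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 cmp_resolv.py /etc/resolv.conf /var/lib/schroot/chroots/zesty-amd64/etc/resolv.conf
    sys.exit(0 if compare(sys.argv[1], sys.argv[2]) else 1)
```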

But clearly *most* DNS queries to the LAN work fine inside the schroot,
because I can reach any A record name I can think of. It's just CNAMEs that
are broken.
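One detail that may help narrow this down: `host` sends DNS queries itself to the nameserver in /etc/resolv.conf, while ping and telnet resolve names through the C library's `getaddrinfo()`, and therefore through NSS. The Python sketch below resolves each name given on the command line the `getaddrinfo()` way, so running it inside and outside the schroot for `apt.my.dom` and `counterparts.my.dom` would be expected to mirror the ping/telnet results without involving those tools. It assumes python3 is available in the schroot (a minimal sbuild chroot may not have it), and it only reports what the resolver returns, not why a lookup fails.

```python
import socket
import sys


def resolve_via_libc(name, port=3142):
    """Resolve name through getaddrinfo(), the path ping/telnet use.

    Returns (sorted list of addresses, None) on success, or (None, error text).
    Port 3142 is the apt-cacher-ng port from the proxy config.
    """
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return None, str(exc)
    return sorted({info[4][0] for info in infos}), None


def main(names):
    ok = True
    for name in names:
        addrs, err = resolve_via_libc(name)
        if addrs:
            print(f"{name}: {', '.join(addrs)}")
        else:
            ok = False
            print(f"{name}: FAILED ({err})")
    return ok


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 check_resolve.py apt.my.dom counterparts.my.dom
    sys.exit(0 if main(sys.argv[1:]) else 1)
```

If the alias fails here but `host apt.my.dom` succeeds in the same schroot, that points at the NSS side of the lookup rather than at the DNS server itself.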

So, is this a bug in systemd? Is it a bug in the schroot? Is it some problem
with my LAN?

A few other data points:

All of my Zesty machines are currently VMs with bridged networking. The
problem happens both in a Fusion 8.5 VM on an OS X 10.11.6 host and in a
QEMU/KVM VM on a Yakkety host. I have yet to try it on native hardware or
without bridged networking.

With exactly the same setups, all of this still works fine for any schroot on
a Yakkety host.

I recently installed a new router that blocks unknown internal hosts from
connecting outside the LAN, but I don't think that's related: I've
whitelisted both VMs' IPv4 LAN (i.e. not NAT'd) addresses, and the router is
reporting the expected MACs and IPs.

I suspect something around systemd or recent Zesty networking changes.

Any and all suggestions are welcome.

Cheers,
-Barry