Wednesday, 4 January 2017

Weird new misbehavior on Zesty (schroot/sbuild DNS)

I've been chasing a new misbehavior that I first noticed just before the
holidays but didn't get a chance to investigate until now. I'd file a bug, but
I'm really not sure where the problem is yet. I'm not even sure whether it's a
bug in the software stack or a problem with my LAN or other local environment,
although I strongly suspect the former.

I've been able to reproduce it twice now, so let's see if I can explain what's
going on and whether anyone else has seen it, or has ideas where to look. I
couldn't find any related reports on BTS or Launchpad, but of course it's
always possible I've missed something.

I use sbuild/schroot to build and test packages. On Ubuntu, these
instructions are an easy recipe for installing that environment:

This has worked beautifully for years, and even worked fine on Zesty until
recently, some time just before Christmas. I do have a local apt-cacher-ng
instance on my LAN and I've always manually added an /etc/apt/apt.conf.d/ file
to proxy apt/apt-get to my local apt-cacher-ng. I do this both on my actual
desktops and inside the chroots, and again this has all worked beautifully
until now.

One interesting new development is that mk-sbuild appears to copy any host
system /etc/apt/apt.conf.d proxy settings into the schroots, so manually
adding this file is no longer necessary. (I haven't updated the wiki page
yet, but will do that once this problem is solved.) mk-sbuild does rewrite
the proxy to a more canonical form though. Where I have on my host (in
a file called 02proxy):

Acquire::http { Proxy "" }

mk-sbuild writes this inside the schroot as 99mk-sbuild-proxy:

Acquire { HTTP { Proxy ""; }; };

That's fine and not the problem as either syntax works.

The problem is that on my LAN is a CNAME to the machine running
apt-cacher-ng. Let's say that's 'counterparts' so the A record is

Therein lies the problem. Inside the schroot, the CNAME resolution doesn't
work. I first noticed this when running `apt update` in the schroots (via my
chup script linked off the SimpleSbuild wiki page):

# apt update
Err:1 zesty InRelease
Could not resolve ''
Err:2 zesty-security InRelease
Could not resolve ''
Err:3 zesty-proposed InRelease
Could not resolve ''
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch Could not resolve ''
W: Failed to fetch Could not resolve ''
W: Failed to fetch Could not resolve ''
W: Some index files failed to download. They have been ignored, or old ones used instead.

Well that's odd and has never happened before, so let's investigate a little
more. Still inside the schroot:

# host is an alias for
# host has address

So far so good. But then:

# ping
ping: Name or service not known
# telnet 3142
telnet: could not resolve Name or service not known
# ping
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=64 time=0.481 ms
--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.481/0.481/0.481/0.000 ms
# telnet 3142
Connected to
Escape character is '^]'.
telnet> Connection closed.

Ah, so some tools are having trouble resolving the CNAME, but not the A.
Indeed, if I change the apt.conf.d proxy file to be:

// proxy settings copied from mk-sbuild
Acquire { HTTP { Proxy ""; }; };

then `apt update` and friends are also happy.

Outside of the schroot (i.e. in the host), there's absolutely no problem
resolving the CNAME. Pinging and telnet'ing to works fine, as does
apt and apt-get using the same proxy file. Both /etc/resolv.conf inside and
outside the chroot point to and systemd-resolve --status seems fine
(outside the schroot; there's no such command atm inside the schroot).
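
To double-check the claim that both resolv.conf files point at the same
server, here's a minimal sketch that pulls the nameserver entries out of two
resolv.conf files and compares them. The function names are mine, and the
location of the chroot's copy of the file depends on how the schroot is set
up, so both paths are passed in as arguments rather than assumed.

```python
import sys
from pathlib import Path


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # resolv.conf treats lines starting with '#' or ';' as comments.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def compare(host_path, chroot_path):
    """Return (host servers, chroot servers, whether the two lists match)."""
    host = nameservers(Path(host_path).read_text())
    chroot = nameservers(Path(chroot_path).read_text())
    return host, chroot, host == chroot


if __name__ == "__main__" and len(sys.argv) == 3:
    host, chroot, same = compare(sys.argv[1], sys.argv[2])
    print("host:  ", host)
    print("chroot:", chroot)
    print("identical" if same else "DIFFERENT")
```

Run it from the host with the two file paths as arguments. Identical
nameserver lists only show that both sides would ask the same DNS server; they
say nothing about how each side's C library goes about asking.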

But clearly *most* DNS queries to the LAN work fine inside the schroot because
I can talk to any A name I can think of. It's just CNAMEs that are broken.
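
One thing worth keeping in mind here: `host` sends its queries straight to
the DNS server, while tools such as ping and telnet go through the C library's
getaddrinfo(), which consults /etc/nsswitch.conf first. The sketch below is a
rough way to test names through that second path, since Python's
socket.getaddrinfo() wraps the same libc call. The function name and the
injectable `resolve` parameter are my own choices for illustration; port 3142
is just the apt-cacher-ng port used in the telnet test above.

```python
import socket
import sys


def check_names(names, port=3142, resolve=socket.getaddrinfo):
    """Resolve each name the way ping/telnet would, via getaddrinfo().

    Returns {name: (sorted list of addresses, error string or None)}.
    """
    results = {}
    for name in names:
        try:
            infos = resolve(name, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror as exc:
            # Same failure class that ping reports as
            # "Name or service not known".
            results[name] = ([], str(exc))
        else:
            # Each entry is (family, type, proto, canonname, sockaddr);
            # the address is the first element of sockaddr.
            addresses = sorted({info[4][0] for info in infos})
            results[name] = (addresses, None)
    return results


if __name__ == "__main__":
    # Pass the CNAME and the A name as arguments, inside and outside the chroot.
    for name, (addresses, error) in check_names(sys.argv[1:]).items():
        print(name, "->", addresses if error is None else "FAILED: " + error)
```

If the alias fails here inside the schroot but resolves on the host, that
would point at the libc/NSS lookup path rather than at the DNS server itself.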

So, is this a bug in systemd? Is it a bug in the schroot? Is it some problem
with my LAN?

A few other data points:

All of my Zesty machines are currently VMs with bridged networking, but this
problem happens both in a Fusion 8.5 VM on an OS X 10.11.6 host and in a
QEMU/KVM guest on a Yakkety host. I have yet to try it on native hardware or
with unbridged networking.

With exactly the same setups, all of this still works fine for any schroot on
a Yakkety host.

I recently installed a new router which blocks internally unknown hosts from
connecting outside the LAN, but I don't think that's related as I've
whitelisted both VMs' IPv4 LAN (i.e. not NAT'd) addresses, and the router is
reporting the expected MACs and IPs.

I suspect something around systemd or recent Zesty networking changes.

Any and all suggestions are welcome.