Tuesday 7 June 2016

Re: ANN: DNS resolver changes in yakkety

On Tue, Jun 07, 2016 at 10:16:37AM +0200, Martin Pitt wrote:
> Hello all,
>
> Stéphane Graber [2016-06-06 12:27 -0400]:
> > > There's a thread here on Ubuntu and systemd-resolved:
> > > https://lists.dns-oarc.net/pipermail/dns-operations/2016-June/014964.html
>
> I skipped over the first bunch of noise, I'm now in the middle of it
> where some actual meat comes in. Ondřej contacted me privately last
> week about it already, and said that he'll follow up with some
> consolidated and objective criticism here. Much appreciated!
>
> > - Anything which doesn't use the C library resolving functions, which
> > would include any static binary bundling its own copy of those, will
> > fall back to /etc/resolv.conf and not get split DNS information or the
> > desired fallback mechanism.
>
> This isn't new, though. Anything not using NSS has behaved differently
> all the time already, such as not doing LLMNR (libnss-mdns4, for
> resolving names in the local network). This mainly affects tools like
> "dig", but these probably have a good reason to do things by
> themselves.
>
> > This is likely to affect a whole bunch of Go binaries and similar
> > statically built pieces of software. It will also, probably more
> > visibly, affect web browsers, which have recently all switched to
> > doing their own DNS resolving.
>
> Being statically built doesn't mean that these programs can't/don't
> use libc or can't do NSS (it's done via dlopen() anyway, not via
> shared libs). So I take it Go's runtime library doesn't currently do
> this then?

Go can build in two modes depending on version and flags:
- Completely static mode, in which case there isn't a single line of
glibc being used in the binary. That means that the resolver code just
parses /etc/resolv.conf and uses those servers directly.

- Static for everything except for gethostbyname & getaddrinfo which
are called through the C library using cgo. That's the default when
building Go on Ubuntu, with our version of Go anyway.
Go folks however do have a tendency to ship pre-built binaries and
all bets are off on those.
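
To make the two modes concrete, here is a minimal sketch of how a
program can pick between them itself. It assumes a Go release new enough
to provide net.Resolver with its PreferGo field (Go 1.9 or later); the
helper name newResolver is mine, purely for illustration. With
PreferGo set, the lookup is done by Go's built-in client, which reads
/etc/resolv.conf itself; without it, the runtime is free to go through
the C library when the binary was built with cgo.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver is a hypothetical helper: it returns a resolver that either
// insists on Go's built-in DNS client (pureGo = true) or leaves the choice
// between the built-in client and the C library to the runtime.
func newResolver(pureGo bool) *net.Resolver {
	return &net.Resolver{PreferGo: pureGo}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Resolve the same name both ways. On a host with split DNS or an
	// NSS-only name source, the two answers may differ.
	for _, pureGo := range []bool{true, false} {
		addrs, err := newResolver(pureGo).LookupHost(ctx, "localhost")
		if err != nil {
			fmt.Printf("PreferGo=%v: lookup failed: %v\n", pureGo, err)
			continue
		}
		fmt.Printf("PreferGo=%v: %v\n", pureGo, addrs)
	}
}
```

The same choice can be forced from outside the program by setting
GODEBUG=netdns=go or GODEBUG=netdns=cgo in the environment, which is
handy for checking how a pre-built binary behaves.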

Then there is the concern raised by Scott: even in languages which don't
encourage shipping static binaries that bypass the C library functions,
there are a number of commonly used libraries which do DNS lookups
directly (python-dns, for example).

Those libraries usually do such lookups directly rather than through NSS
(or through whatever new API systemd came up with) because they want to
get the raw reply. This is for example needed if you want to know the
value of a DNS record even if its DNSSEC validation failed (while also
knowing that it failed).
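
As a rough illustration of what "the raw reply" gives you, the sketch
below reads the DNSSEC-related bits out of the 12-byte DNS message
header. The type and function names (replyStatus, parseReplyStatus) are
made up for this example and the sample bytes are hand-built, not
captured from a real server. The AD bit says the upstream resolver
validated the answer, and the CD bit is what a client sets to ask for
the data even when validation fails; neither is visible through
getaddrinfo().

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Bit positions within the 16-bit flags word of the DNS header.
const (
	flagAD    = 0x0020 // Authenticated Data: resolver validated the answer
	flagCD    = 0x0010 // Checking Disabled: client asked to skip validation
	rcodeMask = 0x000F // response code, e.g. 0 = NOERROR, 2 = SERVFAIL
)

// replyStatus holds the header fields a DNSSEC-aware client cares about.
type replyStatus struct {
	Authenticated    bool
	CheckingDisabled bool
	RCode            int
}

// parseReplyStatus extracts AD, CD and RCODE from a raw DNS message.
func parseReplyStatus(msg []byte) (replyStatus, error) {
	if len(msg) < 12 {
		return replyStatus{}, errors.New("DNS message shorter than the 12-byte header")
	}
	// Bytes 0-1 are the query ID; bytes 2-3 are the flags word.
	flags := binary.BigEndian.Uint16(msg[2:4])
	return replyStatus{
		Authenticated:    flags&flagAD != 0,
		CheckingDisabled: flags&flagCD != 0,
		RCode:            int(flags & rcodeMask),
	}, nil
}

func main() {
	// Hand-built headers: a validated answer (QR|RD|RA|AD, NOERROR) and
	// a failed one (QR|RD|RA, SERVFAIL).
	samples := [][]byte{
		{0x12, 0x34, 0x81, 0xA0, 0, 1, 0, 1, 0, 0, 0, 0},
		{0x12, 0x35, 0x81, 0x82, 0, 1, 0, 0, 0, 0, 0, 0},
	}
	for _, msg := range samples {
		st, err := parseReplyStatus(msg)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("AD=%v CD=%v RCODE=%d\n", st.Authenticated, st.CheckingDisabled, st.RCode)
	}
}
```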

> > - This breaks downstream DNSSEC validation. Mail servers and some web
> > browsers require the ability to read the DNSSEC validation result from
> > the DNS reply. Those therefore don't use the libc resolving functions
> > and instead do the DNS request themselves, they'd then fall into the
> > above problem where they'd use /etc/resolv.conf and miss any split DNS
> > or similar configuration done inside resolved.
>
> This is essentially the same issue as above indeed.
>
> > - Some concerns about it broadcasting queries to all DNS servers rather
> > than just the one it's supposed to use for a given domain. Hopefully
> > this was just mis-configuration and not how resolved actually works, as
> > this would be a pretty big privacy issue.
>
> Right, this is a current bug, I filed it as
> https://github.com/systemd/systemd/issues/3421 .
>
> > - Not having resolved offer a DNS service itself means we can't
> > properly daisy-chain our other DNS/DHCP servers like the dnsmasq
> > instances we use for LXC, LXD and libvirt. That means that the
> > containers and virtual machines will not be getting the same DNS view as
> > the host, being only restricted to hitting the servers in the host
> > /etc/resolv.conf without any awareness of split view DNS.
> >
> > Unless the above can be fixed somehow, and I very much doubt resolved
> > will grow a DNS server any time soon,
>
> There actually was talk about it, but I don't think that'd be
> unambiguously good, see below.
>
> > ... the switch to resolved mostly feels like a regression over the
> > existing resolvconf+dnsmasq setup we've got right now and which in
> > my experience at least, has been working pretty well for us.
>
> Note that this is *only* a setup on desktops with NetworkManager. On
> servers, cloud instances etc. we haven't set up any local DNS server;
> thus we have bad handling of multiple servers with failures, no way to do
> split DNS or DNSSEC. The same is true for interfaces that get managed
> through ifupdown instead of NM. These issues are what libnss-resolve
> is supposed to address. So "working pretty well" is a bit of an
> overstatement here :-)

So, minus the security problems that have been mentioned so far, I can't
think of any major problems with using resolved on servers.

We'll definitely want to make sure that it doesn't start in containers
by default as that'd significantly increase the process count for no
good reason on systems with hundreds or thousands of containers.

And it'd be nice if there was a way to only have resolved run when we
have multiple DNS servers as otherwise, with caching disabled (and I
suspect we will turn it off), it'd just make things slower.

> I also doubt that we actually do want to install dnsmasq on servers,
> as this would then logically conflict with "real" DNS servers like
> bind, pdns, unbound etc. that server admins commonly install. There
> isn't even a mutual package conflict etc. This is why I think an NSS
> module is much more appropriate on servers (and even desktops), as
> this does not prevent you from installing a local DNS server if you
> want/need to. The same is true in principle for desktops, although I
> don't think that we should enable *both* resolved and dnsmasq as we'd
> then have two processes where one would suffice.

Yeah, we certainly don't want dnsmasq running on servers.

> So in summary, I don't get any tributes by introducing resolved :-),
> and if the server team is happy with installing dnsmasq by default,
> we can configure that on servers, too. We would then introduce the
> above conflict with real servers and don't have DNSSEC, so that's a
> balancing that we'll have to do.

By the way, dnsmasq does support DNSSEC.

As for conflicts between dnsmasq and an existing resolver, that's why
we run dnsmasq on 127.0.1.1 on the desktop, so that it doesn't conflict
with something else that'd be binding 127.0.0.1.

> I'm quite convinced that we should find a solution which works on
> servers, VMs, and desktops alike, and until Xenial we haven't had
> that.

And so long as having a common solution can be done without regressions
and without hand-wavy answers like "web browsers will just have to
switch to some new systemd D-Bus API", I don't mind the change.



In order to reach your goal without breaking anything, I suspect we'd
need to have resolved offer a local DNS server which can be put ahead
of everything else in /etc/resolv.conf, similar to what we do with
dnsmasq.

Or we should just keep dnsmasq running on the desktops, most likely by
teaching resolved not to start if /etc/resolv.conf contains
127.0.1.1.

Expecting everything which does direct DNS queries (and usually with
reason) to switch to some new systemd API which may or may not offer the
information they need, seems a bit unrealistic to me, especially if we
expect all that to happen by the end of this cycle.

--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com