Tuesday, 31 May 2016

Re: ANN: DNS resolver changes in yakkety

On Tue, May 31, 2016 at 11:23:01AM -0400, Stéphane Graber wrote:
> On Tue, May 31, 2016 at 11:34:41AM +0200, Martin Pitt wrote:
> > Hello all,
> >
> > yesterday I landed [1] in Yakkety which changes how DNS resolution
> > works -- i.e. how names like "www.ubuntu.com" get translated to an IP
> > address like
> >
> > Until now, we used two different approaches for this:
> >
> > * On desktops and touch, NetworkManager launched "dnsmasq" configured
> > as effectively a local DNS server which forwards requests to the
> > "real" DNS servers that get picked up usually via DHCP. Thus
> > /etc/resolv.conf said "nameserver" and it was rather
> > non-obvious to show the real DNS servers. (This was one of the
> > complaints/triggers that led to creating this blueprint). But
> > dnsmasq does proper rotation and fallback between multiple
> > nameservers, i.e. if one does not respond it uses the next one
> > without long timeouts.

One more thing on that point which was just brought up in:

In the past, with dnsmasq on desktop we could ship a .d file which would
instruct the system dnsmasq to forward all ".lxc" or ".lxd" queries to
the LXC or LXD dnsmasq instance.

We were planning on doing so by default this cycle, so it'd be good to
confirm that resolved doesn't regress things in this regard.
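
A minimal sketch of the kind of suffix-based forwarding described above, assuming a simple lookup table. With dnsmasq such a rule is typically a `server=/lxd/<address>` line; the Python below only models the routing decision. The names `FORWARDERS`, `DEFAULT_UPSTREAM` and `pick_upstream` are hypothetical, and the addresses are placeholders from the documentation ranges, not the real LXC or LXD defaults.

```python
from typing import Dict, List

# Hypothetical routing table: domain suffix -> resolvers that should answer it.
# The addresses are documentation-range placeholders, not real defaults.
FORWARDERS: Dict[str, List[str]] = {
    "lxd": ["192.0.2.1"],
    "lxc": ["192.0.2.2"],
}
DEFAULT_UPSTREAM: List[str] = ["198.51.100.53"]


def pick_upstream(name: str) -> List[str]:
    """Return the resolvers for `name`, preferring the longest matching suffix."""
    labels = name.rstrip(".").lower().split(".")
    # Try the most specific suffix first: a.b.lxd, then b.lxd, then lxd.
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in FORWARDERS:
            return FORWARDERS[suffix]
    # No rule matched, so the query goes to the regular upstream servers.
    return DEFAULT_UPSTREAM
```

Under these assumptions, `pick_upstream("web1.lxd")` selects the `.lxd` forwarder, while `pick_upstream("www.ubuntu.com")` falls through to the default upstream. Whether resolved can be given an equivalent rule is exactly the open question here.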

> >
> > * On servers, cloud images etc. we did not have any local DNS server.
> > Configured DNS servers (via DHCP or static configuration in
> > /etc/network/interfaces) were put into /etc/resolv.conf, and
> > every program (via glibc's builtin resolver) directly contacted
> > those.
> >
> > This had the major drawback that if the first DNS server does not
> > respond (or is slow), then *every* DNS lookup suffers from a ~ 10s
> > timeout, which makes every network operation awfully slow.
> > Addressing this was the main motivation for the blueprint. On top
> > of that, there was no local caching, thus requesting the same name
> > again would do another lookup.
> >
> > As of today, we now have one local resolver service for all Ubuntu
> > products; we picked "resolved" as that is small and lightweight,
> > already present (part of the systemd package), does not require D-Bus
> > (unlike dnsmasq), supports DNSSEC, provides transparent fallback to
> > contacting the real DNS servers directly (in case anything goes wrong
> > with the local resolver), and avoids the first issue above that
> > /etc/resolv.conf always shows
> >
> > Now DNS resolution goes via a new "libnss-resolve" NSS module which
> > talks to resolved [2]. /etc/resolv.conf has the "real" nameservers,
> > broken name servers are handled efficiently, and we have local DNS
> > caching. NetworkManager now stops launching a dnsmasq instance.
> >
> > I've had this running on my laptop for about three weeks now without
> > noticing problems, but there may well be some corner cases where this
> > causes problems. If you encounter a regression that causes DNS names
> > to not get resolved correctly, please do "ubuntu-bug libnss-resolve"
> > with the details.
> >
> > Thanks,
> >
> > Martin
>
> So in the past there were two main problems with using resolved; I'd
> like to confirm both of them have now been taken care of:
>
> 1) Does resolved now support split DNS?
>
>    That is, can NetworkManager instruct it that only *.example.com
>    should be sent to the DNS servers provided by a given VPN?
>
>    That's a very important feature that the current dnsmasq integration
>    gives us, which amongst other things avoids leaking DNS queries to your
>    employer when you're not routing all your traffic to the VPN, and also
>    greatly reduces the overall network latency when using a VPN with a
>    far-away endpoint.
>
>    It's also a critical feature for anyone who wants to run multiple
>    VPNs in parallel, which NetworkManager 1.2 now supports.
>
> 2) Does resolved now maintain a per-uid cache or has caching been
>    disabled entirely?
>
>    In the past, resolved would use a single shared cache for the whole
>    system, which would allow for local cache poisoning by unprivileged
>    users on the system. That's the reason why the dnsmasq instance we spawn
>    with NetworkManager doesn't have caching enabled, and that becomes even
>    more critical when we're talking about doing the same change on servers.
>
>    If not done already, I'd very strongly suggest a full audit of
>    resolved by the security team with a focus on its caching mechanism.
>
> Additionally, what's the easiest way to undo this change on a server?
> I have a few deployments where I run upwards of 4000 containers on a
> single system. Such systems have a main DNS resolver on the host and all
> containers talking to it. I'm not too fond of adding an extra 4000
> processes to such systems.
>
> --
> Stéphane Graber
> Ubuntu developer
> http://www.ubuntu.com
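
To make the cache question in point 2 concrete, here is a toy model of the difference between a single shared cache and a per-uid cache. It is a sketch under stated assumptions: the class names `SharedCache` and `PerUidCache` are hypothetical, the addresses are documentation-range placeholders, and it does not describe how resolved actually stores entries or how a local user could get a bad entry into the cache. It only shows the isolation property being asked about.

```python
from typing import Dict, Optional, Tuple


class SharedCache:
    """One cache for the whole system: an entry stored on behalf of any
    user is served to every other user."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, uid: int, name: str, address: str) -> None:
        # The uid is ignored, which is what makes the cache shared.
        self._entries[name] = address

    def get(self, uid: int, name: str) -> Optional[str]:
        return self._entries.get(name)


class PerUidCache:
    """Entries are keyed by (uid, name), so one user's entries are never
    returned to another user."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def put(self, uid: int, name: str, address: str) -> None:
        self._entries[(uid, name)] = address

    def get(self, uid: int, name: str) -> Optional[str]:
        return self._entries.get((uid, name))


# Hypothetical scenario: uid 1001 manages to get a bogus answer cached.
shared, per_uid = SharedCache(), PerUidCache()
shared.put(1001, "www.example.com", "203.0.113.66")
per_uid.put(1001, "www.example.com", "203.0.113.66")

# uid 1000 then looks up the same name in each cache.
shared_result = shared.get(1000, "www.example.com")
per_uid_result = per_uid.get(1000, "www.example.com")
```

In this model the shared cache hands uid 1000 the entry stored for uid 1001, while the per-uid cache returns nothing and would force a fresh lookup.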

Stéphane Graber
Ubuntu developer