Friday 14 May 2021

Re: Packaging policy discussion: After=network-online.target

On Fri, May 14, 2021 at 10:14:27AM -0400, Thomas Ward wrote:
> Expanding on this Seth...

> On 5/13/21 6:51 PM, Seth Arnold wrote:
> > In the last week I've seen four different conversations about how to
> > properly start a service only 'after the network is up', and the different
> > people had different ideas of what this meant for their service:

> > - one wanted LAN up and working, nothing fancy
> > - one wanted to wait until DNS resolution was working
> > - one wanted to wait until an ospf daemon had negotiated routing
> > tables and installed a default route
> > - one wanted to wait until ntp had synced (not just started, but
> > actually synced)

> I think this is the 'can of worms' I just mentioned in #ubuntu-devel on
> IRC.  Each and every one of these specific cases would need its own network
> target or SystemD target for all those cases.  We also have the case (from a
> 2017 bug that Server Team "Won't Fix"'d) that someone wants NGINX to start
> only after DNS works and the network is 'up' (and routable).  There are no
> special targets for those 'special cases' at a SystemD level.

If you look at the definition I've proposed for network-online.target,
you'll see that this target specifically encompasses "DNS works and the
network is routable". And I believe we have this working today with both
networkd and NetworkManager; to my recollection, every case where the
implementation doesn't match the proposed definition today is one where
network-online.target waits for too *much* rather than too *little*,
causing boot delays.
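
For illustration, here is a minimal sketch of how a service could opt in to
this ordering with a drop-in. The helper name write_online_dropin, the unit
name example.service and the file name 10-network-online.conf are all
hypothetical; only the Wants=/After= directives and the target name are
systemd's own.

```shell
#!/bin/sh
# Sketch: write a drop-in that makes a unit wait for network-online.target.
# write_online_dropin UNIT ROOT
#   UNIT  unit name, e.g. example.service (hypothetical)
#   ROOT  unit directory, e.g. /etc/systemd/system
write_online_dropin() {
    unit="$1"
    root="$2"
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # After= only orders the two units; Wants= is what pulls the target
    # into the transaction so that there is something to wait for.
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

On a real system this would be called as something like
"write_online_dropin example.service /etc/systemd/system", followed by
"systemctl daemon-reload".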

> Case #2 would require the application to have some kind of start-script that
> can check DNS and not fail on DNS resolution failure.  (Or, exit in a way
> that SystemD would retry it again after a delay - exit code != 0 and not a
> sigkill, etc., with a restart delay of, say, 15-30 seconds while depending
> on network.target or network-online.target.)  NGINX fits this case, because
> if you use a DNS name in there and it doesn't resolve, it causes a bit of an
> error at startup.

$ grep Before /lib/systemd/system/systemd-resolved.service
Before=network.target nss-lookup.target shutdown.target
$

DNS resolution is always up before network.target, let alone
network-online.target.
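
The same check can be scripted. This is only a sketch: the function name
orders_before is made up for the example, and it reads just the unit file
it is given, so it will not see ordering added by drop-ins.

```shell
#!/bin/sh
# orders_before FILE TARGET
# Succeed if any Before= line in unit file FILE lists TARGET as one of its
# space-separated values.
orders_before() {
    file="$1"
    target="$2"
    # Strip the "Before=" key, split the values one per line, then look
    # for an exact, literal match of the target name.
    sed -n 's/^Before=//p' "$file" | tr -s ' ' '\n' | grep -qxF -- "$target"
}
```

For example, "orders_before /lib/systemd/system/systemd-resolved.service
network.target" should succeed on a system whose unit file matches the grep
output above. To see the effective ordering including drop-ins,
"systemctl show -p Before systemd-resolved.service" is the better tool.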

> Case #3 requires more than LAN up, and like case #2 would need its own
> configuration / script to check that ospf is populated and such - there's
> nothing in SystemD that governs this.

Case #3, if OSPF is the only source of a default route for the system, is
covered by the proposed definition of network-online.target. (If the system
gets a default route by some other mechanism and it is subsequently replaced
by OSPF, then network-online.target is insufficient; but I think that
qualifies as a host misconfiguration.)
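
As a rough sketch of what "has a default route" means in practice, a script
could test the routing table as below. The function name has_default_route
is hypothetical, and the check says nothing about which daemon installed
the route, so it cannot tell an OSPF route from any other.

```shell
#!/bin/sh
# has_default_route
# Read routing-table lines on stdin, in the format printed by
# "ip route show", and succeed if any of them is a default route.
has_default_route() {
    grep -q '^default '
}
```

Typical use would be "ip -4 route show default | has_default_route".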

> Case #4 is like Case #3 and #2, except that you have to have a tie in to
> NTP.  Which, in more modern deployments, is `systemd-timesyncd` which
> handles NTP sync.  (Or Chrony, if you're like me and run an NTP server -
> chrony provides the granularity I need for the NTP server side of things).

NTP is an entirely separate thing from networking and should be handled some
other way.

> Case #2, #3, and #4 fit the standard of "This is a non-standard
> configuration that you as the sysadmin have implemented.  If you need
> special case handling for these services, that's beyond the scope of the
> 'default' packaging/service configuration goals."

For case #4, yes; and this is orthogonal to discussions of the
network-online.target.

For cases #2 and #3, I disagree per the above.

> I do agree we need a standard defined for "What does 'online' mean?", and
> everyone has to accept that as the definition for what SystemD's
> 'network-online' state is.  But for now, until that standard is defined
> somewhere upstream, we have to accept that there is no standard

It doesn't have to be adopted upstream for us to be making Ubuntu better for
our users. (The goal is to get this agreed with the systemd upstream
community; but that should not be a blocker.)

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slangasek@ubuntu.com                                     vorlon@debian.org