Thursday, 22 February 2018

Re: autopkgtest-build-lxd failing with bionic

Hello all,

(sorry for the delays, I don't get to tend to my Ubuntu mailboxes as often any
more)

Steve Langasek [2018-02-20 16:43 -0800]:
> Then to be blunt, the definition of the target should be fixed in those
> implementations so that it's not useless.

Right, fair point.

> I understand and agree with the argument that modern services should be
> robust in the face of intermittent networks. But I don't agree that
> network-online is "legacy" only for sysvinit compatibility, or that its
> definition is too mushy to be useful.

It's making stronger assumptions about the network management implementation
behind it, and about correct configuration. But I suppose I was being overly
general there - in the context of autopkgtest LXC/LXD containers at least it's
unlikely that we need to deal with VPNs, radius, or anything like that.

> For oneshot-style operations (such as... things you want to do on a one-time
> basis on first boot of an autopkgtest runner VM, without having to write a
> daemon around them that listens to netlink), network-online.target is
> precisely the right semantic.
>
> autopkgtest is *not* the only thing that cares about this. The problem
> should be solved once, well, in the systemd network stack, not pushed onto
> the consumers to repeatedly reimplement poorly.

Agree about that.

> > And there's still the "apt retries several times" fallback (which is why I
> > do see the initial apt failure, but the retry works).
> But we have all the tools at our disposal to run apt at the /right/ time,
> without polling or retrying, for maximum efficiency :)

I wasn't proposing that as an actual solution, just as an explanation why I
haven't seen the bug yet. I meant that this retry papered over the bug (at
least on my system - apparently not on Timo's).

I'm afraid we need to leave that retry in, though - it was written to
counteract transient failures due to hash sum mismatches or other weird
oddities, which have bitten us too often in the past (every instance of retry
was written in reaction to several of those).

> > - it's supposed to be a SysV backwards compat shim for LSB's "network"
> > dependency, and not well-defined
> From my POV, the sane definition is:
> - DNS setup is complete
> - all "required" network interfaces (implementation-defined) have completed
> their configuration
> - if no network interfaces are defined to be "required", then at least one
> interface is up
> This is broad enough to encompass everything from VPNs to captive portals to
> proxy-only networks, and provides a clear separation of responsibilities.

Since you are much more on top of the current netplan/networkd implementation
in Ubuntu containers: does that currently match this definition?

> > - These tools should also work with Debian containers, which in theory
> > could also run sysvinit. This is also the reason why they still use
> > `runlevel` instead of `systemctl is-system-running` or something
> > similar.
> Sure, but in principle, once you've reached runlevel 2 under sysvinit you
> can rely on the network being up because that's part of the definition of
> the runlevel. So the systemd code doesn't need to have a sysvinit
> equivalent.

OK, so I suppose we could replace the check with

  if running_systemd
     wait for network-online.target
  else
     wait for runlevel 2

which would still support non-systemd containers (like Ubuntu 14.04 or custom
configs in Debian).
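
To make the proposed check a bit more concrete, here is a rough Python sketch.
It is not what the autopkgtest tools currently do. It assumes the systemd
branch waits for network-online.target by running
`systemctl start network-online.target` (which blocks until the start job
finishes), and that the sysvinit branch polls `runlevel`. The names `run` and
`wait_for_boot` are made up for illustration, and the commands are assumed to
be executed inside the container.

```python
import subprocess
import time


def run(cmd):
    """Run cmd and return (exit status, stdout); 127 if the binary is missing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return 127, ""
    return proc.returncode, proc.stdout


def wait_for_boot(is_systemd, run=run, timeout=60, sleep=time.sleep):
    """Return True once the system counts as booted with networking (hypothetical helper)."""
    if is_systemd:
        # Assumption: starting the target blocks until its start job is done,
        # so no polling is needed on this branch.
        status, _ = run(["systemctl", "start", "network-online.target"])
        return status == 0
    # sysvinit fallback: `runlevel` prints "<previous> <current>", e.g. "N 2".
    # Poll once per second, up to `timeout` attempts.
    for _ in range(timeout):
        status, out = run(["runlevel"])
        if status == 0 and out.split()[-1:] == ["2"]:
            return True
        sleep(1)
    return False
```

A caller could decide between the two branches with
`wait_for_boot(os.path.isdir("/run/systemd/system"))`, since that directory
only exists when systemd is the running init. Note that the systemd branch has
no timeout of its own in this sketch; a real implementation would probably
want one.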