Thursday, 15 February 2018

Re: autopkgtest-build-lxd failing with bionic

From c1924280973123c618fc07762b063abaf64d9d26 Mon Sep 17 00:00:00 2001
From: Iain Lane <[email protected]>
Date: Thu, 15 Feb 2018 16:21:59 +0000
Subject: [PATCH] lxd: If we're running systemd, wait until the network is up

We run `apt-get update' more or less as soon as the container is
started. In some situations this is too early: the network may not be
fully working yet.

If the container is running systemd, start network-online.target and
wait until systemd considers the network to be up.
---
tools/autopkgtest-build-lxd | 19 ++++++++++++++++++-
virt/autopkgtest-virt-lxd | 2 ++
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/autopkgtest-build-lxd b/tools/autopkgtest-build-lxd
index 623d5eb..9350a81 100755
--- a/tools/autopkgtest-build-lxd
+++ b/tools/autopkgtest-build-lxd
@@ -68,7 +68,7 @@ setup() {
lxc exec "$CONTAINER" -- chmod 644 /etc/apt/apt.conf.d/01proxy
fi

- # wait until it is booted: lxc exec works and we get a numeric runlevel
+ # wait until it is booted: lxc exec works, we get a numeric runlevel and networking is up
timeout=60
while [ $timeout -ge 0 ]; do
timeout=$((timeout - 1))
@@ -81,6 +81,23 @@ setup() {
exit 1
}

+ # only if we're running systemd
+ if lxc exec "$CONTAINER" -- test -d /run/systemd/system; then
+ lxc exec "$CONTAINER" -- systemctl start network-online.target
+ timeout=60
+ while [ $timeout -ge 0 ]; do
+ timeout=$((timeout - 1))
+ if lxc exec "$CONTAINER" -- systemctl -q is-active network-online.target; then
+ break
+ fi
+ sleep 1
+ done
+ [ $timeout -ge 0 ] || {
+ echo "Timed out waiting for network to come up" >&2
+ exit 1
+ }
+ fi
+
ARCH=$(lxc exec "$CONTAINER" -- dpkg --print-architecture </dev/null)
DISTRO=$(lxc exec "$CONTAINER" -- sh -ec 'lsb_release -si 2>/dev/null || . /etc/os-release; echo "${NAME% *}"' </dev/null)
CRELEASE=$(lxc exec "$CONTAINER" -- sh -ec 'lsb_release -sc 2>/dev/null || awk "/^deb/ {sub(/\\[.*\\]/, \"\", \$0); print \$3; quit}" /etc/apt/sources.list' </dev/null)
diff --git a/virt/autopkgtest-virt-lxd b/virt/autopkgtest-virt-lxd
index f8e9a4d..acb525e 100755
--- a/virt/autopkgtest-virt-lxd
+++ b/virt/autopkgtest-virt-lxd
@@ -100,6 +100,8 @@ def wait_booted():
continue
out = out.strip()
if out.split()[-1].isdigit():
+ adtlog.debug('waiting for network')
+ VirtSubproc.check_exec(['lxc', 'exec', container_name, '--', 'sh', '-ec', 'if [ -d /run/systemd/system ]; then systemctl start network-online.target; while true; do if systemctl -q is-active network-online.target; then break; fi; sleep 1; done; fi'], timeout=35)
return

adtlog.debug('wait_booted: runlevel "%s", retrying...' % out)
--
2.14.1

[ autopkgtest-devel: FYI, this is a reply to
https://lists.ubuntu.com/archives/ubuntu-devel/2018-February/040138.html
and its thread. Reply-To / Mail-Followup-To are set to exclude
ubuntu-devel from this subthread so that reviews go to the right place. ]

On Thu, Feb 15, 2018 at 10:28:05AM -0500, Stéphane Graber wrote:
> […]
> And confirmed that networking inside both of them works fine here.
>
> I wonder if it's a netplan vs ifupdown thing hitting autopkgtest in this case?

I can build images from the images: remote(!) quite fine here, but when
actually using them I see temporary name resolution failures most of
the time during the initial `apt-get update'.

I tracked this down to a race condition: we run `apt-get update' before
networking is fully up. (OK, I just saw Julian's post, which arrived
while I was writing this and says the same thing...)

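The patch above fixes the problem for me. When the container is running
systemd, it starts network-online.target and polls until the target
becomes active, giving up after a timeout (about 60 seconds in
autopkgtest-build-lxd; 35 seconds for the whole command in
autopkgtest-virt-lxd). I'm not sure whether there's a better way to do
this, so review is appreciated.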
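For discussion, the polling pattern in the build-lxd hunk could be factored into a small helper. This is only a sketch: `wait_for_active` is a hypothetical name, not something autopkgtest provides, and it assumes a POSIX sh.

```shell
# wait_for_active TIMEOUT COMMAND [ARGS...]
# Hypothetical helper (not part of autopkgtest): runs COMMAND up to
# TIMEOUT times (always at least once), sleeping one second between
# attempts. Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_for_active() {
    wfa_remaining=$1
    shift
    while :; do
        # Success is reported straight from the command's exit status.
        if "$@"; then
            return 0
        fi
        wfa_remaining=$((wfa_remaining - 1))
        # Out of attempts: give up without a pointless final sleep.
        [ "$wfa_remaining" -gt 0 ] || return 1
        sleep 1
    done
}
```

In autopkgtest-build-lxd this might be used as `wait_for_active 60 lxc exec "$CONTAINER" -- systemctl -q is-active network-online.target || { echo "Timed out waiting for network to come up" >&2; exit 1; }`. Returning the status directly, rather than checking the counter after the loop, avoids an edge case in the patch's loop where, as far as I can tell, a success on the very last attempt (after `timeout` has already been decremented to -1) is still reported as a timeout.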

Cheers,

--
Iain Lane [ [email protected] ]
Debian Developer [ [email protected] ]
Ubuntu Developer [ [email protected] ]