Friday 16 February 2018

Re: autopkgtest-build-lxd failing with bionic

From a194535b9bb2d5fbfc8cefa3b047c57fa40736a1 Mon Sep 17 00:00:00 2001
From: Iain Lane <iain.lane@canonical.com>
Date: Thu, 15 Feb 2018 16:21:59 +0000
Subject: [PATCH] lxd: Wait until we have a default route.

We execute `apt-get update' more or less as soon as the container is
started. In some situations this is too early: it can be before the
network is fully working.

When building lxd containers or using autopkgtest-virt-lxd, wait until
we have a default route before proceeding.
---
tools/autopkgtest-build-lxd | 14 +++++++++++++-
virt/autopkgtest-virt-lxd | 10 ++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/autopkgtest-build-lxd b/tools/autopkgtest-build-lxd
index 623d5eb..b3141d6 100755
--- a/tools/autopkgtest-build-lxd
+++ b/tools/autopkgtest-build-lxd
@@ -68,7 +68,7 @@ setup() {
lxc exec "$CONTAINER" -- chmod 644 /etc/apt/apt.conf.d/01proxy
fi

- # wait until it is booted: lxc exec works and we get a numeric runlevel
+ # wait until it is booted: lxc exec works, we get a numeric runlevel and networking is up
timeout=60
while [ $timeout -ge 0 ]; do
timeout=$((timeout - 1))
@@ -81,6 +81,18 @@ setup() {
exit 1
}

+ while [ $timeout -ge 0 ]; do
+ timeout=$((timeout - 1))
+ if lxc exec "$CONTAINER" -- sh -ec 'test -n "$(ip route show to 0/0)"'; then
+ break
+ fi
+ sleep 1
+ done
+ [ $timeout -ge 0 ] || {
+ echo "Timed out waiting for network to come up" >&2
+ exit 1
+ }
+
ARCH=$(lxc exec "$CONTAINER" -- dpkg --print-architecture </dev/null)
DISTRO=$(lxc exec "$CONTAINER" -- sh -ec 'lsb_release -si 2>/dev/null || . /etc/os-release; echo "${NAME% *}"' </dev/null)
CRELEASE=$(lxc exec "$CONTAINER" -- sh -ec 'lsb_release -sc 2>/dev/null || awk "/^deb/ {sub(/\\[.*\\]/, \"\", \$0); print \$3; quit}" /etc/apt/sources.list' </dev/null)
diff --git a/virt/autopkgtest-virt-lxd b/virt/autopkgtest-virt-lxd
index f8e9a4d..6e7ec65 100755
--- a/virt/autopkgtest-virt-lxd
+++ b/virt/autopkgtest-virt-lxd
@@ -100,6 +100,16 @@ def wait_booted():
continue
out = out.strip()
if out.split()[-1].isdigit():
+ adtlog.debug('waiting for network')
+ VirtSubproc.check_exec(['lxc', 'exec', container_name, '--',
+ 'sh', '-ec',
+ '''while true; do
+ if test -n "$(ip route show to 0/0)"; then
+ break;
+ fi;
+ sleep 1;
+ done'''],
+ timeout=30)
return

adtlog.debug('wait_booted: runlevel "%s", retrying...' % out)
--
2.14.1

On Thu, Feb 15, 2018 at 09:55:47PM +0100, Martin Pitt wrote:
> Hello Iain, all,
>
> Iain Lane [2018-02-15 18:48 +0000]:
> > There's a patch attached here which fixes the problem for me. I'm not
> > sure if there's a better way to do this - basically it starts
> > network-online.target and waits for it to become active, with a timeout.
> > Review appreciated.
>
> I wouldn't pick on any of these: network-online.target is a sloppily defined
> shim for SysV init backwards compatibility, and may not ever get started (in
> fact, that's the goal ☺); and the container might not use networkd, so I
> wouldn't use s-n-wait-online either. I think querying

Interesting. I thought network-online.target was the systemd way to say
'I am online now' (i.e. nm-online or systemd-networkd-wait-online), which
is the question I wanted a positive answer to. I can see that the SysV
implementation isn't great, but it's not clear to me that the target is
ill-defined for this case.

> [ -n "$(ip route show to 0/0)" ]

This is better though, and works too. Please take a look at the attached
patch. Thanks! :-)
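
For reference, here is a rough standalone sketch of the check, separate from
the patch itself. It polls `ip route show to 0/0` and treats any non-empty
output as "we have a default route", giving up after a deadline. The names
has_default_route, wait_for_default_route, get_routes and lxc_routes are
made up for illustration and are not part of autopkgtest; the real patch
goes through VirtSubproc.check_exec instead.

```python
import subprocess
import time


def has_default_route(route_output):
    """Return True if `ip route show to 0/0` printed anything at all."""
    return bool(route_output.strip())


def wait_for_default_route(get_routes, timeout=60, interval=1.0,
                           sleep=time.sleep):
    """Poll get_routes() until a default route shows up or we time out.

    get_routes is a hypothetical callable returning the text output of
    `ip route show to 0/0` as seen from inside the container.
    """
    deadline = time.monotonic() + timeout
    while True:
        if has_default_route(get_routes()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def lxc_routes(container):
    """Query the routes in a container via `lxc exec` (assumes lxc exists)."""
    result = subprocess.run(
        ['lxc', 'exec', container, '--', 'ip', 'route', 'show', 'to', '0/0'],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True)
    # A failing exec (container not ready yet) counts as "no route yet".
    return result.stdout if result.returncode == 0 else ''
```

Something like wait_for_default_route(lambda: lxc_routes(name), timeout=30)
would then mirror the 30 second limit used in the virt-lxd hunk. Passing the
query in as a callable keeps the polling logic testable without a container.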

Cheers,

--
Iain Lane [ iain@orangesquash.org.uk ]
Debian Developer [ laney@debian.org ]
Ubuntu Developer [ laney@ubuntu.com ]