Tuesday, 27 October 2015

Current Launchpad builder layout and plans

Matthias reminded me that it would be worth giving people a consolidated
update on the current state of Launchpad builders, and changes we intend
to make in the near future.

== History ==

Dedicated watchers of https://launchpad.net/builders may have noticed
quite a few changes recently. The overall trend is that we're working
on moving all builders into OpenStack clouds, a system we call
ScalingStack [1]. This is giving us much better density of builders - a
single unit of hardware can support many builder nodes - allowing us to
vastly increase our build capacity compared to a year or two ago while
saving on rack space at the same time. Here's a rough timeline of
production changes so far in this project:

2014-08: ScalingStack enabled for amd64/i386 PPAs on production
2015-03 - 2015-07: Fixing various blockers for building Ubuntu in
  ScalingStack (new-style ddebs, modern sbuild, teaching Launchpad
  about virtualised-only architectures, several full test rebuilds)
2015-08: Ubuntu amd64/i386 builds switched to ScalingStack
2015-10: Ubuntu ppc64el builds switched to ScalingStack and PPAs
  enabled [2]; arm64/armhf undergoing testing

[1] https://insights.ubuntu.com/2014/10/30/scalingstack-2x-performance-in-launchpads-build-farm-with-openstack/
[2] http://blog.launchpad.net/ppa/ppas-for-ppc64el

== Design ==

The guest instances are reset to a clean state between each build. In
fact, in order to minimise latency in the common case where the build
farm has at least some idle capacity, they're actually reset at the end
of the previous build. This means that the guest configuration has to be
generic: a given reset doesn't know whether the next build is going to
be amd64 or i386 (the same guest images support both), or what kind of
build it's going to do. This is an intentional trade-off, but it does
mean that we can't do things like giving certain builds more RAM or
disk: we need to find a reasonable point that gives us a good density of
builders across all of our compute nodes while also being able to build
most packages.
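
As a rough illustration of the reset-at-end-of-build idea, here is a
minimal Python sketch. It is not Launchpad's actual code: the Builder
class, the State names, and the builder and job names are all made up
for the example. It only shows that the reset runs as the last step of
a build and takes no information about the next job, so a builder that
receives a new job is already clean.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CLEAN = "clean"        # freshly reset, ready to accept any build
    BUILDING = "building"  # a job is running
    CLEANING = "cleaning"  # reset in progress


@dataclass
class Builder:
    """Hypothetical model of one guest; not Launchpad's real class."""

    name: str
    state: State = State.CLEAN
    log: list = field(default_factory=list)

    def reset(self) -> None:
        # The reset is generic: it takes no argument describing the
        # next job, because that job is not known yet.
        self.state = State.CLEANING
        self.log.append("reset")
        self.state = State.CLEAN

    def run(self, job: str) -> None:
        if self.state is not State.CLEAN:
            raise RuntimeError(f"{self.name} is not clean")
        self.state = State.BUILDING
        self.log.append(f"build {job}")
        # Reset eagerly at the end of this build, so that the next job
        # dispatched to this builder does not have to wait for a reset.
        self.reset()


builder = Builder("example-builder-001")
builder.run("amd64 package build")
builder.run("i386 package build")
print(builder.log)
```

Because the reset happens before the next job is known, anything
job-specific (architecture, build type) has to be set up inside the
build itself rather than in the guest configuration.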

At the moment our guests have four virtual CPUs, 4GiB RAM, 4GiB swap,
and 60GiB disk. These figures can be raised, but only at the cost of
supporting fewer concurrent instances, and the same cloud regions are
also used for autopkgtest workers and error retracers. In cases where a
build exceeds these limits, do consider whether it's possible to squash
it down a bit with reasonable effort: for example, splitting up
translation units or performing less aggressive optimisation are valid
approaches, and may even be acceptable upstream.
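
To make the density trade-off concrete, here is a back-of-the-envelope
Python sketch. The compute node size (128GiB RAM) and the amount
reserved for the host (8GiB) are assumptions picked purely for
illustration, not the real ScalingStack hardware, and the sketch
assumes that RAM is the limiting resource and is not overcommitted.

```python
def guests_per_node(node_ram_gib: int, guest_ram_gib: int,
                    reserved_gib: int = 0) -> int:
    """Return how many guests of a given RAM size fit on one node.

    Assumes RAM is the only constraint and is not overcommitted.
    """
    usable = node_ram_gib - reserved_gib
    return max(usable // guest_ram_gib, 0)


# Hypothetical hardware figures, for illustration only.
NODE_RAM_GIB = 128
RESERVED_GIB = 8

for guest_ram in (4, 8, 16):
    count = guests_per_node(NODE_RAM_GIB, guest_ram, RESERVED_GIB)
    print(f"{guest_ram}GiB guests: {count} per node")
```

Under these assumptions, doubling guest RAM halves the number of
builders per node, and since every guest has the same configuration
that cost applies to every build, not just the ones that need it.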

== Common problems ==

* Build fails without a log

This means that the build failed catastrophically enough that
Launchpad was unable to retrieve the build log from launchpad-buildd
at the end of the build. There are various possible causes. Running
out of RAM or disk can have this effect, as can crashing the builder
instance by way of a kernel bug, or a few other cases where the
builder fails very early in the build. If you run into such a case
and it's reproducible (i.e. a simple retry doesn't clear it up), feel
free to ask the Launchpad team for advice.

* Builder stuck in Cleaning on the /builders page

This means that the process of resetting the builder to a clean state
failed. In most cases a failed reset disables the builder and leaves
useful notes, but in some cases the builder stays in Cleaning instead.
We keep an eye on this to ensure that we don't end up with too few
builders available; you don't need to tell us about them unless build
queues are backing up.

* Builder disabled on the /builders page

This is occasionally done by hand, but is usually automatic as a
result of a failed reset. In either case there should be useful
notes visible on the page describing the individual builder in
question. Again, we keep an eye on this to ensure that we don't end
up with too few builders available; you don't need to tell us about
them unless build queues are backing up.

* lcy01 builders failing to reset

The lcy01-* builders frequently fail to reset at the moment, usually
with copious error output from the host kernel. Our sysadmins are
working on tracking down the root cause.

== Future plans ==

The next major change will be to switch arm64 over to ScalingStack; this
is being tested at the moment. Once that's done, all architectures will
have at least nine builders, which will make it rare to ever find
yourself waiting for builds.

After arm64, the architectures still outside ScalingStack will be armhf
(currently 19 builders on one physical chassis) and powerpc (currently
nine builders on three physical machines). The plan for armhf is to
share guests with arm64, which
requires a kernel patch so that we can set the personality such that
uname returns "armv7l" rather than "armv8l" as linux32 currently gives
us. On powerpc, we can't share guests with ppc64el because of the
different endianness, but once we have baseline cloud images (coming
soon) we'll be able to bring up another set of guests alongside ppc64el
on the same set of compute nodes.

Colin Watson [cjwatson@ubuntu.com]
