Monday, 29 July 2013

Release engineering sprint, July 2013

Canonical's Ubuntu release engineering team, plus a couple of hangers-on
like myself, held a sprint last week in London. It's been a long time
since many of us have been in the same place, and it was tremendously
useful. The essence of this kind of infrastructure work is normally
that the less you notice it the better a job we're doing; but we touched
on quite a few interesting topics, so here are some notes of what we
talked about.

== Attendees ==

* Adam Conrad
* Colin Watson
* Steve Kowalik
* Stéphane Graber
* Tim Chavez
* Ursula Junque
* William Grant

== Image build pipeline ==

We reviewed the pipeline from developer upload to built images with an
eye to finding and fixing inefficiencies, particularly in the Ubuntu
Touch images (which have a final system-image phase). Our assessment at
the start of the sprint was that the base overhead was a little under
two hours, but with several sources of occasional extra latency.

Pending hardware upgrades should recover a noticeable amount of this
time. Adam spent some sprint time working on the installation of new
Calxeda systems; we don't know how much those will shave off the livefs
build phase (currently 52m or so), but it wouldn't be a surprise if they
removed 20m or so. The current Panda boards also occasionally corrupt
data, which causes extra delays while people debug them. We've
also requested a dedicated system for offloaded archive administration
jobs such as proposed-migration, which should make several processes
more predictable.

On the upload and publication side, we made upload processing run every
minute rather than every five minutes; Adam worked on source package
caching in apt-ftparchive, which I think he's now handed off to Marc
Deslauriers; and Ursula moved translations processing in the archive
publisher out to an asynchronous job, eliminating a source of
publication latency which has been known to cause occasional multi-hour
delays in the past.

In the system-image phase, Stéphane is working on converting the
compression step to pxz, which will save about 15m.
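
To illustrate why a parallel compressor helps here, the following is a
minimal Python sketch of the general approach: split the input into
chunks, compress the chunks concurrently, and concatenate the resulting
xz streams. It is not pxz's code or the actual system-image build step;
compress_parallel, CHUNK_SIZE, and the worker count are names and values
invented for this example.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for this sketch; a real tool would tune this.
CHUNK_SIZE = 1 << 20  # 1 MiB


def compress_parallel(data: bytes, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the streams."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    if not chunks:
        # Empty input still needs one (empty) xz stream to be valid output.
        chunks = [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams stay in sequence.
        streams = list(pool.map(lzma.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"example image contents " * 200000
    packed = compress_parallel(payload)
    # Concatenated xz streams decompress back to the original input.
    assert lzma.decompress(packed) == payload
```

Because each chunk is compressed independently, the output is usually
slightly larger than a single-stream compression of the same input; the
gain is wall-clock time on a multi-core machine.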

I made a first stab at documenting the proposed-migration workflow.

We identified some other potential savings which we haven't yet had a
chance to work on:

* Push-trigger proposed-migration, with a 15-minute fail-safe
* Review notifications of proposed-migration/autopkgtest failures
* Selective base system caching in live builds (could save about 5m)
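
The first item in the list above (push-triggering with a fail-safe) can
be sketched roughly as follows. This is only an illustration of the
pattern, not proposed-migration code: run_triggered, the use of
threading.Event as the push signal, and the max_runs parameter are all
assumptions made for the example.

```python
import threading

# The 15-minute fail-safe interval mentioned in the list above.
FAILSAFE_SECONDS = 15 * 60


def run_triggered(trigger: threading.Event, job, max_runs: int,
                  timeout: float = FAILSAFE_SECONDS) -> list:
    """Run job whenever trigger fires, or after timeout seconds without a push.

    Returns the reason ("push" or "fail-safe") for each run.
    """
    reasons = []
    for _ in range(max_runs):
        # wait() returns True if the event was set, False if it timed out.
        pushed = trigger.wait(timeout)
        # Clear before running the job, so a push that arrives while the
        # job is running is remembered and starts the next run at once.
        trigger.clear()
        reasons.append("push" if pushed else "fail-safe")
        job()
    return reasons


if __name__ == "__main__":
    event = threading.Event()
    event.set()  # simulate one push notification
    print(run_triggered(event, lambda: None, max_runs=2, timeout=0.1))
```

The point of the fail-safe is that a lost or never-sent push only costs
one timeout interval rather than stalling the job indefinitely.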

== Live filesystem builds in Launchpad ==

This is really part of the image build pipeline too, but it's been on
our backlog for a long time and is an interesting project in its own
right. The general plan is that, instead of having a separate set of
machines dedicated to building live filesystems - typically only one per
architecture, and if more then the scheduling is manual and cumbersome -
we should have live filesystems be a new type of build job in Launchpad,
thus simultaneously giving us much more flexibility for building live
filesystems (especially around release time when we want to do lots of
work in parallel) and giving us more package build resources during the
majority of the time when no live filesystems are being built.

So far so good, but we took advantage of having almost everyone who
knows anything about our build daemon infrastructure in one room to nail
down a lot of the details and get moving on the implementation.

We identified build cancellation on non-virtualised builders as a
prerequisite (so that we don't end up in a situation where we can't do
live filesystem builds because all the builders are occupied in parallel
by long-running package builds). William, Adam, Steve, and I spent some
time sorting out the detailed design for that, and I got nearly all the
code written on both the slave and master sides; this should be ready to
land in the next week or two.
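
As a rough illustration of why cancellation matters for this, here is a
toy Python model of a builder pool. It is not Launchpad's build master:
Builder, dispatch_urgent, the choice of which build to cancel, and the
requeueing of the cancelled build are all assumptions for the sake of
the example, and the real design may differ on each of those points.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Builder:
    name: str
    job: Optional[str] = None  # None means the builder is idle


def dispatch_urgent(builders: List[Builder], pending: Deque[str],
                    urgent_job: str) -> Builder:
    """Place urgent_job on a builder, cancelling a running build if none is idle."""
    if not builders:
        raise ValueError("no builders available")
    # Prefer an idle builder: nothing needs to be cancelled.
    for builder in builders:
        if builder.job is None:
            builder.job = urgent_job
            return builder
    # Every builder is busy. Hypothetical policy: cancel the build on the
    # first builder and put it back at the front of the pending queue.
    victim = builders[0]
    pending.appendleft(victim.job)
    victim.job = urgent_job
    return victim


if __name__ == "__main__":
    pool = [Builder("builder-1", "package-a"), Builder("builder-2", "package-b")]
    queue: Deque[str] = deque()
    chosen = dispatch_urgent(pool, queue, "livefs-build")
    print(chosen, list(queue))
```

Without a cancel operation, the second branch has no option but to wait
for a long-running package build to finish.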

Meanwhile, Adam wrote a good part of the slave side of live filesystem
builds, and William wrote most of the master side. I don't have an ETA
yet, but I hope that won't be too much longer either.

== Maintenance work ==

Adam handled the release engineering for 13.10 alpha 2, worked on
preparing 12.04.3, and fixed a few miscellaneous launchpad-buildd bugs.

Ursula worked on generating an inter-image changelog in cdimage, similar
to that currently available in ubuntu-touch-preview builds. This
involved some work by Stéphane and me (still in progress) on the layout
of so that the cdimage code can fetch changelogs
reasonably efficiently. Ursula also started work on figuring out the
bugs that cause us to occasionally lose binary publications when
multiple override operations happen in a single publication window.

Steve fixed an OOPS in DistroSeries:+queue, worked on infrastructure for
keeping an audit trail of various Launchpad operations, and worked on
making package diff generation more responsive.

I fixed a bug which has been plaguing
our build farm with hung builds for a few months.

William discussed with IS the SAN upgrade plan to resolve ongoing
librarian space issues. (Among other things, this is blocking improved
handling of ddebs.)

Andy Whitcroft visited us one day and worked with Adam on some
refactoring of kernel packaging.

== Other discussions ==

We talked with Tim about divergence in the PES build apparatus, and made
some preliminary plans towards consolidation.

Adam and William went through our local sbuild changes and confirmed
that the plan to upgrade away from a fork of a nine-year-old version of
sbuild is still valid (among other things, this blocks some improvements
to backports).

William and I thrashed out the remaining points of dispute on how to
handle the development series alias with respect to PPAs.

Colin Watson

ubuntu-devel mailing list