Thursday, 5 December 2013

Process bug: Silent -proposed migration failures


I think we have a rather serious bug in our process. I'm not sure what can be
done to address it, but it's worth discussing.

When I upload a package and its build fails for some reason, I get an email
notification which contains a link right to the build log. When I click on
that, I can immediately see why the build failed. If I think it was an
intermittent failure, I can retry the build. If it's a legitimate problem
with the package, I'll work on reproducing it locally[*], then upload a new
rev of the package fixing the problem.

Let's say, however, that the package builds fine. It then goes to -proposed.
The process bug occurs when the migration from -proposed fails. There are
several aspects to this bug.

The most serious is that no email notification is sent. You have to actively
watch for the positive promotion of your packages and if you don't see it in
what might seem a reasonable (but undetermined) amount of time, you then have
to check a web page[+]. This does not scale if you are uploading a large
number of packages.
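
To make the scaling problem concrete, here is a minimal sketch of the check
an uploader currently has to do by hand. Everything in it is an assumption
for illustration: the function name, the 24-hour threshold, and the idea that
you already have your upload times and the set of migrated packages from
somewhere. It does not talk to any real service.

```python
from datetime import datetime, timedelta


def find_stuck_uploads(uploads, migrated, now, max_wait=timedelta(hours=24)):
    """Return names of packages still waiting in -proposed after max_wait.

    uploads:  dict of package name -> upload datetime (hypothetical input)
    migrated: set of package names known to have migrated (hypothetical input)
    max_wait: arbitrary threshold; the real "reasonable" wait is undetermined
    """
    stuck = []
    for name, uploaded_at in sorted(uploads.items()):
        # A package counts as stuck if it has not migrated and has waited too long.
        if name not in migrated and now - uploaded_at > max_wait:
            stuck.append(name)
    return stuck


if __name__ == "__main__":
    now = datetime(2013, 12, 5, 12, 0)
    uploads = {
        "foo": datetime(2013, 12, 3, 9, 0),   # placeholder package names
        "bar": datetime(2013, 12, 5, 8, 0),
        "baz": datetime(2013, 12, 2, 9, 0),
    }
    print(find_stuck_uploads(uploads, migrated={"baz"}, now=now))  # ['foo']
```

Even with a helper like this, each uploader has to poll for results, which
is exactly the work a failure notification would make unnecessary.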

The second problem is that even when you do scan the web page (probably
searching for your name, since it may be difficult to figure out all the
packages you uploaded that haven't yet migrated), you are presented with two
links to Jenkins output, one public and one private. Most people will have to
click on the public link, but that takes you to a page that is mysterious at
best and hides the cause of the migration failure[#]. You'd think that
clicking on "Latest Test Result" would show you the problem, but it doesn't.
It takes you to another mysterious page. Several windy twists, turns, and
dead ends later, you might end up on the Console Output page, which is where
the real actionable problem is usually evident (even if, like a typical build
log, it's buried somewhere at the bottom of the page).

Finally, if you think the problem is transient, there's no way afaict to
easily "retry" the migration, e.g. if it's an autopkgtest that may
subsequently succeed due to other uploads. This one can probably be forgiven,
since I would guess that most migration failures can only be corrected by an
updated package.

So I think the most critical bug in the process is the lack of notification
for -proposed migration failures, followed by the lack of a quick and easy
path to log output that can actually help you decipher the problem.


[*] which can sometimes be rather tricky, as a recent buildd-only failure in
system-image proved. ;)