Tuesday, 13 May 2014

Re: errors.ubuntu.com and upgrade crashes

Bjoern Michaelsen wrote on 12/05/14 12:44:
> ...
> while chasing around bug 1219245 as it showed up so high in
> errors.ubuntu.com I found that:
> - that bug exists since at least LibreOffice 4.0.2 (released with
> Ubuntu 13.04)
> - all of today's reports come from 14.04 exclusively
> - the bug wasn't ranked high in errors before 14.04 release
> I vaguely remember seeing that bug peaking up on errors right after
> 13.10 was released, just to quickly vanish into irrelevance again
> (I might be wrong there, as I can't go back in time on errors.u.c).

You can. Set "Most common of these errors from" to "the date range",
then enter the dates, for example 2013-10-16 to 2013-10-20. The result
is not as you remember: over that period, bug 1219245 was not in the
top 50 at all, whereas it was #42 for the equivalent period around the
14.04 release.

(Unfortunately I can't link you directly to those results, because of
a bug in the error tracker. <http://launchpad.net/bugs/1201396> So you
need to enter the dates yourself.)

> While there is no good reproduction scenario in the bug reports,
> there is one report claiming it crashed "while installing a font"
> and another it crashed "during an upgrade". This leaves me with the
> suspicion, that the crash is actually people leaving LibreOffice
> running during a release upgrade (which is brave and a nice vote of
> confidence, but not really a supported scenario).

"Supported" is a weasel word. I've never understood why Ubuntu lets
people have apps running during an upgrade, because that has many
weird effects. But Ubuntu *does* let people do that. And as long as it
does, Ubuntu developers are responsible for the resulting errors.

> By extension, this leaves me with the suspicion that the nice
> exponential drop-off of crash reports for a distro over time is not
> actually us fixing stuff via SRUs (which is way too slow for that),
> but that it is just fewer people doing upgrades vs. people running
> it in production and experiencing some transitional crashes (that
> do not happen after they use the system 'in production'
> afterwards).

If true, that would be a terrible indictment of our upgrade process!
Fortunately it is not. The spike and drop-off in apparent error rate,
over the 90 days after an Ubuntu release, is a flaw in the error
tracker's calculation.

The true error rate per machine, for any period, is the number of
errors reported overall, divided by the number of machines running
Ubuntu that would report errors if they had any.

The problem is that we don't know the latter. We don't get pings from
machines saying "I'm running Ubuntu, and I would report errors if I
had any, but I didn't have any today". We know a machine exists today
only if it *did* report any errors today.

So we use an approximation for the total number of machines that would
report errors today if they had any: the number of machines that have
reported any errors in the past 90 days. That incorrectly excludes
machines that would have reported errors but fortuitously had zero in
the past 90 days. And it incorrectly includes machines that reported
an error 89 days ago and were then thrown into the garbage. Hopefully
those biases roughly cancel each other out.

Unfortunately, this calculation goes to hell on release day. All of a
sudden there are a gazillion new machines with the new version of
Ubuntu on them. And of those, some fraction will report their first
error. But that fraction are the only ones we know exist at all. So the
denominator is much too low -- making the calculated error rate much
too high.

This is why the calculated error rate for every new release spikes on
release day, and corrects itself over the next 90 days. It's also why
the calculated error rate for 13.10 plummeted at the 14.04 release:
lots of 13.10 machines were upgraded to 14.04, and so from the error
tracker's point of view they're still 13.10 machines that suddenly
became error-free.
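
To see the size of the spike, here is a toy model, not a description of real data. It assumes every machine is installed on release day, and that each machine independently reports at most one error per day with the same probability p. Both the function name and the model are assumptions.

```python
def calculated_rate(p, days_since_release):
    """Expected calculated error rate under the toy model.

    Expected errors per day are N * p. The tracker only knows about
    machines that have reported at least once since release, which is
    expected to be N * (1 - (1 - p) ** days_observed). N cancels out.
    """
    days_observed = days_since_release + 1  # release day counts as day 0
    seen_fraction = 1 - (1 - p) ** days_observed
    return p / seen_fraction

if __name__ == "__main__":
    true_rate = 0.05  # hypothetical daily error probability per machine
    for day in (0, 7, 30, 89):
        print(day, round(calculated_rate(true_rate, day), 4))
```

On release day the only known machines are the ones that just reported an error, so the calculated rate is about 1 error per machine whatever p is. As more machines become known, the calculated rate falls towards the true rate, which matches the spike-then-decay shape described above.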

If anyone would like to fix this, it's just a simple matter of
programming. ;-) <http://launchpad.net/bugs/1069827>

> Does errors.ubuntu.com have a way to identify crashers during a
> release upgrade (or maybe even: first-starts after a release
> update)? Would it be possible to filter crash reports for that
> scenario, e.g. trivially: mark crashers 48hours after the upgrade
> as 'potentially an upgrade side effect' or somesuch?
> ...

Probably not retroactively. But I imagine it would be fairly easy to
add info to future error reports asking if do-release-upgrade (or
whatever) was running at the time.
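
A minimal sketch of what that could look like, assuming a Linux /proc filesystem. This is not the error reporting tool's real interface: the "UpgradeInProgress" field, the dict-shaped report, and all function names are hypothetical.

```python
import os

# Assumption: the release upgrader shows up under this script name.
UPGRADER_NAMES = {"do-release-upgrade"}

def is_upgrader(argv):
    """True if any argument names the upgrader (it may run via python3)."""
    return any(os.path.basename(arg) in UPGRADER_NAMES for arg in argv)

def running_command_lines(proc_dir="/proc"):
    """Yield the argv list of each running process, read from /proc."""
    try:
        entries = os.listdir(proc_dir)
    except OSError:
        return  # no /proc here, so nothing to yield
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited, or we may not read it
        argv = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
        if argv:
            yield argv

def annotate_report(report, command_lines=None):
    """Return a copy of `report` with a hypothetical upgrade flag added."""
    if command_lines is None:
        command_lines = running_command_lines()
    annotated = dict(report)
    annotated["UpgradeInProgress"] = str(
        any(is_upgrader(argv) for argv in command_lines)
    )
    return annotated
```

This only records the fact; it does not filter anything out, so the crash still counts.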

> While those upgrade issues should be a concern too, as-is it seems
> to me they are overblown in their importance and we don't have a
> good way to look if they happen in regular production use after
> upgrade.
> ...

With respect, I don't see that you have any justification for deciding
that this particular issue is "overblown". A crash in LibreOffice is
just as bad whether it happens during an upgrade, during a full moon,
or during the Olympic Games. If you think it's unfair somehow that
apps are expected to keep running during upgrades, then fix the
upgrade process so that apps can't run during the upgrade. Don't just
filter out those crashes as if they aren't happening.


ubuntu-devel mailing list
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel