Friday 14 August 2020

+1 Maintenance report

Hi everyone,
this is my +1 duty report of this week. I was glad to see several others
work on migrations and overall archive health as well - even with a massive
update_excuses it can feel good if you see progress.

Here is a little summary of what I worked on. I beg your pardon if it is
incomplete: I'm unsure whether I documented everything, and some rather
trivial touch-and-go cases in particular might be missing.


1. chromium-browser not upgraded to snap in Focal
   - This is less a +1 task than a generic archive dependency issue that
     I quickly looked at after being asked. But I found it is known and under
     control at bug LP: #1889106
   - This later became an ubuntu-devel discussion about the use of epochs in
     package versions for deb-to-snap transitions.


2. groovy rsync was one of the many packages waiting for
   armhf test backlog to resolve - it just needed some help with test triggers
   and then migrated.


3. Haskell was blocked last time I was on +1, so I revisited the current state.
   It seems the transition I linked last time [1] is still going on in Debian.
   A few current things that one could suspect to be real problems:
   - dependencies libghc-haxr-dev-3000.11.4.1-$hash are odd as the same
     source packages in Debian/Ubuntu 3000.11.4.1-1 built different
     $provides and hence there might be further mismatches.
   - lack of build dependencies, there are many cases waiting for the
     same thing when tracing down the tree - maybe with a bit of work we can
     at least reduce the number blocked in excuses.
     haskell-gtk-strut [2] (and others) -B-depend-> haskell-gi-gdk [3]
     haskell-gi-gdk [3] (and others) -B-depend-> haskell-gi-pango [4]
     haskell-gi-pango [4] (and others) -B-depend-> haskell-haskell-gi-base
     - The latter, 'haskell-haskell-gi-base', did not get its v23 auto-synced
       from Debian, but I did not find a removal either
     - unfortunately [5] tells us that it would not be ready overall yet
       anyway (even in Debian).
     - But since it could unblock so many things in update-excuses I gave
       it a try and it built fine [6].
     - Furthermore I checked but there was no removal filed [6] that would
       explain the current lack of the package (someone might have wanted to
       hold all dependent things back)
   - we still have to wait for this to be fully ready as discussed in [1]
     but we could make some steps forward by syncing in haskell-haskell-gi-base.
     Yet I feel it might be removed/blocked-syncing for a reason - hence I
     wanted to ask upfront.
   - After talking to the AAs I learned that this information is another gem
     that can be found in the package publishing history. Yes there was no bug
     nor a sync blocker, but [17] shows "Temporary removal to downgrade to a
     version compatible with available reverse-dependencies"
     - This might (as I have assumed) interact with the ICU transition so we
       should not touch it right now.
     - But as soon as ICU is done one should syncpackage 0.23* of
       haskell-haskell-gi-base to get the haskell transition rolling again.
   - While I'm unsure if anyone actually reads it, I updated the +1 wiki [18]
     about that.
   - The other day Vorlon pinged me (I have updated the wiki page):
      <vorlon> cpaelzer: fyi after yesterday's conversation (and britney
      nagging me about a number of packages that I "uploaded" being stuck in
      -proposed too long ;), I am going ahead and rolling forward the various
      haskell packages that had been rolled back, after confirming they're not
      entangled with any current transitions.  (e.g. the haskell-gi-* stuff
      can't be since it was all removed from release pocket)
   - This unblocked some, but not the whole haskell* yet and as one would
     expect there also are test issues one needs to look after.
     Seems like this will still be with us for a while.
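
The "tracing down the tree" step above can be sketched in a few lines. This
is only a minimal illustration, assuming the build-dependency edges are
already known; the dictionary holds just the three edges named in this report
and the function names are made up for the example. Real data would have to
come from the archive or from update_excuses.

```python
from collections import defaultdict

# Build-dependency edges as described above: package -> what it waits for.
# Only the three edges named in this report are listed (illustrative data).
BUILD_DEPENDS = {
    "haskell-gtk-strut": ["haskell-gi-gdk"],
    "haskell-gi-gdk": ["haskell-gi-pango"],
    "haskell-gi-pango": ["haskell-haskell-gi-base"],
}


def root_blockers(package, edges, seen=None):
    """Follow build-dependencies down to packages with no further known edge."""
    seen = set() if seen is None else seen
    if package in seen:  # guard against dependency cycles and revisits
        return set()
    seen.add(package)
    deps = edges.get(package, [])
    if not deps:
        return {package}
    roots = set()
    for dep in deps:
        roots |= root_blockers(dep, edges, seen)
    return roots


def blocked_by_root(edges):
    """Map each root blocker to the sorted list of packages waiting on it."""
    waiting = defaultdict(list)
    for package in edges:
        for root in root_blockers(package, edges):
            waiting[root].append(package)
    return {root: sorted(pkgs) for root, pkgs in waiting.items()}


if __name__ == "__main__":
    for root, pkgs in blocked_by_root(BUILD_DEPENDS).items():
        print(f"{root} blocks {len(pkgs)}: {', '.join(pkgs)}")
```

Grouping by root blocker makes it visible which single sync or fix would
unblock the most entries in excuses.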


4. missing dependencies all over the place
   Checking from the bottom I found multiple packages with unsatisfiable
   dependencies for ~100 days. Worth trying to clean up.
   - icingaweb2-module-x509 depends on icingaweb2-module-reactbundle
     - state is the same in Debian [7]
     - we should remove it before 20.10 closes since it is broken as-is
     - removal filed [8] (TBH I'm not 100% sure it needs to be removed in
       this state, but personally I'd prefer it clean and gone - the AAs
       will tell me)
   - gspiceui depends on geda-gschem & geda-gnetlist which are unavailable
     - missing packages are from src:geda-gaf which we should have in universe
     - this source it blocks on is missing in testing as well as >=Focal
     - turns out this was related to the guile-2.0 removal and the new package
       just came in as auto-sync - but has no chance to work as the bug [9]
       to get geda-gaf updated stalled, found no package owner and got
       removed [10].
     - the scheduled removal was filed against gspiceui [11] and if there is
       a fix it will auto-sync then, but for now we should remove it.
     - Removal filed at [12]
   - The list above would go on, but I started to wonder if those are worth
     all the effort
        Pro: removal cleans up excuses
        Con: won't end up in -release and causes work
       - I started a discussion about it and I learned that while RM bugs would
         work for those they are a lot of overhead. It would be better for
         those cases to just ping an Archive Admin and ask them to remove it.
         To have a chance to later bring back things when the dependencies are
         available there is [15].
       - A requirement for that also is that the package maintainer is aware,
         e.g. have a Debian bug filed about it to increase the chance it can
         come back later.
       - The AAs realized this isn't documented well and want to do that, expect
         a mail to ubuntu-devel at some time.
         Thanks Steve, Matthias and Sebastian for the discussion.
   - After learning the above I provided further cases of this category
     directly as an MP asking for MP-merge and removal of the packages in one
     step [16].
   - After learning the above I analyzed way more cases and created [23] out
     of that for the Archive admins to remove packages but also to track when
     they can come back and/or prevent further syncs.


5. missing builds all over the place
   - Further there are many packages that show "has no binaries on any arch"
     - not 100% the same as the dependency issue, but similar enough (would
       need removals).
     - I found that we have quite a lot of these cases (61). But all of those
       that I checked are only in -proposed and removed in Debian.
     - As of today, ignoring those that belong to another known bucket like
       rust, haskell or desktop I've seen 25 of those packages:
       lomiri-download-manager, qtfeedback-opensource-src, qtpim-opensource-src,
       node-jsdom, crystal, dropwatch, libzypp, taffybar, iitii,
       singularity-container, siconos, simple-ccsm, ants, bombono-dvd,
       jinja2-time, ledger-autosync, pytest-services, q2-cutadapt, bslizr,
       bchoppr, hgsubversion, freezer, hdevtools, biometric-authentication,
       xenium
   - In this case it would not be as easy to track if chances are high to
     re-introduce them, probably worth another discussion, for now I did only
     list but not tackle these.


6. ICU - the elephant in the room that feels entangled with everything else
   ICU always is massive and painful, no different this time it seems.
   I was tracking down further dependencies it blocks (as did many others
   this week).
   - ros-rosconsole was just fixed to build on riscv64 when I checked it
     No need to act on this anymore.
   - boost1.71 is a dependency for ICU to migrate and is blocked by a few
     things of its own.
     - mrs has been known to FTBFS with gcc-10 for quite a while [13]
       - this also is one of the gcc-10 FTBFS reported by doko
       - mrs got removed in Debian [14] due to that
       - we need to fix the FTBFS or remove it as well
       - I pinged other people likely to work on it but no one seems to do so;
         to make sure it can be found by others I filed [19]
       - Tracking down the issue revealed a rather easy fix which, after a
         cross-arch test build [20], I reported to Debian and upstream and
         uploaded to groovy.
     - dart FTBFS
       - this also is one of the gcc-10 FTBFS reported by doko
       - it also had a flaky test on gazebo/amd64 that worked on a re-trigger
       - FTBFS on s390x and riscv64 only
         - riscv64 segfault in Test #29: test_ContactConstraint
         - s390x segfault in test #17 test_ForwardKinematics
         - s390x segfault in test #47 test_SdfParser
         - s390x segfault in test #50 test_DartLoader
         - s390x segfault in test #51 test_IkFast
         - Same s390x issue happened to the last build some weeks ago
           https://launchpad.net/ubuntu/+source/dart/6.9.2-3 but there riscv
           was still ok
         - riscv64 => seems to be gcc-10 breakage
         - s390x => just NEVER built
         - for migration right now we don't have to fix s390x here (obviously
           we should try to see if it is easy/trivial)
       - there are a bunch of slightly newer releases at
         https://github.com/dartsim/dart/milestones
         - no issue references the problems we see
         - no recent commits in that regard
         - builds slow even on s390x, will be much worse on emulated riscv64
       - I reported it upstream [27][28] on LP [29] and Debian [30] so everyone
         is at least aware
       - getting a riscv64 build env that I can debug in was quite an effort,
         luckily I had at least https://people.ubuntu.com/~wgrant/riscv64/
         to start.
       - I was working on a small upload that will skip just the failing test
         on riscv64, that worked in a test build.
       - I was debugging this for a while in riscv64 qemu and found that on
         function return it seems to break the instruction pointer - at that
         point backtraces as well as everything else are broken. This looks
         almost more like a gcc bug than a bug in the dart code - I added a
         gcc-10 task to the bug.
       - I also added that insight to the upstream bug to help them
         understand the issue as well.
       - since there was no fix in reach to unblock proposed I uploaded the
         test skip for now.
   - libreoffice 6.4.5 mostly fine but armhf autopkgtests
     From the test logs it seemed flaky in the past and ken-vandine seems
     to be already on it (he was triggering restarts). Test queues were better
     today, I retriggered it as well after the last fail to a) increase the
     chance for a flaky-good run and b) in case it is a reproducible fail to
     get more test logs on it.
     Over the next few days I saw more and combined retries which means +1
     doesn't have to look after it right now since the desktop team does.
     Eventually it seems no one had a fix and it ended up in [33]
   - dee (missing build)
     - FTBFS on no-change rebuild for ICU
     - this is an odd one - all architectures started & finished build two weeks
       ago
     - except riscv64 which was running still - seemed someone restarted
       the build recently
     - but that build (probably again) hangs at the after-build stage for
       hours now.
     - 7 days ago on +1 mwhudson said "dee's tests are timing out on riscv64 :("
       This isn't what it looks like right now, but I need to wait until it
       completes to get a full build log
     - I pinged people that might have or actively do look into this for
       a discussion but got no response
     - it hung for two hours more in that state
     - I asked if there was some post mortem we could do and cjwatson was so
       kind to help. We identified issues of a hanging process that needed
       to be debugged and I filed [21] for it.
     - Further debugging showed that this is really a problem of its own.
       Wgrant reported:
        <wgrant> Laney, cpaelzer: dbus-test-runner is what generally gets
        stuck, I haven't worked out why. But there are a couple of packages
        that were consistently hanging because it didn't die for a while. But
        they seemed to eventually fix themselves by around maybe September.
     - I added a dbus-test-runner task to [21] and spun a riscv64 emulator
       to see if we can catch it live for further debugging
     - for dee itself I tested if a timeout bump and/or test skip are needed
       and uploaded that fix to get things unblocked in -proposed
     - to my disappointment only the test skip got things working well, so
       I uploaded that as it is part of the ICU transition block.
     - [26] built fine then, it depends on ICU but otherwise is ready to go now
   - nodejs
     - this is a big chunk of its own and I wanted to look at it as well
     - luckily, before I did, there was a devel post [24] that sorted out
       who was working on which sub-problem of it for the last weeks.
     - to get ICU closer to migrate Steve rolled back nodejs for the time being.
       After the ICU transition it can migrate non-entangled.
     - As one would expect this filled the test queues again, but there aren't
       enough results yet to spot which remaining issues need to be worked on.
       This likely should be a priority of +1 members next week when the results
       of the tests for this downgrade are in.


7. procenv (blocking sbuild/dpkg)
   - There is a GCC-10 FTBFS in procenv
   - reported as https://bugs.launchpad.net/ubuntu/+source/procenv/+bug/1889138
   - the bug had a patch to sponsor which I reviewed and sponsored
   - procenv is the package built in the sbuild tests and thereby
     was blocking sbuild/dpkg
   - after migration test retriggers were needed
   - I even found others doing combined triggers while it was in proposed.
   - since this isn't a normal test dep it only resolved once procenv was
     in -release
   - with that tests of sbuild itself unblocked
   - next dpkg tests of sbuild unblocked
     Those also needed the 'sbuild/0.80.0ubuntu1' trigger due to
     Dpkg::Build::Info::get_build_env_whitelist() in there.
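
For anyone who has to do such combined retriggers by hand, here is a small
sketch of how a test request URL with several triggers could be assembled.
Treat the endpoint and the parameter names (release, arch, package, trigger)
as my assumption about the request interface of autopkgtest.ubuntu.com; the
function name and the amd64 architecture in the usage below are just
illustrative.

```python
from urllib.parse import urlencode

# Assumption: request endpoint of autopkgtest.ubuntu.com and its query
# parameter names; neither is spelled out in this report.
REQUEST_URL = "https://autopkgtest.ubuntu.com/request.cgi"


def retry_url(release, arch, package, triggers):
    """Build a test request URL for `package` with one or more triggers."""
    query = [("release", release), ("arch", arch), ("package", package)]
    # One "trigger" parameter per source/version pair gives a combined trigger.
    query += [("trigger", trigger) for trigger in triggers]
    # safe="/" keeps the source/version separator readable in the URL.
    return f"{REQUEST_URL}?{urlencode(query, safe='/')}"


if __name__ == "__main__":
    print(retry_url("groovy", "amd64", "sbuild", ["sbuild/0.80.0ubuntu1"]))
```

For the dpkg case above one would pass the dpkg source/version as a second
entry in the triggers list next to 'sbuild/0.80.0ubuntu1'.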


8. systemd flaky tests
   - this is a usual suspect, it just needs to test so much and still is a bit
     flaky sometimes
   - IMHO it is important to check the test logs: if we see different tests
     fail every time then it is most likely flakiness. It is also worth
     checking the same tests across architectures for that.
     If instead the same tests fail every time chances are high we see a
     genuine issue.
   - with the check above I identified 8 cases of flaky tests and retriggered,
     but also tracked their results in case one turns out to look more like a
     genuine issue on retry.
   - I got six packages to no longer be blocked on systemd \o/ and spotted one
     unique issue around the new plymouth vs systemd tests. That was already
     mentioned in [25] quite a while ago, so I got in touch with rbalint to
     check the state. He will solve that remaining bit on the next systemd
     upload.
   - I updated the bug [25] with that information and left the remaining tests
     alone now that it was clear it was a real issue
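
The flaky-vs-genuine check described above is simple enough to write down.
The following is only a sketch of that rule of thumb, assuming the failed
test names per run have already been extracted from the logs by hand or by
some other tool; the function name and the input shape are made up for
illustration.

```python
def classify(runs):
    """Classify a test history using the rule of thumb from this section.

    `runs` holds one collection of failed test names per run (runs from
    several architectures can be mixed in); an empty one means it passed.
    """
    if len(runs) < 2:
        return "need more runs"
    failed_runs = [set(failed) for failed in runs if failed]
    if not failed_runs:
        return "passing"
    if len(failed_runs) < len(runs):
        return "likely flaky"  # at least one run passed
    # Tests that failed in every single run point to a genuine issue.
    always_failing = set.intersection(*failed_runs)
    return "likely genuine" if always_failing else "likely flaky"


if __name__ == "__main__":
    print(classify([{"test-a"}, {"test-b"}, {"test-c"}]))
    print(classify([{"test-a", "test-b"}, {"test-a"}, {"test-a", "test-c"}]))
```

A "likely flaky" result is a reason to retrigger and keep tracking, while
"likely genuine" is a reason to stop retrying and look for or file a bug.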


9. gubbins hanging on build dependencies
   - Debian never built non-x86, we did
   - A recent Debian change added an x86-only build dependency
   - This migrated fine in Debian but is stuck for us
   - I filed [22] for an archive admin to resolve
   - once the non-x86 binary packages are removed from groovy, 2.4.x should be
     able to migrate fine


10. autopilot-gtk
   - when scanning for broken build-depends for #4 I also found [31]
   - I updated the bug accordingly and just half a day later we got a fix
     (Thanks jibel)
   - Essentially this is a py2->py3 transition that has fallen through the
     cracks
   - I reviewed and sponsored the fix which will remove this finally
     from proposed migration
   - half a day later builds and tests are good and this has migrated


11. More failing autopkgtests
   - Looking at the head of excuses instead of the tail this time I found
     some more tests that failed, a few of them just are without hope
     given the result history, so I filed an MP to reset them [32]
     - this also includes try-to-test-i386-when-there-is-no-build cases
       that seem to be missed in the past
   - A few others just needed some more re-triggers (e.g. we had a flurry of
     failing apt access to the archive on arm again - I've seen those a few
     times, but still don't know what they are about)
     - As always (and I can recommend that) keep a log or tab on what you
       restarted and track if it works out

12. fatrace
    - This is another old proposed candidate, the autopkgtest is broken.
    - this was found in the past by bdmurray and filed as [34][35]
    - given that it is more than 100 days without any response we need to do
      something about it.
    - Some debugging on the case revealed that the tracing seems really broken
      in groovy missing what feels like an arbitrary amount of events
    - Ubuntu is more exposed having newer kernels in the build env but I was
      able to confirm that the bug is present in old/new fatrace version as well
      as in Debian & Ubuntu
    - I updated Ubuntu and Debian bugs with the insights so far, it might help
      to get things going again.

 [1]: https://lists.debian.org/debian-haskell/2020/06/msg00003.html
 [2]: https://launchpad.net/ubuntu/+source/haskell-gtk-strut/0.1.3.0-2build1
 [3]: https://launchpad.net/ubuntu/+source/haskell-gi-gdk/3.0.22-1build1
 [4]: https://launchpad.net/ubuntu/+source/haskell-gi-pango/1.0.22-1build1
 [5]: https://tracker.debian.org/pkg/haskell-haskell-gi-base
 [6]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4184/+packages
 [7]: https://tracker.debian.org/pkg/icingaweb2-module-x509
 [8]: https://bugs.launchpad.net/ubuntu/+source/icingaweb2-module-x509/+bug/1891011
 [9]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=885195
[10]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=965098
[11]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=967915
[12]: https://bugs.launchpad.net/ubuntu/+source/gspiceui/+bug/1891017
[13]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=957567
[14]: https://tracker.debian.org/news/1166233/mrs-removed-from-testing/
[15]: https://bazaar.launchpad.net/~ubuntu-archive/+junk/sync-blacklist/view/head:/extra-removals.txt
[16]: MP for removal of b-dep issues
[17]: https://launchpad.net/ubuntu/+source/haskell-haskell-gi-base/+publishinghistory
[18]: https://wiki.ubuntu.com/PlusOneMaintenanceTeam/Status
[19]: https://bugs.launchpad.net/ubuntu/+source/mrs/+bug/1891023
[20]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4189/+packages
[21]: https://bugs.launchpad.net/ubuntu/+source/dee/+bug/1891158
[22]: https://bugs.launchpad.net/ubuntu/+source/gubbins/+bug/1891340
[23]: https://code.launchpad.net/~paelzer/junk/sync-blacklist-plus-one-clean-groovy
[24]: https://lists.ubuntu.com/archives/ubuntu-devel/2020-August/041121.html
[25]: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1886886
[26]: https://launchpad.net/ubuntu/+source/dee/1.2.7+17.10.20170616-6ubuntu1
[27]: https://github.com/dartsim/dart/issues/1483
[28]: https://github.com/dartsim/dart/issues/1482
[29]: https://bugs.launchpad.net/ubuntu/+source/dart/+bug/1891440
[30]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=968332
[31]: https://bugs.launchpad.net/ubuntu/+source/autopilot-legacy/+bug/1856574
[32]: https://code.launchpad.net/~paelzer/britney/hints-ubuntu-clean-groovy-proposed2/+merge/389239
[33]: https://code.launchpad.net/~seb128/britney/ignore-libreoffice-armhf/+merge/389227
[34]: https://bugs.launchpad.net/ubuntu/+source/fatrace/+bug/1885188
[35]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=963714

--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd