Friday 13 November 2020

+1 maintenance report

Hi,
This week I looked at various things to get the archive healthy and to help
proposed migrations complete. I won't mention every silly "trigger test
re-run" since there are usually far too many. Those consume quite a lot of
time, though, as one also needs to prepare to recheck the results later and
dig into why a retry still failed and whether fixes are needed.

The rest might be more interesting for anyone involved in (or blocked by)
those packages, and for whoever is on duty next week to know what happened
last week.


## 1 The Perl is all around you ...

I wonder why there always seems to be a perl transition ongoing whenever I
have +1 duty. This one was started by the auto-sync of perl 5.32 from
Debian [1].
Obviously a bunch of combined triggers were needed, but the test queue is
rather full anyway, and to some extent it is better to wait a few days until
everything needed has appeared in -proposed.
I've already seen enough other people working on this transition (as well as
on boost). To avoid doing "just the same", I decided to get more uncommon
things unstuck from the queue.


## 2 usual suspect - i386 dependency fails

The odd bit about php7.4 is that it seems to have an i386 dependency failure
that it didn't have before. We once had a britney rule like this:
  # probably fixable with Multi-Arch: foreign annotation on php-common, but needs investigation
  force-badtest php7.3/all/i386
But with 7.4 things worked, up until recently.
Since I've been asked "how do I debug this?" a few times, I have added a
section "Test for i386 dependency issues" to the i386 wiki page [3].
I'll leave the further handling to the server team, but documenting how to
test this seemed to me like a +1 task worth doing for everyone.
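
The hint comment above points at the usual cause. In the i386 test
environment (i386 packages added to an amd64 system, as in the
`dpkg --add-architecture i386` setup used below in section 5), a dependency
of an i386 package can only be satisfied by an amd64 package if that package
is marked Multi-Arch: foreign. The following is a minimal, simplified bash
model of that rule, not apt's or dpkg's actual resolver; the function name
dep_satisfied is hypothetical, and Architecture: all, "Multi-Arch: allowed"
with ":any" and version constraints are deliberately left out.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of Multi-Arch dependency satisfaction:
# can a package of architecture $1 have its dependency satisfied by an
# installed package of architecture $2 whose Multi-Arch field is $3?
#  - a provider of the same architecture always satisfies the dependency;
#  - a provider of another architecture only does so if it is
#    "Multi-Arch: foreign".
# Architecture: all, "Multi-Arch: allowed" with ":any" and version
# constraints are deliberately left out of this sketch.
dep_satisfied() {
    local dep_arch=$1 provider_arch=$2 provider_ma=${3:-no}
    if [[ "$dep_arch" == "$provider_arch" || "$provider_ma" == "foreign" ]]; then
        echo yes
    else
        echo no
    fi
}

# dep_satisfied i386 amd64          -> no  (provider has no annotation)
# dep_satisfied i386 amd64 foreign  -> yes (provider is Multi-Arch: foreign)
```

In this model, adding the annotation to the provider is what turns the
i386 dependency failure into a pass, which is exactly what the old hint
comment suspected for php-common.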


## 3 gdb fails apport test

The apport tests fail on:
  test_add_gdb_info_damaged (__main__.T)
  add_gdb_info() with damaged core dump ... warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
  WARNING: Please install gdb-multiarch for processing reports from foreign architectures. Results with "gdb" will be very poor.
  WARNING: Please install gdb-multiarch for processing reports from foreign architectures. Results with "gdb" will be very poor.
  FAIL
And on:
  test_add_gdb_info_short_core_file
These warning messages were already present with the former version in tests
that passed, so they are a red herring. The FAIL, however, is new.
Still, it seemed plausible that a new gdb might have problems with old test
files.

As an interesting side note, the behavior of the tests was odd.
They got scheduled, then ran for ~5h but always showed sub-10-minute durations.
After another day they seemed to be cancelled, but were back in the queue as
if I had triggered them again (which I hadn't).
I suspect that someone had to scrap and restart a bunch of tests, but I can't
prove it :-)

The issue was reproducible in a local autopkgtest VM, but at that point I
asked around whether this was already being worked on, and it seems bdmurray
is already on it.


## 4 mumps related builds entangle several transitions

It seems doko has done a great cleanup with rebuilds for soname changes.
As a result a bunch of packages were built, but they are very interdependent
in excuses. Furthermore, a few syncs came in from Debian that tie into the
same set of packages. Overall this is about these packages:
scotch, coinor-ipopt, getfem++, petsc, sdpa, trilinos, dolfin, mumps, syrthes,
superlu, deal.ii, slepc, getdp, petsc4py, slepc4py, sundials

This overall set of packages has various issues:
a) fail to build
 a1) dolfin: FTBFS all arch
 a2) getfem++: FTBFS arm64
 a3) deal.ii: FTBFS on ppc64
 a4) deal.ii: missing build dependencies on armhf and riscv64
 a5) slepc: FTBFS on riscv64
b) unsatisfiable dependencies
 b1) trilinos: libtrilinos-amesos12/arm64 has unsatisfiable dependency
 b2) sdpa: sdpa/sdpam has unsatisfiable dependency
c) uninstallabilities
 c1) trilinos: makes libdeal.ii-9.2.0/9.2.0-2/arm64 uninstallable
 c2) scotch: makes libtrilinos-ifpack (and others) uninstallable on arm64&s390x
 c3) mumps: makes libtrilinos-amesos uninstallable on s390x
d) test regressions
 d1) superlu: i386 autopkgtest regression
 d2) petsc4py: arm64&ppc64 autopkgtest regression

Details:
(a1) is a self-test failure at build time, fixed by [7], which is
     currently building and has already succeeded on three architectures.
(a2) was an odd failure (marked broken, but no build log), so I restarted
     the build; it was resolved by the retry.
(a3) was a build dependency issue that is now resolved.
(a4) never built on those architectures (ok)
(a5) This was a riscv64 issue due to perl not being installable there.
     That should be resolved by now (we are actually already moving on to
     the next one), so a rebuild now or later (once perl 5.32 is in) would
     fix this. It is now fixed by a rebuild that I triggered.
(a6) fails with "farknullmatrix.c:33: multiple definition of `F2C_ARKODE_matrix'".
     This is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=957847,
     which will be fixed in 4.x, currently in experimental.
(b1) libtrilinos was built when mumps wasn't ready on arm64 yet.
     Because of that, amd64 depends on libmumps-5.3 (>= 5.3.5) [4] while
     arm64 depends on libmumps-5.3.3 (>= 5.3.3) [5]. At least this isn't
     affected by bug 973825. A rebuild of trilinos should resolve quite a
     lot of the things blocked in this set.
     I submitted a rebuild to pick this up properly and it worked fine.
(b2) The dependencies are simply wrong, pointing to non-existing packages.
     I found that this is a known Debian bug [6] filed 4 days ago.
     Once that is fixed and synced we will need rebuilds of sdpa (and maybe more).
(c1)+(c2)+(c3)
   These all seem to stem from the same trilinos build that missed the new
   mumps version mentioned in (b1).
(d1) is a dependency issue on i386. Too many dependencies are in flight
     right now to try to resolve it, but it could just as well get an
     !i386 test override.
(d2) had https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=969715, which was
     supposed to be fixed. The new test failures are different. On debci the
     3.14 tests didn't run at all. We seem to need a test reset or a delta to
     ignore or fix this again.
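
The (b1) symptom, where the same source version ends up depending on
differently named library packages per architecture, can be spotted by
comparing the relevant Depends entries. Below is a minimal bash sketch,
assuming the Depends fragments are copied by hand from the build logs [4]
and [5]; the helper name lib_dep_name is hypothetical and it ignores
alternatives ("|") and architecture qualifiers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first dependency in a
# comma-separated Depends string whose package name starts with $2.
# Version constraints such as "(>= 5.3.5)" are dropped; alternatives ("|")
# and architecture qualifiers are not handled in this sketch.
lib_dep_name() {
    local depends=$1 prefix=$2 entry name
    local -a entries
    IFS=',' read -ra entries <<< "$depends"
    for entry in "${entries[@]}"; do
        name=${entry#"${entry%%[![:space:]]*}"}  # strip leading whitespace
        name=${name%% *}                         # drop " (>= ...)"
        if [[ "$name" == "$prefix"* ]]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Depends fragments as quoted in (b1) for the amd64 [4] and arm64 [5] builds.
amd64=$(lib_dep_name "libmumps-5.3 (>= 5.3.5)" libmumps)
arm64=$(lib_dep_name "libmumps-5.3.3 (>= 5.3.3)" libmumps)
if [[ "$amd64" != "$arm64" ]]; then
    echo "arm64 depends on $arm64 instead of $amd64: rebuild needed"
fi
```

Comparing package names rather than versions is the point here: a stale
soname in the package name means the binary was built against the old
library, which a version bump alone would not reveal.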

Overall most of the issues above were resolved by my work, but the set of
packages as a whole hasn't migrated yet. Too many open issues remain, like
the gcc-10 bug 957847 in sundials (and more), that need to be resolved first.

In general these components have been in motion in Debian in recent days, so
I only touched the ones that had issues on "our side". The overall context
needs to be revisited later on.


## 5 libftdi1 issues on i386

Issue1:
  autopkgtest [16:42:09]: test test-libftdi1
  Package libftdi1 was not found in the pkg-config search path.
  Perhaps you should add the directory containing `libftdi1.pc'
  to the PKG_CONFIG_PATH environment variable
Issue2:
  CMake Error at CMakeLists.txt:18 (include_directories):
    include_directories given empty-string as include directory.
This affects only the i386 tests; other architectures are fine.
It was broken throughout groovy [8] with the same error, then had two good
runs in hirsute, and now fails again.
Locally reproducible via:
$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest --no-built-binaries --apt-upgrade --apt-pocket=proposed=src:libconfuse --setup-commands="dpkg --add-architecture i386; apt-get update" --shell-fail --architecture i386 libftdi1_1.5-5.dsc -- qemu --qemu-options='-cpu host' --ram-size=2048 --cpus 2 ~/work/autopkgtest-hirsute-amd64.img
It passes in Debian [9], so it is likely an Ubuntu-i386-specific issue.
It comes down to the following command failing:
  $ pkg-config --cflags libftdi1
The -dev package is installed:
  ii  libftdi1-dev:i386 1.5-5        i386         Development files for libftdi1
and the file is there:
  libftdi1-dev:i386: /usr/lib/i386-linux-gnu/pkgconfig/libftdi1.pc
Yet in our i386-but-not-really environment it fails to work.
I proposed ignoring the test for now [10], but I also got hints on
#ubuntu-devel when I asked about the pattern, as I wanted to open a bug like
[11] for libftdi1. I had a little TIL with CMake and got several things
fixed, but eventually my time-boxing exploded (thanks, CMake) and I've given
up for now [12].
It feels about 95% done; if a CMake+cross-test guru could build on that,
thanks in advance. The test override will have to do for now ...
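
When a .pc file is installed but not found, it helps to recall pkg-config's
lookup order: directories in PKG_CONFIG_PATH are searched first, followed by
the tool's compiled-in default path (which PKG_CONFIG_LIBDIR replaces when it
is set); `pkg-config --variable pc_path pkg-config` prints that default path.
Below is a minimal bash model of this order, not pkg-config itself; the
function name pc_find is hypothetical. The idea that the pkg-config in use
here only searches amd64 (x86_64-linux-gnu) directories is an assumption worth
checking, not a confirmed diagnosis.

```bash
#!/usr/bin/env bash
# Hypothetical model of the order in which pkg-config looks for "<name>.pc":
#   1. each directory listed in PKG_CONFIG_PATH, in order;
#   2. then the default search path given as $2, which PKG_CONFIG_LIBDIR
#      replaces when it is set.
# Prints the first matching file, or returns 1 if none is found.
pc_find() {
    local name=$1 default_path=$2 search dir
    local -a dirs
    search=${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}${PKG_CONFIG_LIBDIR-$default_path}
    IFS=':' read -ra dirs <<< "$search"
    for dir in "${dirs[@]}"; do
        if [[ -n "$dir" && -f "$dir/$name.pc" ]]; then
            printf '%s\n' "$dir/$name.pc"
            return 0
        fi
    done
    return 1
}

# With a default path that only covers the amd64 triplet, the i386 file
# /usr/lib/i386-linux-gnu/pkgconfig/libftdi1.pc would not be found:
#   pc_find libftdi1 /usr/lib/x86_64-linux-gnu/pkgconfig:/usr/share/pkgconfig
# unless that directory is added explicitly:
#   PKG_CONFIG_PATH=/usr/lib/i386-linux-gnu/pkgconfig \
#       pc_find libftdi1 /usr/lib/x86_64-linux-gnu/pkgconfig
```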


## 6 ruby2.7

ruby2.7 has test issues on i386. Marisa seems to need a bump of an existing
test override [13].
Mecab, on the other hand, looks like a dependency issue on i386 where I could
not yet track down the best place to resolve it (or decide to just mask the
test for now).
I talked with Lucas and he will take a closer look.


## 7 libcpupower missing

I hit this on one of my past +1 duties [14] and it seems others did as well.
It turned out that the discussion is even older [15]. I marked the bugs
accordingly and made the newer one a duplicate of the older one. The TODO is
with the kernel team here, though.


## 8 Netgen

This has been an FTBFS for a while and therefore hangs around in -proposed,
being retriggered every now and then depending on who comes by.
The issue is due to upstream breaking non-x86 architectures. After tracking
down the details I realized that Ubuntu users will likely use the
upstream-provided PPA instead anyway.
But along the way I found the root cause and a proposed fix upstream, so I
filed Debian bug [16] to make the maintainer aware, as well as [17] on
Launchpad to save the next +1 member from re-debugging this.


## 9 libsmithwaterman FTBFS

This is another case of C++ symbols files breaking in dh_makeshlibs, because
the symbols of Ubuntu's s390x/ppc64 builds differ somewhat for no obvious
reason. I filed [18] with a fix that I verified to work, but IMHO it is not
worth an Ubuntu delta upload. If the Debian maintainer accepts it, the next
auto-sync will resolve this.


## 10 boost 1.71 blocked on shapeit4

shapeit4 had a one-off success on s390x but has otherwise always failed there:
https://autopkgtest.ubuntu.com/packages/s/shapeit4/hirsute/s390x
https://autopkgtest.ubuntu.com/packages/s/shapeit4/groovy/s390x
https://autopkgtest.ubuntu.com/packages/s/shapeit4/focal/s390x
We should reset the test to get things moving; the MP for that is [20].


## 11 Clustalo FTBFS on s390x

Clustalo 1.2.4-4build1 is happy, but 1.2.4-6 added a test that fails.
Error:
  # Run additional test from python-biopython package to verify that
  # this will work as well
  src/clustalo -i debian/tests/biopython_testdata/f002 --guidetree-out temp_test.dnd -o temp_test.aln --outfmt clustal --force
  make[1]: *** [debian/rules:36: override_dh_auto_test-arch] Segmentation fault (core dumped)
In Debian the build works just fine [21], but in Ubuntu it fails
reproducibly [22]. It is also reproducible in an s390x LXD container in a
built tree (hirsute).
Debugging showed that the package already uses -O0 for mipsel, and with
gcc-10.2 s390x needs the same treatment to not segfault.
I analyzed and reported this (no bug, just mail) and proposed extending that
-O0 to s390x as well [23]; to make it visible in update excuses there is also
a tracker [24].
This could ultimately be an issue with the s390x gcc-10.2 optimizer, so I had
this bug mirrored to IBM for evaluation.
I verified that, as expected, a merge of 3.23 would fix it (as would a
single-patch backport).
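
The proposed change amounts to adding s390x to the set of architectures that
are built with -O0. In the packaging this would live in debian/rules (for
instance via the DEB_CFLAGS_MAINT_STRIP/DEB_CFLAGS_MAINT_APPEND style
variables that dpkg-buildflags understands); clustalo's actual rules are not
reproduced here. The bash sketch below only models the intended effect on the
compiler flags; the variable NOOPT_ARCHES and the function adjust_opt_flags
are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical list of architectures that must not be built with optimization,
# following the report: mipsel already had -O0, s390x is the proposed addition.
NOOPT_ARCHES="mipsel s390x"

# Print the flags for architecture $1, replacing any -O level found in $2
# with -O0 when the architecture is listed in NOOPT_ARCHES.
adjust_opt_flags() {
    local arch=$1 flags=$2 flag
    local -a words out=()
    read -ra words <<< "$flags"
    for flag in "${words[@]}"; do
        if [[ " $NOOPT_ARCHES " == *" $arch "* && "$flag" == -O* ]]; then
            flag=-O0
        fi
        out+=("$flag")
    done
    printf '%s\n' "${out[*]}"
}

# adjust_opt_flags s390x "-g -O2 -Wall"  -> -g -O0 -Wall
# adjust_opt_flags amd64 "-g -O2 -Wall"  -> -g -O2 -Wall
```

Keeping the exception as an explicit architecture list makes it easy to drop
s390x again once the suspected optimizer issue is resolved.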

## 12 tgt test fails blocking fio sync

This looked odd at first, as it only failed tgt@s390x, while there should be
no obvious reason for it to behave differently on those platforms.
Since I like those platforms, I gave those tests a look.
Both come down to this message (which by itself is not arch-specific):
  fio: io_u error on file datafile.tmp: No space left on device: write offset=95158272, buflen=65536
OK, this most obvious message is a red herring. The disk that is used is
created by the test itself and is 100MB on each architecture, and the test is
meant to run until it runs "out of disk", so the message is present in good
runs as well.
The bad return code from fio is what breaks the test, and whether it is good
or bad is reproducible, at least for the tgt test on s390x/x86.
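
The numbers in the fio message fit this reading: the failing write ends close
to the end of the test disk. Assuming "100MB" means 100 MiB (the unit is not
stated), the end of the failing write is at:

```latex
\frac{95158272\,\mathrm{B} + 65536\,\mathrm{B}}{2^{20}\,\mathrm{B/MiB}}
  = \frac{95223808}{1048576}\,\mathrm{MiB}
  = 90.8125\,\mathrm{MiB} \approx 91\%\ \text{of}\ 100\,\mathrm{MiB}
```

If 100MB means 10^8 bytes instead, that is about 95%. Either way the
remainder would plausibly be taken up by filesystem overhead, which is an
assumption, as the layout of the test disk is not described here.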

While the "out of space" is by design there is another error that seems to be
architecture specific:
  verify-phase: you need to specify size=
  fio: pid=24570, err=22/file:filesetup.c:1057, func=total_file_size, error=Invalid argument
Switching back from fio 3.21-1 to 3.16-1 from -release fixes the issue, so it
does indeed seem to be some sort of regression in the new code.
I checked git and after some time found an existing issue [25] with a fix [26].

Since I tested this with 3.23 (which works fine), I proposed that version to
Debian [28]. The next auto-sync will then resolve this one.
To track this in excuses I have opened LP bugs [29][30] tagged with
update-excuse.

A day later, after an upstream discussion, I had a workaround that lowers the
memory pressure and submitted it to Debian in [37].


## 13 multipath test fails blocking fio sync

This was not fixed by the fix for #12 above, but it, too, only affected one
architecture and seemed not to be reproducible outside of the Launchpad
infrastructure.
It almost seemed more like an issue in a component other than fio, with fio
merely triggering it. Lacking a local reproducer I had to create PPAs to
debug things on the infrastructure (3.16 vs 3.21 vs 3.23).
The ppc64el issue turned out to be an OOM kill, but one reliably triggered by
the new version of fio. I found that the memory consumption of fio itself more
than doubled for the given workload, and ppc64el just happened to be the
architecture with the tightest memory.
A hirsute rebuild of 3.23 from git shows the same issue while 3.16 from git is
good, so it should again be bisectable.
I found two changes to the statistical data fio gathers that caused the
increase and reported them upstream in a lot of detail [27].

For Ubuntu we need to mark these tests as "big_packages", which I proposed
in [31].

A day later, after an upstream discussion, I had a workaround that lowers the
memory pressure and submitted it to Debian in [38].


## 14 Perl comes back to me for libvirt/postgresql

A few days into the perl transition, doko pinged me asking whether I could
look into the remaining build failures, as they are close to what I usually
work on.

First of all, libguestfs is blocked by libsys-virt-perl.
The old build still depends on perlapi-5.30.3, but due to the transition
perlapi-5.32.0 is needed.

libsys-virt-perl in turn is blocked, missing a newer libvirt.
I already had libvirt 6.8 80% ready before my +1 week and intended to
continue working on it next week.

The last affected packages are postgresql-12 and postgresql-13, which FTBFS
and fail their autopkgtests. This will be fixed by the stable releases that
come out today.

As expected, the postgresql-13 issue is fixed in 13.1-1 [33], and it will be
fixed for us in [34] once that is auto-synced later today.

The only problem is that we might not want to upload another 12.x to Ubuntu
21.04, as we want to move to postgresql-13 eventually. In Debian, too, after
checking with the maintainer, the preferred option seems to be removal from
testing [32].

I'm concerned about removing too much from hirsute if we remove postgresql-12
now. Therefore I'll go ahead of Debian and upload the v12 stable release
today, which will fix the issue for v12 for now (it is still to be removed
later in the cycle).

So in addition to waiting for 13.1 to show up via auto-sync, I prepared all
the other stable updates as well, which also cover a bunch of CVEs for the
supported releases [35] and the build/test failure filed by rbalint [36].

13.1-1 as synced from Debian built fine as expected, and so did 12.5 in a PPA,
which I then uploaded. Together with the security team I'm preparing the same
stable updates for all active releases, but that will happen next week (not
as part of the +1 duty).

Finally, plenty of postgresql-related tests had failed in the past, as they
need to be triggered together to work. I did that over the weekend (once the
new builds were in) to bring this closer to migration.


## 15 ebtables breaking tests

While looking at libvirt for perl (see above), I identified an issue with
iptables/ebtables early on, which turned out to be broken not only in hirsute
but also in groovy.

An initial quick check confirmed that it was caused neither by my new libvirt
version nor by the hirsute release as such. So it was worth investigating
whether we have a general issue. A discussion with the security team showed
that the issue could be related to the merge of 1.8.5 and/or the whole
iptables/ebtables/nftables move.

I tracked down which component causes this and then filed a bug [39].
I have to admit it feels like my ebtables-fu might just be too weak, but then
it will be a TIL moment once explained :-)

While I found the issue on +1 duty, after the initial analysis it became clear
that the task doesn't qualify for +1, so I'll continue with it next week (in
the context of libvirt, which we want to have for perl anyway).



[1]: https://lists.debian.org/debian-devel-announce/2020/11/msg00001.html
[2]: https://people.canonical.com/~ubuntu-archive/germinate-output/i386.hirsute/i386+build-depends
[3]: https://wiki.ubuntu.com/i386
[4]: https://launchpadlibrarian.net/504480967/buildlog_ubuntu-hirsute-amd64.trilinos_12.14.1-5_BUILDING.txt.gz
[5]: https://launchpadlibrarian.net/504427600/buildlog_ubuntu-hirsute-arm64.trilinos_12.14.1-5_BUILDING.txt.gz
[6]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973825
[7]: https://launchpad.net/ubuntu/+source/dolfin/2019.2.0~git20200629.946dbd3-4
[8]: https://autopkgtest.ubuntu.com/packages/libf/libftdi1/groovy/i386
[9]: https://ci.debian.net/packages/libf/libftdi1/testing/i386/
[10]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393500
[11]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946577
[12]: https://paste.ubuntu.com/p/6VDFJmSTf6/
[13]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393501
[14]: https://bugs.launchpad.net/ubuntu/+source/gkrellm2-cpufreq/+bug/1891336
[15]: https://bugs.launchpad.net/ubuntu/+source/cpufreqd/+bug/1215411
[16]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974136
[17]: https://bugs.launchpad.net/debian/+source/netgen/+bug/1903719
[18]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974137
[19]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966014
[20]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393558
[21]: https://buildd.debian.org/status/fetch.php?pkg=clustalo&arch=s390x&ver=1.2.4-6&stamp=1589135789&raw=0
[22]: https://launchpadlibrarian.net/498532550/buildlog_ubuntu-groovy-s390x.clustalo_1.2.4-6_BUILDING.txt.gz
[23]: https://salsa.debian.org/med-team/clustalo/-/merge_requests/1
[24]: https://bugs.launchpad.net/ubuntu/+source/clustalo/+bug/1903817
[25]: https://github.com/axboe/fio/issues/1065
[26]: https://github.com/axboe/fio/commit/fd56c235caa42870e6dc33d661514375ea95ffc5
[27]: https://github.com/axboe/fio/issues/1123
[28]: https://salsa.debian.org/debian/fio/-/merge_requests/6
[29]: https://bugs.launchpad.net/ubuntu/+source/fio/+bug/1903963
[30]: https://bugs.launchpad.net/ubuntu/+source/fio/+bug/1903962
[31]: https://code.launchpad.net/~paelzer/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/393641
[32]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974061
[33]: https://buildd.debian.org/status/fetch.php?pkg=postgresql-13&arch=amd64&ver=13.1-1&stamp=1605173772&raw=0
[34]: https://launchpad.net/ubuntu/+source/postgresql-13/13.1-1
[35]: https://bugs.launchpad.net/ubuntu/focal/+source/postgresql-12/+bug/1903978
[36]: https://bugs.launchpad.net/ubuntu/+source/postgresql-12/+bug/1903573
[37]: https://salsa.debian.org/debian/tgt/-/merge_requests/1
[38]: https://salsa.debian.org/linux-blocks-team/multipath-tools/-/merge_requests/1
[39]: https://bugs.launchpad.net/ubuntu/+source/iptables/+bug/1904192

--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd