Friday 11 June 2021

+1 Maintenance report

Hi,
I was looking at +1 tasks this week and was happy to find the excuses page way
less crowded and concerning than usual. Obviously there are always some moving
parts, but the biggest active transitions seemed to be poppler driven by
desktop and php driven by server - both under control as they are actively
worked on by those teams. Only at the end of the week did I find akonadi, but
that is very new and might go through without additional help thanks to
the KDE flavour people.

Due to the above (not many huge unowned transitions) there wasn't much
first-grade +1 work to do. Furthermore, many people use similar tools to
identify +1 work, so the most obvious retrigger candidates were already in the
queue.
So I looked further and as usual found a lot of others that had stayed
unaddressed.
Not blocking the whole archive does not mean an issue should clutter excuses
forever and eventually become just as much of a problem.
Therefore I had (mostly but not exclusively) a look at a bunch of impish
FTBFS - especially on architectures I'm familiar with, to provide effective
help that might be more painful for others to provide.

Due to that - as usual - it was an interesting week, from debugging ultra-long
lisp backtraces to on-disk big-endian byte swaps ...


---

Recent systemd tests break on armhf

When starting the week I saw that recent failures in the usual suspect
"systemd autopkgtests" began blocking a lot of packages. It only occurred
on armhf, which is everyone's least favorite platform for debugging
autopkgtest issues.
I have seen a bunch of packages, including even gdm3 and glibc, being blocked
by that, so I wanted to at least track down the issue until we can put it on
someone's task list to resolve.

Systemd 248.3-1ubuntu1 is rather new, but had 5 successful tests on armhf
before now slipping into a bad mode.

No one has replied to my pings yet, but maybe that means someone is already
looking into this and has enabled some debugging?
After some debugging I found that it works fine in canonistack.

So I can not reproduce this on canonistack, but it blocks autopkgtests of
various packages pretty reproducibly :-/

I filed [6] to document what I've found and added the update-excuse tag to
make it more visible. But it would probably need the systemd maintainers to
have a look.

Later in the week discussions continued; the current state is that it is
not reproducible in canonistack but was reproduced by Laney on an RPi.

---

dogtag-pki broken on s390x

This seems to break every time I'm on +1 duty.

History in +1 Duty reports:
- Timo Aaltonen Thu, Jan 21, 8:54 PM
- Sergio Durigan Junior Fri, Jan 22, 10:33 AM
- Bryce mentioned this on Sat, Apr 10, 12:12 AM, but it wasn't finished.
- Lukas mentioned this one already on Fri, May 14, 5:58 PM as needed to debug.
- Lukas also confirmed that the 389 base fix is in and that this is a
  different issue now.

We can see above that this has already consumed (or wasted) quite some
people's time.

This time s390x blocks a merge that was done for nss by Locutus a while ago.
Plenty of people have already re-tried this test and it seems to fail
consistently.

I recreated this in an s390x environment and found a crash on a double
free. I found [2] and similar fixes in 3.66.

I documented this in [3] and the solution will be a merge of 3.66.
But that is blocked by [4], so we'll wait on this one. Timo identified a
possibly related, yet unreleased upstream fix and based on that I spotted
another one. Test builds with these applied went into another round of tests,
but without success.

The proper tagging of an update-excuse bug [3], which we now have, should
help this case not to be re-investigated over and over. Also, further work on
this should update the bug and reference [4] (this is not +1 work anymore).

---

libcpupower.so

Another usual suspect is gkrellm2-cpufreq, which is blocked waiting on the
Ubuntu kernel to provide a lib that exists in Debian.
I've pinged on [5] so that it is not totally forgotten, but there isn't
much more I could do atm.

---

gdisk FTBFS on s390x

This had already been hanging for more than a month.
It is also broken on s390x in Debian [7].

The former version [8] still built fine and upstream has no fixes
since 1.0.7.

Running in an s390x VM the issue can be reproduced
(1.0.6 works, 1.0.7 fails).

The test creates an empty file with dd and operates on that.

The probing on the empty file used to complain about bad entries which are fixed
in the new version (or that is already a symptom of the problem).

Both binaries, gdisk as well as sgdisk, are affected.

The sequence of commands leading into a bad state is:

TEMP_DISK=$(mktemp)
dd if=/dev/zero of=${TEMP_DISK} bs=1024 count=65536
./sgdisk ${TEMP_DISK} -o
./sgdisk ${TEMP_DISK} -n 1 -c '1:Linux filesystem' -t 1:8300
./gdisk -l ${TEMP_DISK}

In the new version this creates corrupted data like:
Number Start (sector) End (sector) Size Code Name
1 2048 131038 63.0 MiB 8300 䰀椀渀甀砀 昀椀氀攀猀礀猀琀攀洀

Reading the very same file with the old gdisk is good:
Number Start (sector) End (sector) Size Code Name
1 2048 131038 63.0 MiB 8300 Linux filesystem

Reading it with old & new sgdisk is good as well:
sgdisk -p ${TEMP_DISK} -i
...
Number Start (sector) End (sector) Size Code Name
1 2048 131038 63.0 MiB 8300 Linux filesystem

So it seems gdisk read/display is messed up in 1.0.7.
The display code in DisplayGPTData / GPTPart::ShowSummary is unchanged.

The value read seems correct, but this commit broke it:
https://sourceforge.net/p/gptfdisk/code/ci/86dd5fea351a5a55bea26b7622eb85ebd6075a60/

Reverting this fixes the problem.

Reported upstream
https://sourceforge.net/p/gptfdisk/mailman/gptfdisk-general/thread/8db702c5-642c-705d-294c-2df60a070ff6%40gmail.com/#msg37298402
and to Debian
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989589
and fixed in Ubuntu (currently being tested in PPAs) in
https://launchpad.net/ubuntu/+source/gdisk/1.0.7-1ubuntu1

At the same time I was working closely with upstream to get a final fix.
The original change was for 32-bit PPC and still addresses a real issue there.
While Ubuntu doesn't have 32-bit ppc anymore, we all want the code to work on
all target platforms. Hence we worked together to resolve this.

This could be much worse: we've found that sgdisk (even our current version)
writes the labels wrong on s390x Ubuntu.

Wrong:
00000450: 0073 0074 0065 006d 0000 0000 0000 0000 .s.t.e.m........
Correct:
00000440: 2000 7300 7900 7300 7400 6500 6d00 0000 .s.y.s.t.e.m..

An "s" should be 7300 and not 0073.
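
To illustrate the byte order issue, here is a minimal Python sketch (not
taken from gdisk; the label string is just the one used in the test above).
It encodes the label once little-endian, as in the "Correct" dump, and once
big-endian, as in the "Wrong" dump, and then decodes the big-endian bytes as
if they were little-endian, which is what a conforming reader would do:

```python
# Illustration only: this is not gdisk code, just the encoding effect.
label = "Linux filesystem"

correct = label.encode("utf-16-le")  # byte order seen in the "Correct" dump
swapped = label.encode("utf-16-be")  # byte order seen in the "Wrong" dump


def hexwords(data: bytes) -> str:
    """Format bytes in 2-byte groups, similar to the hex dumps above."""
    return " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))


print(hexwords("s".encode("utf-16-le")))  # 7300
print(hexwords("s".encode("utf-16-be")))  # 0073

# A reader expecting little-endian data misinterprets every code unit,
# e.g. "L" (U+004C) becomes U+4C00, which renders as a CJK character.
misread = swapped.decode("utf-16-le")
print(misread)
```

This is the same kind of garbled name that shows up in the partition listing
further up.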

Due to that, other tools such as parted also can't read the tables correctly
on our s390x builds.
So the fix isn't to change the display function but to fix the broken
partition label writing.

The open question is what to do about the already byte-swapped partitions in
the field. I checked and all our releases since 16.04 up to impish are affected.
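
For spotting affected labels in the field, one possible heuristic is sketched
below in Python. It is only an assumption of how such a check could look, not
something gdisk or the fix does: the function name is hypothetical, the input
is the raw name field of a partition entry that was extracted some other way,
and it only works for labels made of ASCII/Latin-1 characters.

```python
def guess_label_byte_order(name_field: bytes) -> str:
    """Guess the byte order of a UTF-16 partition name made of ASCII text.

    Hypothetical helper: for ASCII characters a little-endian code unit has
    a zero second byte, a byte-swapped one has a zero first byte.
    """
    # Split into 2-byte code units and drop the NUL padding.
    units = [name_field[i:i + 2] for i in range(0, len(name_field) - 1, 2)]
    units = [u for u in units if u != b"\x00\x00"]
    if not units:
        return "empty"
    if all(u[1] == 0 for u in units):
        return "little-endian"
    if all(u[0] == 0 for u in units):
        return "big-endian (byte-swapped)"
    return "unknown"


# Usage with synthetic 72-byte name fields (not read from a real disk):
good = "Linux filesystem".encode("utf-16-le").ljust(72, b"\x00")
bad = "Linux filesystem".encode("utf-16-be").ljust(72, b"\x00")
print(guess_label_byte_order(good))
print(guess_label_byte_order(bad))
```

Labels with characters outside that range end up as "unknown" and would need
a closer look by hand.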

This started as the usual "let us unblock something", but at this point
it has turned out to be a real issue in a server team package (I didn't
realize this until now as I've never dealt with sgdisk since I've been
around - if anything I'd have expected foundations to own it). Anyway, it
isn't +1 work to continue further now that I've found the root cause.
I'll leave this to be continued in team-time ...

---

xrdp FTBFS on ppc64el and s390x

This also blocks https://launchpad.net/ubuntu/+source/xorgxrdp/1:0.2.15-1

Upstream now explicitly enables architectures one by one.
This is also broken in Debian the same way.

s390x was added in 0.9.16 via [10], ppc64el via [11].

I guess without a strong reason to push forward we can just wait on this one
until 0.9.16 is available and synced from Debian.

I filed [9] so that this is easily discoverable from update-excuses.
I also linked and updated the related Debian bug.

---

konclude FTBFS on s390x

In Debian the same issue did not happen, but in Ubuntu it is reproducible
on an official builder as well as when run manually on an s390x system.

I debugged this until I found a segfault for which it was unclear how to fix
it. So I filed an update-excuse tagged bug [12] to avoid re-debugging this.
And I informed upstream about the issue to have a chance at resolving this
down the road.
So far I got feedback that the report is good and that they want to have
a look.

---

poppler transition

Next I saw that poppler was a classic transition in need of some help.
It had a bunch of issues:
1. some tests failed, some flaky, some maybe not
2. a lot of dependencies were not yet rebuilt, so they were blocked, like
   "removing libpoppler107/21.02.0-1/amd64 from testing makes
   extractpdfmark/1.1.0-1.1/amd64 uninstallable"
3. a set of implicit dependencies, some not considered (e.g. libreoffice)

The kopanocore armhf tests seemed just flaky, and in a similar fashion
libreoffice was blocked by its own known-flaky armhf test. I retriggered
both.

I've seen on some of them that others seem to have already started on this
transition, so I pinged about the others.
I've found no-change rebuilds of some of the dependencies by seb128, e.g.
=> https://launchpad.net/ubuntu/+source/extractpdfmark/1.1.0-1.1build1

The first time I checked I found rebuilds of gambas3, calligra,
kitinerary, openboard and scribus missing.

I only did a quick check and quickly realized that this is a
desktop-subscribed package anyway, so they will take care of it.


---

cif2cell

This is actually part of a standing override in hints, as it is an
arch_all package and nothing else.
This one might not be present in the hint as it was removed from hirsute and
is now coming back.

Due to that we see bad pkg issues:
https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/i386/c/cif2cell/20210516_174401_e34a7@/log.gz

What we need to do is to refresh the hint for this kind of package.
I've regenerated this list and proposed that - accepting that will
unblock cif2cell and a few others.

See [13] for the MP. As usual I also filed a bug [14] with an update-excuse
tag to ensure this won't be debugged again by someone else.

---

crowdsec

FTBFS on all arches; it is rather new and broken for Ubuntu.
I thought about a removal, but I'm unsure if that is needed if it is only
to clear -proposed - it is rather new, so maybe there are follow-on uploads
soon.
I think we can leave it as is; following uploads will replace it and
hopefully be better.

---

lintian-brush

I've found that upstream-ontologist and debmutate independently fail the
lintian-brush tests on all architectures.

I've seen a few other people retrying this, but no fixes or bugs about it yet.

Errors are about "testdir" being present in the test output, like:
@@ -1,2 +1,3 @@
+Name: testdir

The test history recently looks rather broken, but in the past it was stable
and ok. It seems like something else got into impish undetected by this test
and is now breaking it.

The issue reproduces locally and is independent of the new packages that
currently are blocked on this.

The test is python3 unittest based and reproduces locally in an autopkgtest VM.
It also looks all-bad in ci.debian.net.

I fixed one of them and reported it to Debian [15]. For the remaining issue
there is a bug already [16]. This latter one is not locally reproducible
in an autopkgtest VM.
Furthermore I've spotted another conflict with the new breezy and filed that,
together with a solution that I verified, at [17].
As usual an update-excuse bug [18] holds this info together to avoid
re-debugging.

---

qemu-web-desktop

This blocks on packages that do not exist in Debian and Ubuntu.
And this isn't something I or the server team would look after,
so it is valid for +1 duty, and I might have some helpful insight into
the history of these dependencies.

After having a look I outlined the changes that are needed and filed [19]
and an update-excuse tracker [20].

---

gdebi

This is an FTBFS which after some debugging turns out to be due to glib2.0 2.68.
Since that switch happened late in Hirsute, the build no longer works.

I've found that this comes down to slightly changed behavior in Gio.File.
The change seems reasonable, but the package's call does not - it tries to
check on a non-existing file, which obviously now fails (harder than before).

I've found a fix and filed an MR upstream [21]. Since I'm not sure who
picks it up first I also filed a bug in Debian [22] for awareness before
glib2.0 2.68 happens there, and finally the usual update-excuse tracker [23].

A new build doesn't give us anything new, and right now nothing is blocked on
this. I don't think we need to go ahead and add delta for this; worst case an
update is required later, breaks on this, and whoever handles it finds [23],
which already contains a reference to the solution.

---

metview / atlas-ecmwf

This seems to be a build failure around the transition of libeckit0d.
This is not too old and failing for similar reasons in Debian.
Also those are leaf packages that are not much used, so while one could work
on this they will most likely be resolved by the Debian maintainer and no one
wants duplicate work. As long as no other things are blocked by those
they can most likely stay as-is for now.

---

Xindy

This is a ppc64/s390x FTBFS due to segfaults in clisp, which sounds interesting.
And there also is a clisp FTBFS on the very same platforms in -proposed.

After a bit this seemed to become a longer session, so I first filed an
update-excuse tracker [24].

I recreated the crash locally on s390x, but it is an insanely deep, unreadable
backtrace.

I checked if the behavior compiling the same in Hirsute is still ok and then
determined which component made it fail.

This was an interesting trip, but all I really achieved was a lot of learning
and an upstream report [25] so this stays unresolved for the time being :-/.


---

My next candidates to look at would have been "suricata", or checking whether
the transition related to "akonadi" needs help, but I didn't find the time for
that anymore.


References
[1]: https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1931088
[2]: https://github.com/nss-dev/nss/commit/350807b3a70f60928ea3f2bc95fd1795aae9b753
[3]: https://bugs.launchpad.net/ubuntu/+source/nss/+bug/1931104
[4]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989410
[5]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1215411
[6]: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1931088
[7]: https://buildd.debian.org/status/fetch.php?pkg=gdisk&arch=s390x&ver=1.0.7-1&stamp=1617791278&raw=0
[8]: https://launchpad.net/ubuntu/+source/gdisk/1.0.6-1.1
[9]: https://bugs.launchpad.net/ubuntu/+source/xrdp/+bug/1931225
[10]: https://github.com/neutrinolabs/xrdp/commit/1d1ec9614f84243e4a08256f82994278d082b592
[11]: https://github.com/neutrinolabs/xrdp/commit/3b81df3f9e894dd164f86d8cf87c3a171ced6d08
[12]: https://bugs.launchpad.net/ubuntu/+source/konclude/+bug/1931229
[13]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/403885
[14]: https://bugs.launchpad.net/ubuntu/+source/cif2cell/+bug/1931260
[15]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989634
[16]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988909
[17]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989633
[18]: https://bugs.launchpad.net/ubuntu/+source/lintian-brush/+bug/1931369
[19]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989638
[20]: https://bugs.launchpad.net/debian/+source/qemu-web-desktop/+bug/1931375
[21]: https://code.launchpad.net/~paelzer/gdebi/gdebi-fix-glib-2.68/+merge/403954
[22]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989647
[23]: https://bugs.launchpad.net/debian/+source/gdebi/+bug/1931394
[24]: https://bugs.launchpad.net/ubuntu/+source/xindy/+bug/1931531
[25]: https://sourceforge.net/p/clisp/mailman/clisp-devel/thread/CAATJJ0KdgVUA6kb_QQVBVgFcKuyeCF_9Z4NcmVokfydhhYx3%2BQ%40mail.gmail.com/#msg37300059


--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd
