Friday 12 July 2024

PSA: autopkgtests on ppc64el and s390x

I was recently looking at a class of autopkgtest failures, something in
the test being killed, and realized that people might see confusing test
behavior when looking at the results of specific packages.

For example, with the recent rust-pyo3 results for ppc64el on oracular[1]
we can see the tests switching from pass to fail and then to pass again.
This is[2] due to the data center in which the tests are running.

We are currently in the process of moving ppc64el and s390x tests from
scalingstack to ProdStack6. As a part of the move we have increased
the number of processors and memory available to the autopkgtest
(default) flavor used for running tests in ProdStack6. So an instance
running an autopkgtest will currently have more or fewer resources
depending on which data center it is provisioned in. The distinction
you'd want to look for in the log files is bos01/bos02 (scalingstack)
versus bos03 (PS6), e.g.

autopkgtest-juju-7f2275-prod-proposed-migration-environment-2@bos02-ppc64el-11.secgroup

If you see a test passing in bos03 but not in bos01 or bos02 the package
might need adding to big_packages[3].
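
As a rough illustration of the distinction above, the following sketch
pulls the region out of an instance name as it appears in a log and maps
it to a data center. The function name and the regular expression are
illustrative assumptions, not part of autopkgtest-cloud; only the
bos01/bos02/bos03 mapping comes from this message.

```python
import re

# Mapping taken from the message above; any other region is reported as unknown.
REGION_TO_CLOUD = {
    "bos01": "scalingstack",
    "bos02": "scalingstack",
    "bos03": "ProdStack6",
}

# Hypothetical pattern: assumes the region follows '@' and precedes '-<arch>-<n>'.
_INSTANCE_RE = re.compile(r"@(?P<region>[a-z]+\d+)-(?P<arch>[a-z0-9]+)-\d+")


def classify_instance(name):
    """Return (region, arch, cloud) for an instance name, or None if no match."""
    match = _INSTANCE_RE.search(name)
    if match is None:
        return None
    region = match.group("region")
    return region, match.group("arch"), REGION_TO_CLOUD.get(region, "unknown")


if __name__ == "__main__":
    example = ("autopkgtest-juju-7f2275-prod-proposed-migration-"
               "environment-2@bos02-ppc64el-11.secgroup")
    print(classify_instance(example))
```

Running something like this over the passing and failing logs of one
package makes it easier to see whether the results split along data
center lines.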

[1] https://autopkgtest.ubuntu.com/packages/rust-pyo3/oracular/ppc64el
[2] Well, it was. I added the package to big_packages to get consistent
results.
[3]
https://git.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs/

Have a great weekend!
--
Brian Murray

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel

Re: +1 maintenance report

Hey,

On Fri, Jul 12, 2024 at 5:50 PM Robie Basak <robie.basak@ubuntu.com> wrote:
> Looking at further rust issues, rust-chrono seems blocked on some
> regressions in dep8 for rust-schemars and rust-serde-with. It looks like
> some test runs at least are running out of disk space now, so I prepared
> and submitted a merge proposal for these to be added to the big_packages
> list.

Right, thanks. I'm still waiting for you to get back with the result
of retries and other bits before we can merge it. :)


- u


Re: +1 maintenance report

On Fri, Jul 12, 2024 at 8:20 AM Robie Basak <robie.basak@ubuntu.com> wrote:
> rust-ashpd depwait -> rust-zbus depwait -> rust-zvariant specifically
> librust-zvariant-3+enumflags2-dev (>= 3.15.0-~~) previously synced from
> experimental version 4.0.0-1 provides librust-zvariant-4+enumflags2-dev.
> Looks like there were a bunch of things synced from experimental in late
> February that are now causing hold ups because other depending packages
> have autosynced and still require an older version. Looks like this is
> likely to shake itself out in time as packages are published into
> unstable in Debian, so it's perhaps not worth chasing this further
> without something we need that is being blocked by it.

Yes, I am hoping that the rust-zbus cluster gets fixed on the Debian
side by August. Some of the work is being coordinated in
https://bugs.debian.org/1069621
There is now a newer rust-zbus in Experimental but there are library
transitions involved so it wouldn't migrate without other packages
being updated (not yet fully done in Debian) or trying to tweak those
dependencies and hoping things work.

> Back to rust-gix, migrating rust-hashbrown et al will cause
> librust-cookie-store-dev to become uninstallable because it depends on
> an older version of librust-indexmap-dev. Looks like rust-cookie and
> rust-cookie-store need fixing as part of this cluster but they are held
> up by dep8 failures. rust-reqwest looks like it has never passed.
> Previously people have tried migration-reference/0 for them, but these
> return neutral because of a dependency issue. I think this needs badtest
> hinting.

rust-reqwest is fixed now, thanks to Simon Chopin & Skia.

There is a similar autopkgtest failure likely also due to Ubuntu
autopkgtest proxy restrictions: rust-ureq. It looks like it will need
to be handled for rust-cookie* to migrate which apparently is needed
for rust-hashbrown etc.

The rust-gvdb rebuild did not do what you intended. Generally with
these Rust packages, you'd need to either add, remove or modify a
patch to Cargo.toml and update debian/control. I'll fix rust-gvdb now.

Thank you,
Jeremy Bícha


+1 maintenance report

I was surprised to see Vladimir's report since I didn't know that there
was another person also assigned the same week and we hadn't
communicated. But fortunately it looks like we haven't collided with
each other.

Handover notes:

I think the rust-gix cluster can be resolved by focusing on excuses for
rust-cookie and rust-cookie-store.

Running notes:

find-proposed-cluster gave me:

rust-gix 16
linux-restricted-modules 5
libgnatcoll-db 4
rust-pprof 3
rust-prometheus-client 3

Starting with rust-gix led me to rust-gix-hashtable -> rust-hashbrown
-> rust-dashmap. Passed on amd64 but failed on other archs. I see in
some previous failures:

301s = note: aarch64-linux-gnu-gcc: fatal error: environment variable
'DPKG_BUILDPACKAGE_PACKAGE_ARCH' not defined

So is this something to do with the recent changes to dpkg regarding
metadata in ELF binaries? The dates of the failures don't make it
obvious that a retry has been attempted after the most recent fixes, for
example dpkg 1.22.6ubuntu14 was prepared on 21 June and landed 30 June
and that's after the most recent dep8 failures for rust-dashmap on
arm64. Reproducing locally would be on amd64 so another complicating
factor. Easiest to retry on arm64 and I see the queues currently have
availability. Retry submitted. This promptly failed with a different
error also seen before, but those parameters didn't actually lead to the
DPKG_BUILDPACKAGE_PACKAGE_ARCH error from before. So retried again with
all-proposed=1 as liushuyu-011 had tried previously, but hopefully the
newer dpkg will help now. Indeed it passed, so that suggests any
failures of this class can be retried effectively. Ran through
retry-autopkgtest-regressions. Most of these succeeded. rust-rowan/s390x
failed with "erroneous package: rules extract failed with exit code 1"
which is puzzling. Tried another retry for that. This succeeded.

rust-hashbrown is now a valid candidate but makes a bunch of packages
uninstallable on arm64. For example librust-indexmap-dev depends on
librust-hashbrown-0.12+raw-dev but now librust-hashbrown-dev provides
librust-hashbrown-0.14.5+raw-dev instead. Digging into this far more
deeply, I found that rust-gvdb needs a rebuild to depend on the newest
librust-quick-xml-dev instead and uploaded a no change rebuild. This
took quite a bit of time to figure out, but it's starting to feel like a
common pattern: "what needs rebuilding to make these uninstallable
packages installable again?" Some heuristics might be needed to
appropriately guess how dependencies will change following a rebuild,
but it doesn't seem insurmountable to have some rules that will work
in most cases. Is there such a tool? Is anyone planning on writing one?
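
To make the question concrete, here is a minimal sketch of one such
heuristic. It flags dependencies on versioned librust-*-dev names that
nothing provides any more, when the same crate and feature is provided
at a different version. The function names and the assumed naming scheme
(librust-<crate>-<version>[+<feature>]-dev) are illustrative, not taken
from any existing tool, and the sketch only identifies candidates; it
says nothing about whether a no-change rebuild is enough to fix them.

```python
import re
from collections import defaultdict

# Assumed naming scheme: librust-<crate>-<version>[+<feature>]-dev
_VERSIONED_RE = re.compile(
    r"^librust-(?P<crate>[a-z0-9-]+?)-(?P<version>\d+(?:\.\d+)*)"
    r"(?P<feature>\+.+)?-dev$"
)


def parse_versioned(name):
    """Split a versioned librust-*-dev name into (crate, version, feature)."""
    match = _VERSIONED_RE.match(name)
    if match is None:
        return None
    return (match.group("crate"), match.group("version"),
            match.group("feature") or "")


def find_stale_dependencies(depends, provided):
    """List (dependency, candidates) pairs for unsatisfied versioned names.

    A dependency is reported when it is not provided itself but the same
    crate and feature is provided under a different version.
    """
    provided = set(provided)
    by_crate = defaultdict(list)
    for name in provided:
        parsed = parse_versioned(name)
        if parsed is not None:
            crate, _version, feature = parsed
            by_crate[(crate, feature)].append(name)

    stale = []
    for dep in depends:
        if dep in provided:
            continue  # still satisfiable, nothing to do
        parsed = parse_versioned(dep)
        if parsed is None:
            continue  # unversioned name, outside this heuristic
        crate, _version, feature = parsed
        candidates = sorted(by_crate.get((crate, feature), []))
        if candidates:
            stale.append((dep, candidates))
    return stale


if __name__ == "__main__":
    print(find_stale_dependencies(
        ["librust-hashbrown-0.12+raw-dev"],
        ["librust-hashbrown-0.14.5+raw-dev"],
    ))
```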

While we're waiting to see if that worked, I looked at the next cluster,
libgnatcoll-db. Looks like work on this is in progress. Thanks Simon,
and to Jeremy for linking the bug which made it easy for me to see that
it's already being worked on.

Looking at further rust issues, rust-chrono seems blocked on some
regressions in dep8 for rust-schemars and rust-serde-with. It looks like
some test runs at least are running out of disk space now, so I prepared
and submitted a merge proposal for these to be added to the big_packages
list.

11/7 1315: Launchpad is taking >30 seconds to load pages and sometimes
timing out, making it difficult to make any progress.

rust-ashpd depwait -> rust-zbus depwait -> rust-zvariant specifically
librust-zvariant-3+enumflags2-dev (>= 3.15.0-~~) previously synced from
experimental version 4.0.0-1 provides librust-zvariant-4+enumflags2-dev.
Looks like there were a bunch of things synced from experimental in late
February that are now causing hold ups because other depending packages
have autosynced and still require an older version. Looks like this is
likely to shake itself out in time as packages are published into
unstable in Debian, so it's perhaps not worth chasing this further
without something we need that is being blocked by it.

packages.debian.org is also timing out now.

Back to rust-gix, migrating rust-hashbrown et al will cause
librust-cookie-store-dev to become uninstallable because it depends on
an older version of librust-indexmap-dev. Looks like rust-cookie and
rust-cookie-store need fixing as part of this cluster but they are held
up by dep8 failures. rust-reqwest looks like it has never passed.
Previously people have tried migration-reference/0 for them, but these
return neutral because of a dependency issue. I think this needs badtest
hinting.


+1 maintenance report

I was on the +1 maintenance shift on the week of July 8th, 2024
and worked from the bottom of the excuses page. The exception to this
rule was Clojure packages.

Thanks to mwhudson for fixing the issue in mescc-tools!!!

pymatgen:
The package fails to build with Python 3.12[1]. It makes
python3-mp-api and python3-emmet-core uninstallable. I have
cherry-picked the upstream commit and applied it to the package. The
package is in the NEW queue. Submitted the patch to Debian[2].

bamtools:
The package contained a buffer overflow in the bam writer[3] and an
infinite loop when reading the filter script[4]. I have added the
patches and submitted them upstream[5][6] and to Debian[7]. Package
migrated.

tools-namespace-clojure:
The package was missing the required dependencies
libjava-classpath-clojure, libtools-reader-clojure, and the
corresponding classpath entries in the jar manifest[9]. The issue was
fixed in Debian, and the package synced.

aevol:
Reviewed existing MR[8] and proposed minor changes to make the fix
easier to understand.

mescc-tools:
The package fails to build due to an illegal instruction in the
ppc64el test[10]. The same issue occurs with the 1.4.0-1 version of
the package. mwhudson investigated the problem: the test assumed r0 to
be 0 at the start of the test program - the program compared r0 with
1, jumped to the 'overrr' label otherwise, and when r0 was not 0, it
did not go back to 'bakkk' label and crashed. mwhudson uploaded a fix
for the package.

xilinx-runtime:
The package was missing headers for fixed-width integer types required
since GCC 13. Applied upstream patch with minor modifications, package
migrated[11].

vowpal-wabbit:
The package fails the build time tests[12]. The package is out of
date: 8.6.1 vs 9.9.0 upstream, tests pass for 9.9.0.

gitaly:
The package was uploaded to Oracular by accident. There is an existing
removal bug[13]. I have subscribed ubuntu-archive.

facet-analyser:
The package only exists in -proposed (211 days) and has conflicting
dependencies: python3-paraview and libinsighttoolkit5-dev.
libinsighttoolkit5-dev depends on libvtk9-dev which conflicts with
python3-paraview. I have filed a removal bug[14].

jupyter-ydoc:
The package only exists in -proposed (238 days) and has unsatisfiable
dependencies. I have filed a removal bug[15].

r-cran-withr:
I have filed update-excuses bugs for the migration[16][17][18][19][20][21].

tools-nrepl-clojure:
The build hangs indefinitely. tools-nrepl-clojure 0.21 uses
Thread.stop(), which throws UnsupportedOperationException in Java 21. The package
now properly runs build time tests, and this exposed the failure. The
new upstream release does not have the problem[22].

[1] https://bugs.launchpad.net/ubuntu/+source/pymatgen/+bug/2067725
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069219
[3] https://bugs.launchpad.net/ubuntu/+source/bamtools/+bug/2072463
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987023
[5] https://github.com/pezmaster31/bamtools/issues/235
[6] https://github.com/pezmaster31/bamtools/pull/238
[7] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1075990
[8] https://code.launchpad.net/~pushkarnk/ubuntu/+source/aevol/+git/aevol/+merge/468397
[9] https://bugs.launchpad.net/ubuntu/+source/tools-namespace-clojure/+bug/2072709
[10] https://bugs.launchpad.net/ubuntu/+source/mescc-tools/+bug/2072472
[11] https://launchpad.net/bugs/2072795
[12] https://bugs.launchpad.net/ubuntu/+source/vowpal-wabbit/+bug/2072729
[13] https://bugs.launchpad.net/ubuntu/+source/gitaly/+bug/2069200
[14] https://bugs.launchpad.net/debian/+source/facet-analyser/+bug/2072724
[15] https://bugs.launchpad.net/ubuntu/+source/jupyter-ydoc/+bug/2072723
[16] https://bugs.launchpad.net/ubuntu/+source/r-cran-withr/+bug/2072806
[17] https://bugs.launchpad.net/ubuntu/+source/r-cran-withr/+bug/2072807
[18] https://bugs.launchpad.net/ubuntu/+source/r-cran-withr/+bug/2072808
[19] https://bugs.launchpad.net/ubuntu/+source/r-cran-performance/+bug/2072809
[20] https://bugs.launchpad.net/ubuntu/+source/r-cran-withr/+bug/2072810
[21] https://bugs.launchpad.net/ubuntu/+source/r-cran-testthat/+bug/2072812
[22] https://bugs.launchpad.net/bugs/2072898


Wednesday 10 July 2024

Re: +1 Maintenance Report

On Fri, Jun 21, 2024 at 04:55:53PM -0600, Zixing Liu wrote:

> ### libxml-grddl-perl / libxml-libxslt-perl

> These require a no-change rebuild due to mismatching libxml sover.
> I can't do that since I am not a CoreDev.

But you know where some core-devs live ;)

There's also no reason in principle that you can't submit a changelog-only
MP to the sponsorship queue.

Can you provide further details here about what precisely is the reason for
a no-change rebuild, so that someone can pick this up?

I do see in the failure log the following:

253s # Warning: program compiled against libxml 212 using older 209

But I think that warning comes from libxml-libxslt-perl, not from the
packages under test? And I've seen this kind of nonsense before with
inappropriately strict checks of compile-vs-runtime versions of C libraries
from perl extensions (quite probably from this one in particular).

libxml2 | 2.9.14+dfsg-1.3ubuntu3 | oracular | source, amd64, arm64, armhf, i386, ppc64el, riscv64, s390x
libxml2 | 2.12.7+dfsg-3 | oracular-proposed | source, amd64, arm64, armhf, i386, ppc64el, riscv64, s390x

The issue is that libxml-libxslt-perl depends on libxml2 (>= 2.7.4), because
THE ABI HASN'T CHANGED; so I think libxml-libxslt-perl's runtime warning is
wrong and the correct solution here is a sourceful change to remove it.

> ### ypy
>
> This package requires the introduction of new Rust packages
> (`rust-yrs` and `rust-lib0`).
> Since Ubuntu does not maintain Rust micropackages, those need to be
> added through Debian.

Rather than leaving this in -proposed, I've added ypy to the
sync-blocklist's extra-removals file documenting its prerequisite, and
removed the unbuildable source.

> ### rust-imperative
>
> This package requires the introduction of new Rust packages (`rust-stemmers`).
> Since Ubuntu does not maintain Rust micropackages, those need to be
> added through Debian.

Idem

> ### tiledarray / btas

> Lies deep in the abyss, `tiledarray` seems to attract a lot of
> unwanted attention from countless people doing +1 shifts.

> My new findings are, `tiledarray` and `btas` need to be upgraded
> (probably needs to be done in Debian) so that they will build with new
> BLAS + LAPACK.

> The upstream for those two projects is still very much alive; they are
> just too shy to make new releases:
> https://github.com/ValeevGroup/BTAS.

> My recommendation is to remove those packages from the archive and
> re-introduce them once `btas` is upgraded in Debian.

The process for requesting removal of a source package from the archive is
to file a bug against the package and subscribe ~ubuntu-archive to it.

This should include a rationale for why it's correct to remove the current
package. It is unclear to me given the evidence available that btas needs
to be removed; it has failed to build from source on riscv64 in
oracular-proposed but dots would need to be connected showing that this is a
problem requiring an upgrade to a new upstream version for compatibility
with current BLAS + LAPACK, as opposed to some riscv64-specific issue.

> ### node-get-stream

> This package had the autopkgtest crash on ppc64el and s390x.
> Upon investigation, the crash was caused by Node.js Garbage Collector
> being unable to perform GC collections under memory pressure
> (translation: consumed too much memory and then went out of memory).

> This package blocked several Node.js micropackages.

> I am not entirely sure how to fix the issue, maybe we can add
> swapfiles in the autopkgtest runners? The tests in `node-get-stream`
> seem to require about 4 GiB of RAM.

big_packages in
https://code.launchpad.net/~ubuntu-release/autopkgtest-cloud/+git/autopkgtest-package-configs
declares a list of packages per arch whose tests require more than the usual
amount of memory (or cpu) to run. It may become obsolete once all runners
have fully moved over to PS6, but in the meantime I've added
node-get-stream/{ppc64el,s390x} to the list and re-triggered (and it
passed).

FWIW, if big_packages ever becomes insufficient for running tests on a
package like this, I'm -1 on introducing swapfile tricks to make them pass.
The value of per-arch test passes of an arch: all node package which eats
this much memory is marginal, and we should probably just hint it as a bad
test at that point.

> ### rust-secret-service

> This package requires `rust-zbus` package version to be 3.x, while we
> have 4.x in the archive.

> `rust-zbus` underwent a major API overhaul with the 3.x -> 4.x update,
> so patching it is not feasible.

> I don't know what to do with this situation, as upgrading the package
> to a newer version will have a snowball effect.

rust-secret-service is only in oracular-proposed and is not releasable in
its present version because it depends on a too-old version of another
crate. No snowball effect, this can just be removed. (This is a special
case where I don't need a bug report against the package before removing it
- done now.)

> ### ruby-rackup / ruby-rack-session

> Those require `ruby-rack` v3. This version was removed from Ubuntu
> (also only in Debian experimental).

> My recommendation is to remove those packages from the archive and
> reintroduce them once the `ruby-rack` v3 transition is completely
> finished.

The main issue with this is that we have no way to track when such packages
should be re-added.

Thanks,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slangasek@ubuntu.com                                     vorlon@debian.org

Re: +1 shift report

Hi all,

First of all, a big thank you to Simon for letting me shadow him for even more time than I expected in the beginning... I learned a lot! We did a pair session for rust-reqwest in which he was brilliant at pulling the thread and finding out what was happening.

And then, yes, I took the r-cran-effectsize cluster as I had been using R in the past... a past far away :$. So, in a mixture of recklessness and confidence, I said to myself, why not? 

TL;DR: I only have suggestions on actions (skipping tests :( )

#r-cran-effectsize -> r-cran-bayestestr
This migration is pending on r-cran-bayestestr/0.13.2-1. The problem here is that the tests reported as regressions are ones that upstream had commented out in previous commits (so they never passed because they were never run, even there). I could pass the tests locally, but only by installing the dependencies and everything else needed via the R console (install.packages(languageserver)) rather than from the deb packages, so I guess it is something about versioning or dependencies with the rstan package (as the failing error is "rstan (local) .fun(model_code = .x1)") but, tbh, I didn't hit the nail on the head. I left it for a packager more experienced with R or, as I suggested in [1], the short-term fix might be to comment out those tests again. They are failing in Debian too [2].

#geoalchemy2

Again, tests are failing that were previously skipped (but only on s390x). I updated the bug [3] with the same skipping suggestion.

That's it for this time... I hope next time I can produce a more tangible outcome.

Regards,

Miriam

On Mon, Jul 8, 2024 at 10:49 AM Simon Chopin <simon.chopin@canonical.com> wrote:
Hi folks,

I had my +1 shift last week. On the first couple of days I started looking
into the very old items on the excuses page to see if I could move the needle a
bit on them (see the mescc-tools paragraph further down, spoiler alert: no joy).

After that, I focused my attention on the clusters identified by the
`find-proposed-cluster` script. Only 3 clusters popped up:

* libgnatcoll-db is blocked behind gcc-13
* rust-reqwest: see below
* r-cran-effectsize: picked up by Miriam

And finally, once those were handled, I looked into some FTBFSes.

I had the pleasure of being shadowed for part of the week by Ravi Kant Sharma,
and by Miriam España Acebal for the remainder.

## Handover for next shift

Urgent:
* giada (rtaudio6 transition)

Nice to have:
* mescc-tools

## mescc-tools (LP: #2072472)

> tools for binary bootstrapping

mescc-tools has been stuck in -proposed for more than one cycle, due to a FTBFS
on ppc64el. Looking at the logs, its tests fail due to a SIGILL. The same
package and version doesn't fail on Debian.

I didn't want to spend too long on this, and it proved surprisingly difficult
to use the usual set of debug tools, since mescc-tools is essentially a
bare-bones assembler that doesn't seem to bother with things such as symbols or
debug info (which makes sense).

The net result is no technical progress but at least I created a bug.

Next step could be a binary diff of the problematic test binary on Debian and
Ubuntu to see if/where it differs.

## rust-reqwest (LP: #2071789)

> Higher level HTTP client library - Rust source code

This crate was holding up a few other Rust packages due to tests regressing on
ppc64el and amd64, while the migration-reference/0 tests were inconclusive due
to the Rust dependency graph being what it is.

It was a really fun investigation, involving talking to both IS and the Ubuntu
Release Management team, discovering that some cargo-culted patterns I learned
before weren't actually working as I thought (I know, right?!), learning about
proxies and their interaction with custom DNS config (it's not great), and a
genuine technical mystery.

The high-level overview is that the rust-reqwest tests have *always* been
failing on our infrastructure due to our proxy, and that actually makes sense.
That's not bad per se, because you can't regress failing tests. The mystery
here is that a few weeks ago, the tests actually *passed* once on the
aforementioned architectures, despite it being impossible according to
everything I know and learned. That created a new baseline, so when the miracle
didn't reproduce, our CI thought there was a regression.
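
The baseline behaviour described above can be modelled with a small
sketch. This is a deliberately simplified model under my own
assumptions, not the actual proposed-migration code, and the function
names are hypothetical: a failure only counts as a regression once the
test has passed at least once.

```python
def classify_result(passed, ever_passed):
    """Classify one test run against the baseline seen so far."""
    if passed:
        return "pass"
    # A failure is only a regression if there is a passing baseline.
    return "regression" if ever_passed else "always-failed"


def replay(history):
    """Replay a list of booleans (True = pass) and return the verdicts."""
    ever_passed = False
    verdicts = []
    for passed in history:
        verdicts.append(classify_result(passed, ever_passed))
        ever_passed = ever_passed or passed
    return verdicts


if __name__ == "__main__":
    # Two failures, one unexpected pass, then a failure again.
    print(replay([False, False, True, False]))
```

In this model a single unexpected pass is enough to turn every later
failure into a regression, which matches what happened here.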

A possible solution would have been to hint the tests to reset the baseline,
but I actually opted to fix the upstream test suite to better handle the proxy
in the first place. Many thanks to both the upstream author for pointing out
that my initial patch was grossly overengineered, and jbicha for pushing the
fix to Debian.

## hypercorn vs node-mermaid (LP: #2069202)

> Markdownish syntax for generating flowcharts

I spent some time trying out the latest version of node-mermaid in Debian to
see if we could bring it back in Ubuntu, but even there it's FTBFS for reasons
unrelated to its original removal. Out of my depth, I ended up just pinging the
Debian maintainer to see if they could publish their Salsa branch in the
archive as it presumably solves the issue (I couldn't try it out due to missing
pristine-tar branch)

## giada (LP: #2072342)

> Hardcore Loop Machine

giada is involved in the rtaudio6 transition, but its NCR failed to build, so I
went about to port its code to the new librtaudio APIs, which changed fairly
drastically how error handling is done. Sadly, once that was fixed, other
errors cropped up that were related to the new JUCE version. I managed to fix
the linking issue, but ran out of time to fix the latest error, which might be
related to PIE? Hopefully the next shift can pick it up.

Cheers,
Simon



--
Miriam España Acebal
Software Engineer II - Ubuntu Public Cloud/Server
Email: miriam.espana@canonical.com
Location: Spain (GMT+2)