Thursday 13 January 2022

Re: Ongoing autopkgtest-cloud armhf maintenance

We are now operating at full capacity again. It also turns out we did
not have 11 workers, but 12, so anywhere I said 33, it is actually
36 :)

Some items remain TBD, but the rest is done and got us back
on our feet again:

On Wed, Jan 12, 2022 at 05:48:04PM +0100, Julian Andres Klode wrote:
>
> # Pending work
>
> - Move /var/snap/lxd/common out of /srv (where the lxd storage pool lives);
> this will likely require slightly increasing the '/' disk size.
>
> - Investigate further where the 30s timeout in lxd comes from and how
> to prevent that (or just ignore it, but see the next item)

2x TBD

>
> - Investigate where the stuck instances came from and why they were not
> cleaned up. Is it possible for us to check which instances should be
> running and then remove all other ones from the workers? Right now
> we just do a basic time check

There were no errors logged. I saw mentions of exit code -15 (i.e. the
process being killed by SIGTERM), but nothing concrete.

We do now have a new cleanup step, though: we keep only as many
containers as a worker actually needs and delete everything else that
is older than 1 hour. A rough sketch of the idea follows below.
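
Purely as an illustration, here is a minimal sketch of that policy,
not the code actually deployed on the workers. It assumes the LXD CLI
is driven directly; the 3-container limit and the "autopkgtest-" name
prefix are placeholder assumptions:

    #!/usr/bin/env python3
    # Sketch only: delete stale autopkgtest containers on a worker,
    # keeping the few most recent ones that jobs may still be using.
    import json
    import subprocess
    from datetime import datetime, timedelta, timezone

    MAX_CONTAINERS = 3            # placeholder per-worker limit
    MAX_AGE = timedelta(hours=1)  # anything older than this may go

    def parse_created(value):
        # lxc reports RFC3339 timestamps such as
        # "2022-01-13T10:00:00.123456789Z"; parse the
        # second-resolution prefix to stay portable.
        return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(
            tzinfo=timezone.utc)

    def cleanup():
        out = subprocess.check_output(["lxc", "list", "--format", "json"])
        containers = [(parse_created(c["created_at"]), c["name"])
                      for c in json.loads(out)
                      if c["name"].startswith("autopkgtest-")]
        containers.sort(reverse=True)  # newest first
        now = datetime.now(timezone.utc)
        for i, (created, name) in enumerate(containers):
            # Keep up to MAX_CONTAINERS recent containers; force-delete
            # anything beyond that which is older than MAX_AGE.
            if i >= MAX_CONTAINERS and now - created > MAX_AGE:
                subprocess.run(["lxc", "delete", "--force", name],
                               check=False)

    if __name__ == "__main__":
        cleanup()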

>
> - The node lxd-armhf10 needs to finish its redeployment once the
> lxd images exist again
>
> - The node lxd-armhf9 needs to be redeployed to solve the disk I/O
> issue
>
> - Both lxd-armhf10 and lxd-armhf9 will have to be re-enabled with
> the new IPs in the mojo service bundle

Those three redeployments have happened.

>
> - We should really redeploy all the lxd workers to have clean workers
> again

TBD; we still need to figure out the partitioning for
/var/snap/lxd/common, but it does not seem urgent right now.
--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en
