Friday 26 July 2024

Re: many systemd units failing in oracular LXD containers

On Fri, Jul 26, 2024 at 12:56:20PM -0400, Nick Rosbrook wrote:
> > That's all fair.

> > In this particular case, the LXD team is already working hard on
> > fixing it there, so I think reverting systemd at this point would be
> > more trouble than it's worth. I will sync with them, and if it seems
> > like it will be a while before their fixes land for users, I will
> > upload a workaround to override the problematic unit settings for all
> > services.

> I have just checked with the LXD team, and this has already been fixed
> in latest/edge for LXD. I confirmed the fix myself just now.

We don't deploy from latest/edge in production, nor should we. Do we have
an ETA for when this will land in the stable channels for LXD that are used
by default in the Ubuntu LTSes?

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slangasek@ubuntu.com                                     vorlon@debian.org

Re: many systemd units failing in oracular LXD containers

On Fri, Jul 26, 2024 at 12:33 PM Nick Rosbrook
<nick.rosbrook@canonical.com> wrote:
>
> On Fri, Jul 26, 2024 at 12:20 PM Robie Basak <robie.basak@ubuntu.com> wrote:
> >
> > On Fri, Jul 26, 2024 at 12:11:05PM -0400, Nick Rosbrook wrote:
> > > In short, this is not systemd's bug.
> >
> > I don't think that matters. The idea of the autopkgtest infrastructure
> > and "always being green" is that we hold back packaging updates if it
> > would regress behaviour, even if it's the "fault" of a different
> > package.
> >
> > It follows that we should revert if an update slips past our CI
> > infrastructure such that the behaviour regresses.
> >
> > Otherwise, new unfortunate interactions between packages cause everybody
> > else's development to grind to a halt.
> >
> > There is a trade-off here of course, in terms of minimising cost. It may
> > be appropriate to "push ahead" after a regression occurs, or indeed
> > deliberately bypass CI to land regression behaviour, if that is on
> > balance going to maximise progress from that point.
> >
> > But if such a trade-off is to be made, I think it needs justification. Our
> > default position should be to minimise regression and "always be green".
> >
>
> That's all fair.
>
> In this particular case, the LXD team is already working hard on
> fixing it there, so I think reverting systemd at this point would be
> more trouble than it's worth. I will sync with them, and if it seems
> like it will be a while before their fixes land for users, I will
> upload a workaround to override the problematic unit settings for all
> services.

I have just checked with the LXD team, and this has already been fixed
in latest/edge for LXD. I confirmed the fix myself just now.

-Nick

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel

Re: many systemd units failing in oracular LXD containers

On Fri, Jul 26, 2024 at 12:20 PM Robie Basak <robie.basak@ubuntu.com> wrote:
>
> On Fri, Jul 26, 2024 at 12:11:05PM -0400, Nick Rosbrook wrote:
> > In short, this is not systemd's bug.
>
> I don't think that matters. The idea of the autopkgtest infrastructure
> and "always being green" is that we hold back packaging updates if it
> would regress behaviour, even if it's the "fault" of a different
> package.
>
> It follows that we should revert if an update slips past our CI
> infrastructure such that the behaviour regresses.
>
> Otherwise, new unfortunate interactions between packages cause everybody
> else's development to grind to a halt.
>
> There is a trade-off here of course, in terms of minimising cost. It may
> be appropriate to "push ahead" after a regression occurs, or indeed
> deliberately bypass CI to land regression behaviour, if that is on
> balance going to maximise progress from that point.
>
> But if such a trade-off is to be made, I think it needs justification. Our
> default position should be to minimise regression and "always be green".
>

That's all fair.

In this particular case, the LXD team is already working hard on
fixing it there, so I think reverting systemd at this point would be
more trouble than it's worth. I will sync with them, and if it seems
like it will be a while before their fixes land for users, I will
upload a workaround to override the problematic unit settings for all
services.

Thanks,
Nick

Re: many systemd units failing in oracular LXD containers

On Fri, Jul 26, 2024 at 12:11:05PM -0400, Nick Rosbrook wrote:
> In short, this is not systemd's bug.

I don't think that matters. The idea of the autopkgtest infrastructure
and "always being green" is that we hold back packaging updates if it
would regress behaviour, even if it's the "fault" of a different
package.

It follows that we should revert if an update slips past our CI
infrastructure such that the behaviour regresses.

Otherwise, new unfortunate interactions between packages cause everybody
else's development to grind to a halt.

There is a trade-off here of course, in terms of minimising cost. It may
be appropriate to "push ahead" after a regression occurs, or indeed
deliberately bypass CI to land regression behaviour, if that is on
balance going to maximise progress from that point.

But if such a trade-off is to be made, I think it needs justification. Our
default position should be to minimise regression and "always be green".

Robie

Re: many systemd units failing in oracular LXD containers

On Fri, Jul 26, 2024 at 11:19 AM Robie Basak <robie.basak@ubuntu.com> wrote:
> I was surprised to see the security.nesting=true workaround going in to
> samba in LP: #2046486 though. That, together with developers having to
> set security.nesting=true everywhere to continue with their work, does
> still seem onerous. If this problem was introduced by a new systemd, why
> wouldn't a systemd revert help the situation?
>

In short, this is not systemd's bug. For years, there has been a
struggle between systemd utilizing various namespaces more to provide
sandboxing features, and LXD's AppArmor rules being overly
restrictive. Through my discussions with the LXD team, we have agreed
that LXD needs to adapt to this, and that by default
security.nesting=true makes sense for unprivileged containers. So yes,
it should be temporary that users/developers need to do this
themselves.
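
For reference, the interim workaround discussed above amounts to setting
security.nesting=true on each affected container. The following is a
minimal sketch, assuming the standard `lxc config set` client command is
available on the host. The helper names (`nesting_command`,
`enable_nesting`) and the container name `oracular-test` are hypothetical
and used only for illustration.

```python
import subprocess


def nesting_command(instance: str, enabled: bool = True) -> list[str]:
    """Build the `lxc config set` argv that toggles security.nesting."""
    value = "true" if enabled else "false"
    return ["lxc", "config", "set", instance, "security.nesting", value]


def enable_nesting(instance: str, dry_run: bool = True) -> list[str]:
    """Return the command and, unless dry_run is set, execute it.

    Assumption: the `lxc` client is installed and the named instance exists.
    """
    cmd = nesting_command(instance)
    if not dry_run:
        # check=True raises CalledProcessError if lxc exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "oracular-test" is a placeholder container name.
    print(" ".join(enable_nesting("oracular-test")))
```

With the default dry_run=True nothing is executed and the command is only
printed, so it can be reviewed before being applied to a real container.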

If we *really* need to do something in src:systemd to work around this,
there are other workarounds that I would take rather than reverting an
entire new upstream version.

Thanks,
Nick

Re: many systemd units failing in oracular LXD containers

Hi,

On Fri, Jul 26, 2024 at 12:20 PM Robie Basak <robie.basak@ubuntu.com> wrote:
>
> On Wed, Jul 24, 2024 at 09:06:13AM -0400, Nick Rosbrook wrote:
> > On Wed, Jul 24, 2024 at 8:18 AM Robie Basak <robie.basak@ubuntu.com> wrote:
> > > There seems to be a second issue between systemd and lxd which
> > > security.nesting=true doesn't seem to fix:
> > >
> > > https://github.com/canonical/lxd/issues/13807
> >
> > I cannot reproduce this with Oracular or Jammy containers running on a
> > Noble host. [1] However, also note that my containers are using ext4
> > for the rootfs. Are you using ZFS? If so, this sounds similar to [2],
> > but we uploaded a workaround in systemd-sysusers for Noble (and it's
> > present in upstream >= v256) and I thought the kernel got fixed, too.
>
> Thanks! A newer kernel is what I needed. IIUC, systemd 255.4-1ubuntu8 is
> supposed to handle an older kernel with this issue though, and it
> doesn't seem to? So I'm not sure if it's the same bug or not.
>
> > > I've just heard that Oracular Raspi pre-install images have been broken
> > > for a week for what appears to be the same reason.
> >
> > Is there a bug you can share? I have not seen details of this yet.
>
> The failures are here:
> https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/oracular/ubuntu-preinstalled
>
> > > What do you think about kicking this systemd update back to
> > > oracular-proposed until it is resolved properly, and/or uploading a
> > > revert?
> >
> > I don't see sufficient evidence that this would help the situation.
> > But then again, I am confused about the details of this bug on
> > Oracular vs Jammy because your LXD issue is about Jammy, and I have
> > not seen any details for the Oracular Raspi issue.
>
> Sorry - I was looking at multiple lxd issues in the same week and I
> conflated them. This one was for a Noble host running a Jammy container
> and you're right to question that it has nothing to do with Oracular.
>
> I was surprised to see the security.nesting=true workaround going in to
> samba in LP: #2046486 though. That, together with developers having to

My understanding is that workaround is temporary. Am I mistaken?

Re: many systemd units failing in oracular LXD containers

On Wed, Jul 24, 2024 at 09:06:13AM -0400, Nick Rosbrook wrote:
> On Wed, Jul 24, 2024 at 8:18 AM Robie Basak <robie.basak@ubuntu.com> wrote:
> > There seems to be a second issue between systemd and lxd which
> > security.nesting=true doesn't seem to fix:
> >
> > https://github.com/canonical/lxd/issues/13807
>
> I cannot reproduce this with Oracular or Jammy containers running on a
> Noble host. [1] However, also note that my containers are using ext4
> for the rootfs. Are you using ZFS? If so, this sounds similar to [2],
> but we uploaded a workaround in systemd-sysusers for Noble (and it's
> present in upstream >= v256) and I thought the kernel got fixed, too.

Thanks! A newer kernel is what I needed. IIUC, systemd 255.4-1ubuntu8 is
supposed to handle an older kernel with this issue though, and it
doesn't seem to? So I'm not sure if it's the same bug or not.

> > I've just heard that Oracular Raspi pre-install images have been broken
> > for a week for what appears to be the same reason.
>
> Is there a bug you can share? I have not seen details of this yet.

The failures are here:
https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/oracular/ubuntu-preinstalled

> > What do you think about kicking this systemd update back to
> > oracular-proposed until it is resolved properly, and/or uploading a
> > revert?
>
> I don't see sufficient evidence that this would help the situation.
> But then again, I am confused about the details of this bug on
> Oracular vs Jammy because your LXD issue is about Jammy, and I have
> not seen any details for the Oracular Raspi issue.

Sorry - I was looking at multiple lxd issues in the same week and I
conflated them. This one was for a Noble host running a Jammy container
and you're right to question that it has nothing to do with Oracular.

I was surprised to see the security.nesting=true workaround going in to
samba in LP: #2046486 though. That, together with developers having to
set security.nesting=true everywhere to continue with their work, does
still seem onerous. If this problem was introduced by a new systemd, why
wouldn't a systemd revert help the situation?

Robie
