Thursday 9 June 2022

systemd-oomd issues on desktop

Hi,

During the 22.04 cycle, we enabled systemd-oomd [1] by default on
desktop. Since then, there have been reports of systemd-oomd killing
user applications too frequently (e.g. browsers, IDEs, and gnome-shell
in some cases). In addition to a couple of LPs [2][3], I have heard
these reports by word-of-mouth, and there have been discussions on
internal Mattermost. A common theme in these reports is that e.g.
Chrome is killed "suddenly" without any other observable symptoms of
the system nearing OOM.

For more context, systemd-oomd has two ways of deciding that a unit's
cgroup is a candidate for an OOM kill:

1. When total system memory usage _and_ swap usage both exceed
SwapUsedLimit (90% by default, which Ubuntu keeps) [4], monitored
cgroups using more than 5% of total swap become OOM kill candidates,
and the cgroups with the highest swap usage are acted on first.

2. When a unit's cgroup memory pressure exceeds its memory pressure
limit (DefaultMemoryPressureLimit unless overridden per unit) [5] for
at least DefaultMemoryPressureDurationSec [6], monitored descendant
cgroups are acted on, starting from the ones with the most reclaim
activity.
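
To make the two rules above concrete, here is a minimal Python sketch
of the decision logic as described in this mail. It is a simplified
model, not systemd-oomd's actual implementation: the Cgroup class, the
function names, and the sample format are all hypothetical, and the
real daemon reads its inputs from the kernel rather than from
arguments.

```python
from dataclasses import dataclass

SWAP_USED_LIMIT = 0.90   # SwapUsedLimit default mentioned above
MIN_SWAP_SHARE = 0.05    # cgroups above 5% of total swap become candidates


@dataclass
class Cgroup:
    """Hypothetical stand-in for a monitored cgroup."""
    name: str
    swap_bytes: int


def swap_kill_candidates(mem_used_frac, swap_used_frac, swap_total, cgroups):
    """Rule 1: return swap kill candidates, highest swap usage first."""
    # Both memory usage and swap usage must exceed the limit.
    if mem_used_frac <= SWAP_USED_LIMIT or swap_used_frac <= SWAP_USED_LIMIT:
        return []
    eligible = [c for c in cgroups if c.swap_bytes > MIN_SWAP_SHARE * swap_total]
    return sorted(eligible, key=lambda c: c.swap_bytes, reverse=True)


def pressure_exceeded(samples, limit_pct, duration_sec):
    """Rule 2 (simplified): True if pressure stayed above limit_pct for at
    least duration_sec. samples is a list of (timestamp_sec, pressure_pct)
    tuples, oldest first."""
    start = None
    for ts, pct in samples:
        if pct > limit_pct:
            if start is None:
                start = ts          # pressure just crossed the limit
            if ts - start >= duration_sec:
                return True
        else:
            start = None            # pressure dropped, reset the window
    return False
```

With 1 GiB of swap, a browser cgroup holding 400 MiB of swap and an
IDE holding 200 MiB would both be candidates under rule 1, with the
browser acted on first.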

In the reports I refer to above, applications are being killed due to
(1). In practice, the SwapUsedLimit might be too easy to reach on
Ubuntu, largely because Ubuntu provides just 1GB of swap. Since we
follow the suggestion of setting ManagedOOMSwap=kill on the root slice
[7], every cgroup is eligible for swap kill. When this condition is
met, user applications like browsers are going to be killed first.
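
As a rough illustration of why this is easy to hit, the snippet below
works out the absolute numbers behind the percentages. It assumes the
1GB of swap is exactly 1024 MiB, which is an assumption on my part, and
the function name is made up for this example.

```python
MIB = 1024 * 1024


def swap_thresholds(swap_total_bytes, swap_used_limit=0.90, candidate_share=0.05):
    """Translate the percentage thresholds into absolute byte counts."""
    return {
        # Swap usage above this satisfies the swap half of the condition.
        "limit_bytes": swap_total_bytes * swap_used_limit,
        # A cgroup using more swap than this becomes a kill candidate.
        "candidate_bytes": swap_total_bytes * candidate_share,
    }


t = swap_thresholds(1024 * MIB)
print(f"swap condition met above {t['limit_bytes'] / MIB:.1f} MiB in use")
print(f"cgroup is a candidate above {t['candidate_bytes'] / MIB:.1f} MiB of swap")
```

Under that assumption, the swap condition is met at roughly 921.6 MiB
of swap in use, and any cgroup with more than about 51.2 MiB swapped
out is a candidate.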

While investigating [2], we patched upstream systemd-oomd to fix how
"used memory" was calculated, and we brought the patch into Jammy.
This may have helped a bit, but it does not appear to have been enough
to fix the issue entirely.

Given the current situation, I think we should reconsider how
systemd-oomd is configured on Ubuntu. These are the options that come
to mind:

1. Increase SwapUsedLimit (again, currently at 90%). I think this is
probably the safest change, but it is not clear to me how significant
an impact this would have.

2. Set ManagedOOMSwap more selectively. Again, we currently follow the
recommendation of setting ManagedOOMSwap=kill on the root slice
(-.slice), so every descendant cgroup is a candidate for swap kill. It
_might_ be effective to say "do not swap kill cgroups that are
descendants of the user's app.slice". The downsides of this approach
are that the configuration does not scale well (i.e. a lot more
configuration is needed to get the proper swap kill "coverage"), and
that it may just move the problem onto a different class of processes.

3. Do not enable swap kill at all. This would mean systemd-oomd would
only act when memory pressure limits are reached. Given Ubuntu's swap
configuration, does it make sense for systemd-oomd to act on high swap
usage?

4. Increase swap on Ubuntu. I am adding this for completeness, but I
doubt this is a viable option.
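
To get a feel for options (1) and (4), here is a small Python sketch
comparing how much swap can be in use before the swap half of the
condition is met, for a few limits and swap sizes. The candidate values
(95%, 98%, 4 GiB of swap) are only examples I picked, not proposals,
and the memory usage half of the condition still has to hold as well.

```python
def swap_trigger_mib(swap_total_mib, swap_used_limit_pct):
    """Swap in use (MiB) above which the swap condition is met."""
    return swap_total_mib * swap_used_limit_pct / 100


for swap_mib in (1024, 4096):          # example swap sizes
    base = swap_trigger_mib(swap_mib, 90)
    for pct in (90, 95, 98):           # example SwapUsedLimit values
        trigger = swap_trigger_mib(swap_mib, pct)
        print(f"swap={swap_mib} MiB limit={pct}% "
              f"trigger={trigger:.1f} MiB (+{trigger - base:.1f} MiB vs 90%)")
```

With 1024 MiB of swap, going from 90% to 95% only buys about 51 MiB of
additional swap usage before the condition is met, which is part of why
I am unsure how much option (1) would help on its own.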

I think that either option (1) or (3) would be the most reasonable --
maybe trying (1) first and falling back to (3) if necessary. If anyone
has an opinion on this, or can think of other options, I would
appreciate the input.

Thanks,

Nick 'enr0n' Rosbrook

[1] https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html
[2] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1966381
[3] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1972159
[4] https://www.freedesktop.org/software/systemd/man/oomd.conf.html#SwapUsedLimit=
[5] https://www.freedesktop.org/software/systemd/man/oomd.conf.html#DefaultMemoryPressureLimit=
[6] https://www.freedesktop.org/software/systemd/man/oomd.conf.html#DefaultMemoryPressureDurationSec=
[7] https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html#Usage%20Recommendations

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com