>
> On Tue, Jul 11, 2023, Michael Hudson-Doyle wrote:
> > I was wondering if it makes sense to construct a zstd dictionary for
> > compressing kernel modules, but I didn't realize they need to be available
> > at decompression time; I'm not sure the kernel would support that.
>
> I've tried zstd --train and I don't think it is appropriate here, or at
> least not without significant preparation of datasets. It's better
> suited to many small and similar files. We don't have that many files,
> and there are only clusters of similarities, not similarities shared
> across every file. Moreover, since our files are big (at least the ones
> that matter for the overall size), they already contribute a lot to the
> live dictionary, and the pre-built one has little overall influence.
Also I don't think it will work for us - the train command produces a
dictionary that needs to be present for both compression and
decompression, and right now the kernel doesn't support that, or more
specifically loading a dictionary from userspace (which potentially has
security implications). We could in theory generate training files,
generate dictionaries, and bake them into our kernel, but then all of
our compressed artifacts become non-portable and the dictionary becomes
more difficult to update, because the .zst file itself now depends on an
external dictionary. Dictionaries make sense for lots of similar things
(lots of status icons, or similar gaming images), where one can take all
the assets, train on them, and do a static build that uses that perfect
dictionary against a perfect set of fixed assets. It's sort of like the
more generalised use case of perfect-hash functions (gperf), but for
data files.
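For anyone who hasn't played with it, this is roughly what the dictionary
workflow looks like with the zstd CLI (the file names below are made up,
but --train and -D are the standard options):

  # Train a dictionary from a set of sample files
  zstd --train samples/*.ko -o modules.dict

  # Compress a file using the dictionary
  zstd -D modules.dict foo.ko -o foo.ko.zst

  # Decompression needs the exact same dictionary again; a plain
  # `zstd -d foo.ko.zst` will refuse to decode the frame without it
  zstd -d -D modules.dict foo.ko.zst -o foo.ko

which is exactly the coupling described above: the .zst output is only
useful together with modules.dict.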
--
okurrr,
Dimitri