Monday 12 March 2018

Re: zstd compression for packages

On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
<daniel.axtens@canonical.com> wrote:
> Hi,
>
> I looked into compression algorithms a bit in a previous role, and to be
> honest I'm quite surprised to see zstd proposed for package storage. zstd,
> according to its own github repo, is "targeting real-time compression
> scenarios". It's not really designed to be run at its maximum compression
> level, it's designed to really quickly compress data coming off the wire -
> things like compressing log files being streamed to a central server, or I
> guess writing random data to btrfs where speed is absolutely an issue.
>
> Is speed of decompression a big user concern relative to file size? I admit
> that I am biased - as an Australian and with the crummy internet that my
> location entails, I'd save much more time if the file was 6% smaller and
> took 10% longer to decompress than the other way around.
>
> Did you consider Google's Brotli?
>

I can't speak for Julian's decision to go with zstd, but I can say that
in the RPM world, we picked zstd because we wanted a better gzip.
Compression and decompression times are rather long with xz, and the
ultra-high efficiency of xz is not as necessary as it used to be,
with storage becoming much cheaper than it was nearly a decade ago
when most distributions switched to LZMA/XZ payloads.
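
If anyone wants to get a rough feel for the xz-versus-gzip time
difference on their own data, something like the sketch below should
do. It only uses the Python standard library, so zstd itself is left
out, and the sample payload, the function names, and the chosen levels
(zlib level 9, lzma preset 6) are all just assumptions for
illustration, not what rpm or dpkg actually use.

```python
import lzma
import time
import zlib


def time_codec(name, compress, decompress, data):
    """Return (name, compressed size, compress seconds, decompress seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    mid = time.perf_counter()
    unpacked = decompress(packed)
    end = time.perf_counter()
    assert unpacked == data  # sanity check: the round trip must be lossless
    return name, len(packed), mid - start, end - mid


def make_sample(n_lines=20000):
    # Hypothetical stand-in for a package payload: repetitive path-like text.
    return b"".join(b"usr/share/doc/pkg/file-%d.txt\n" % i for i in range(n_lines))


def compare(data):
    # Levels are illustrative assumptions, not any distribution's defaults.
    return [
        time_codec("gzip (zlib level 9)",
                   lambda d: zlib.compress(d, 9), zlib.decompress, data),
        time_codec("xz (lzma preset 6)",
                   lambda d: lzma.compress(d, preset=6), lzma.decompress, data),
    ]


if __name__ == "__main__":
    sample = make_sample()
    for name, size, c_secs, d_secs in compare(sample):
        print(f"{name}: {len(sample)} -> {size} bytes, "
              f"compress {c_secs:.3f}s, decompress {d_secs:.3f}s")
```

The numbers will vary a lot with the input and the machine, so treat
them as a sanity check rather than a benchmark; real package payloads
are the only meaningful test.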

zstd also provides the necessary properties to make it chunkable and
rsyncable, which is useful for metadata. For package payloads, there
are things we can do to make compression go much faster than it does
now (and it's still quite a bit faster than xz as-is and somewhat
faster than gzip now).

I don't know for sure if Debian packaging allows this, but for RPM, we
switch to xz payloads when the package is large enough that
compression/decompression speed isn't really going to matter
(e.g. game data). So while most packages may not necessarily be using
xz payloads, quite a few would. That said, we've been using xz for all
packages for a few years now, and the main drag is the time it takes
to wrap everything up to make a package.

As for Google's Brotli, its average compression ratio isn't as high as
zstd's, and it is markedly slower. With these factors in mind, the
obvious choice was zstd.

(As an aside, rpm in sid/buster and bionic doesn't have zstd support
enabled... Is there something that can be done to make that happen?)

--
真実はいつも一つ!/ Always, there's only one truth!

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com