Monday 12 March 2018

Re: zstd compression for packages

On Mon, Mar 12, 2018 at 10:09 AM, Julian Andres Klode
<julian.klode@canonical.com> wrote:
> On Mon, Mar 12, 2018 at 09:30:16AM -0400, Neal Gompa wrote:
>> On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
>> <daniel.axtens@canonical.com> wrote:
>> > Hi,
>> >
>> > I looked into compression algorithms a bit in a previous role, and to be
>> > honest, I'm quite surprised to see zstd proposed for package storage. zstd,
>> > according to its own GitHub repo, is "targeting real-time compression
>> > scenarios". It's not really designed to be run at its maximum compression
>> > level; it's designed to compress data coming off the wire very quickly -
>> > things like compressing log files being streamed to a central server or,
>> > I guess, writing random data to btrfs where speed is absolutely an issue.
>> >
>> > Is speed of decompression a big user concern relative to file size? I admit
>> > that I am biased - as an Australian and with the crummy internet that my
>> > location entails, I'd save much more time if the file was 6% smaller and
>> > took 10% longer to decompress than the other way around.
>> >
>> > Did you consider Google's Brotli?
>> >
>>
>> I can't speak for Julian's decision for zstd, but I can say that in
>> the RPM world, we picked zstd because we wanted a better gzip.
>> Compression and decompression times are rather long with xz, and xz's
>> ultra-high efficiency is not as necessary as it used to be, with
>> storage much cheaper now than it was nearly a decade ago, when most
>> distributions switched to LZMA/XZ payloads.
>
> I want zstd -19 as an xz replacement due to its higher decompression
> speed, and it also requires about 1/3 less memory when compressing,
> which should be nice for _huge_ packages.
>

On a pure space efficiency basis, zstd -19 is still not as good as xz
-9, but it's pretty darned good.

>> I don't know for sure if Debian packaging allows this, but for RPM, we
>> switch to xz payloads when the package is large enough that the
>> compression/decompression speed doesn't really matter (e.g. game
>> data). So while most packages may not necessarily be using xz
>> payloads, quite a few would. That said, we've been using xz for all
>> packages for a few years now, and the main drag is the time it takes
>> to wrap everything up to make a package.
>
> We could. But I don't think it matters much.
>

Maybe not. It was useful a long time ago; now we don't really care
either, as we use xz across the board (for the moment).

>>
>> As for Google's Brotli, its average compression ratio isn't as high as
>> zstd's, and it is markedly slower. With these factors in mind, the
>> obvious choice was zstd.
>>
>> (As an aside, rpm in sid/buster and bionic doesn't have zstd support
>> enabled... Is there something that can be done to make that happen?)
>
> I'd open a wishlist bug in the Debian bug tracker if I were you. If
> we were to introduce a delta, we'd have to maintain it...
>

Hence asking about sid/buster and bionic. :)

My previous experience with debbugs is that it's a black hole. We'll
see if it's better this time.

--
真実はいつも一つ!/ Always, there's only one truth!

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com