On Mon, Jun 03, 2024 at 08:49:04PM -0700, Steve Langasek wrote:
> On Fri, May 03, 2024 at 08:43:11AM +0200, Heinrich Schuchardt wrote:
> > On Debian I have seen apt-update downloading diff files. Why don't we use
> > those for Ubuntu
>
> Disclaimer: I have never benchmarked this, but am going by my observations
> from 20 years ago when this was adopted by Debian.
>
> In the common case, pdiff files decrease the *data transfer size* for
> downloading indices, but increase the *clock time* it takes to update the
> apt database. Therefore there has never been evidence that it's a net win
> given modern Internet connections, and no one has ever agitated for this to
> be a priority in Ubuntu/Launchpad.

It's much faster these days. Debian has also moved to server-side merged
pdiffs, so it's just one file to download and apply, whereas before
multiple files were downloaded in the pipeline and then merged locally
before being applied.

And I guess in the very old days they were not merged at any point.

But in any case, this has been brought up a couple of times, and
the main reason we don't have pdiffs for Ubuntu is arguably
that we don't have something like 4 dinstall runs a day, but more
like 72 Launchpad publisher runs, and having 18x as many deltas is
quite expensive, both in storage space and in calculating them.

Which is why a zsync-style synchronization would be better, where
we compress the file block-wise and can fetch only the changed blocks;
however, the APT HTTP code is fairly unreliable, the scheme itself is
fairly complex, and you need to use something like zstd with a custom
dictionary to make this still compress efficiently.
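
To illustrate the block-wise idea, here is a rough Python sketch. It is
not how zsync or APT actually work: real zsync uses rolling checksums so
that inserted bytes do not shift every later block, whereas this only
compares aligned fixed-size blocks and ignores compression entirely.
BLOCK_SIZE, block_digests() and blocks_to_fetch() are names made up for
the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, not something APT defines


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(local: bytes, remote_digests: list[str],
                    block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of remote blocks that differ from the local copy."""
    local_digests = block_digests(local, block_size)
    return [
        i for i, digest in enumerate(remote_digests)
        # Fetch blocks past the end of the local file, or with a new digest.
        if i >= len(local_digests) or local_digests[i] != digest
    ]
```

A client would then request only those block indices, typically via
HTTP range requests, which is the part that depends on the HTTP code
behaving.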

An alternative would be replacing the pdiff format with a tagged
one, where each hunk is associated with a timestamp, such that you
can then ship one larger patch file per day or so, and you only
apply the bits you actually need.

Essentially you just need to keep the previous run's pdiff2 around
to keep pdiffs working on upgrades, so it will be 2 diffs for the
current day, and 1 for each past day or so.
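
As a sketch of what "only apply the bits you actually need" could mean,
assuming each hunk carries the timestamp of the publisher run that
produced it. No such format exists today; TaggedHunk and hunks_needed()
are hypothetical names, and representing the timestamp as an integer is
my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedHunk:
    timestamp: int  # hypothetical: time of the publisher run, e.g. Unix time
    commands: str   # the hunk body, treated as opaque here


def hunks_needed(hunks: list[TaggedHunk],
                 local_timestamp: int) -> list[TaggedHunk]:
    """Select the hunks newer than the local index, oldest first."""
    return sorted(
        (h for h in hunks if h.timestamp > local_timestamp),
        key=lambda h: h.timestamp,
    )
```

The client downloads the one larger patch file, skips every hunk at or
before the timestamp of its local index, and applies the rest in order.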

I don't believe there is much value in this for stable releases
though, as has been pointed out before. This may change with LTS.

But we could also come up with a much simpler approach where we
just split the -updates/-security pockets into 6-month cycles
(i.e. add 2024H1 or 24.04.1 subdirectories, and add some fancy way
for InRelease files to state that you need to fetch these
subdirectories too) and then you just fetch the current cycle on
update. But first fetches would be annoying.
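
A small sketch of the cycle idea, using the 2024H1-style labels. The
functions cycle_label() and cycles_between() are made up for
illustration, and how InRelease would actually announce the
subdirectories is left open.

```python
from datetime import date


def cycle_label(day: date) -> str:
    """Map a date to a half-year label such as '2024H1'."""
    half = 1 if day.month <= 6 else 2
    return f"{day.year}H{half}"


def cycles_between(release: date, today: date) -> list[str]:
    """List every cycle from the release date up to today, oldest first."""
    if today < release:
        raise ValueError("today must not be before the release date")
    end = cycle_label(today)
    year, half = release.year, (1 if release.month <= 6 else 2)
    labels = []
    while True:
        label = f"{year}H{half}"
        labels.append(label)
        if label == end:
            return labels
        # Advance to the next half-year.
        year, half = (year, 2) if half == 1 else (year + 1, 1)
```

A regular update only needs cycle_label(today), while a fresh install
needs everything cycles_between() returns, which is why first fetches
get more expensive the older the release is.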

But I'm still not convinced that it is worth the effort.
--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en