Friday 31 May 2024

Re: Make proposed available by default? [was: Setting NotAutomatic for hirsute+1-proposed]

On Fri, May 03, 2024 at 06:23:05PM +1200, Michael Hudson-Doyle wrote:
> If we want to make apt update quicker / lighter on resources we should
> figure out if we can stop publishing some of the hashes (which entirely
> dominate the size of the compressed package lists). We currently have 4
> hashes in the lists (md5, sha1, sha256, sha512) -- I know Dimitri was
> trying to get us to the point that we could stop publishing MD5 at least
> but there are a few things out there that hardcode a dependence on it.
> Maybe oracular is a good time to turn off some hashes and see what breaks.

I did some further analysis.

Summary of results:

Adding proposed increases download size by 7% at worst. Stripping older
hashes reduces download size by around 35% to 40%. With stripping,
adding proposed would make a difference to download size of 4% at worst,
and overall we'd still have a >30% improvement.

Note however that when I say that there's an increase in download size
of 7% at worst, that's 2.4 MB. IMHO, that's negligible. The most we
could get in savings by stripping hashes is 15 MB, and that's assuming
no previous/ongoing cache.
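As a rough cross-check of the ">30% improvement" figure, the sketch below computes the reduction from the not-stripped, without-proposed Packages size to the stripped, with-proposed size, using the per-series numbers from the detailed results. The helper name `saving` is made up for illustration, and the figures cover the Packages files only, not the whole download.

```shell
# saving OLD NEW: print the reduction from OLD to NEW as a whole-number percentage.
# "saving" is a hypothetical helper name, not part of any tool used in this thread.
saving() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", 100 * (old - new) / old }'
}

# Not stripped without proposed -> stripped with proposed (sizes in k, from the tables)
saving 16431 11199   # Noble
saving 23661 15805   # Jammy
saving 23385 13756   # Focal
```

All three series come out above 30%, with Noble the smallest saving.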

Analysis:

I suggest therefore that we don't need to worry about size from the
perspective of adding proposed. Stripping hashes will provide some
worthwhile benefit but I don't think we need to block adding proposed on
this. I've filed LP: #2067752 to track the removing of the old hashes.

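Detailed results:

Each size in a Difference cell is followed by a ratio rather than the size of the change: in a Difference row the percentage is the with-proposed size as a percentage of the without-proposed size, and in a Difference column it is the stripped size as a percentage of the not-stripped size. The bottom-right cell compares stripped with proposed against not stripped without proposed.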

Noble

Download

| Without proposed | 29.2 MB |
| With proposed | 29.9 MB |
| Difference | 0.7 MB / 102% |

Considering just the Packages files from the download:

| | Not stripped | Stripped | Difference |
| Without proposed | 16431k | 11035k | 5396k / 67% |
| With proposed | 16880k | 11199k | 5681k / 66% |
| Difference | 449k / 103% | 164k / 101% | 5232k / 68% |

Jammy

Download

| Without proposed | 33.9 MB |
| With proposed | 36.3 MB |
| Difference | 2.4 MB / 107% |

Considering just the Packages files from the download:

| | Not stripped | Stripped | Difference |
| Without proposed | 23661k | 15229k | 8432k / 64% |
| With proposed | 25135k | 15805k | 9330k / 63% |
| Difference | 1474k / 106% | 576k / 104% | 7856k / 67% |

Focal

Download

| Without proposed | 33.2 MB |
| With proposed | 34.5 MB |
| Difference | 1.3 MB / 104% |

Considering just the Packages files from the download:

| | Not stripped | Stripped | Difference |
| Without proposed | 23385k | 13256k | 10129k / 57% |
| With proposed | 24214k | 13756k | 10458k / 57% |
| Difference | 829k / 104% | 500k / 104% | 9629k / 59% |
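The Difference cells in the Packages tables above can be re-derived from the four raw sizes. The following is a minimal sketch of that arithmetic; the function name `grid` and its argument order are assumptions made for illustration, and the sample call uses the Jammy figures.

```shell
# grid A B C D (all sizes in k), hypothetical helper:
#   A = not stripped, without proposed    B = stripped, without proposed
#   C = not stripped, with proposed       D = stripped, with proposed
grid() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
        # Right-hand column: saving from stripping, stripped as % of not stripped
        printf "Without proposed: %dk / %.0f%%\n", a - b, 100 * b / a
        printf "With proposed: %dk / %.0f%%\n", c - d, 100 * d / c
        # Bottom row: cost of adding proposed, then the combined effect
        printf "Difference: %dk / %.0f%% | %dk / %.0f%% | %dk / %.0f%%\n",
               c - a, 100 * c / a, d - b, 100 * d / b, a - d, 100 * d / a
    }'
}

grid 23661 15229 25135 15805   # Jammy
```

Running it on the Noble or Focal sizes gives the corresponding cells for those series.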

Notes:

To compare like for like, I used the `xz -9` from each corresponding
series for both estimates: the stripped files were compressed with it,
and the not stripped files were recompressed with that same xz. In
practice, Launchpad would presumably use a newer-ish xz across all series.

Method:

Using lxd ubuntu:<series> container images:
find /var/lib/apt/lists /var/cache/apt -type f -delete
apt-get update # note how much it says it downloaded, e.g. "Fetched 33.9 MB in 5s (6519 kB/s)"
Add proposed (`add-apt-repository -p proposed`, or edit deb822 by hand on Noble due to LP: #2061128; also done manually on Focal)
find /var/lib/apt/lists /var/cache/apt -type f -delete
apt-get update # note again how much it says it downloaded
apt-get install -y dctrl-tools
mkdir {un,}stripped
cp /var/lib/apt/lists/*Packages unstripped
cd unstripped
for i in *; do grep-dctrl -I -s MD5sum,SHA1,SHA256 . < "$i" > "../stripped/$i"; done # keep every field except the three older hashes
xz -9 *
find . -type f | xargs du -c # record "unstripped" "with proposed" sizes
find . -type f | grep -v proposed | xargs du -c # record "unstripped" "without proposed" sizes
cd ../stripped
xz -9 *
find . -type f | xargs du -c # record "stripped" "with proposed" sizes
find . -type f | grep -v proposed | xargs du -c # record "stripped" "without proposed" sizes
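To illustrate what the stripping step does to a single stanza, here is a small self-contained sketch. It is not the method used above: it swaps `grep-dctrl` for a plain line filter, which assumes each hash field occupies exactly one line. The stanza, the placeholder hash values and the function name `strip_old_hashes` are all made up for the example.

```shell
# Hypothetical Packages stanza; the hash values are placeholders, not real digests.
sample='Package: example
Version: 1.0-1
MD5sum: placeholder-md5
SHA1: placeholder-sha1
SHA256: placeholder-sha256
SHA512: placeholder-sha512
Description: placeholder package'

# Drop the MD5sum, SHA1 and SHA256 lines and keep everything else (including SHA512).
# Assumes one-line fields; grep-dctrl parses stanzas properly and is the safer tool.
strip_old_hashes() {
    grep -Ev '^(MD5sum|SHA1|SHA256):'
}

printf '%s\n' "$sample" | strip_old_hashes
```

The filtered stanza keeps Package, Version, SHA512 and Description, which is the same set of fields the `grep-dctrl -I -s MD5sum,SHA1,SHA256` call leaves in place for this input.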