Wednesday 17 March 2021

Re: General mechanism to supply "rich history" to git-ubuntu

On Tue, Feb 02, 2021 at 01:03:23PM +0000, Robie Basak wrote:
> Question: before I land this branch, I'd like to make sure that there
> aren't any issues with the "spec", as follows.
>
> For (developer) users, you'd run something like:
>
> dpkg-buildpackage $(git ubuntu push-for-upload) <your usual flags here>
>
> The push-for-upload[1] command would push your branch to Launchpad and
> also output some -Dfield=value arguments that would get passed through
> to dpkg-genchanges.
>
> Further/better wrappers could come later - especially for newcomers
> where I'd like to wrap away the dpkg-buildpackage stuff.
>
> Technically, the mechanism is:
>
> 1) The uploader pushes their commits somewhere.
>
> 2) The uploader includes a reference to the commits in the changes
> file.
>
> 3) The uploader dputs as normal.
>
> 4) When git-ubuntu sees the upload, it pulls the commits from the
> repository listed in the changes file.
>
> 5) If the commits pass sanity checks (eg. the final commit matches the
> upload exactly), then it uses the commits provided instead of
> synthesizing its own.
>
> What goes into the changes file is three fields. Example:
> https://launchpadlibrarian.net/516799033/hello_2.10-2ubuntu3~ppa1_source.changes
>
> Vcs-Git: https://git.launchpad.net/~racb/ubuntu/+source/hello
> Vcs-Git-Commit: 4511fdfc01cbfd5bc351e1da294d6acb44e8a4a2
> Vcs-Git-Refs: refs/heads/test
>
> We need the Refs field because git is designed not to be able to fetch a
> commit by hash, only by a ref that can reach it. So Vcs-Git-Refs
> must specify what ref(s), when fetched, will make the given commit
> reachable. In practice this could just be the branch name prefixed by
> 'refs/heads/' as in this example.

I don't see why Vcs-Git-Refs would ever need to be plural: if a commit
isn't reachable from any single ref, then adding another ref will never
be helpful. This should just be Vcs-Git-Ref, singular, which also
simplifies code that uses it (since it will in practice always be a
single ref, any code to handle more than one ref there would be
poorly tested).
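
The argument rests on reachability from a set of refs being just the union of what each ref reaches on its own. A toy Python sketch (an in-memory graph standing in for real git objects; all names are hypothetical) makes that concrete, and also shows how little validation a singular Vcs-Git-Ref field would need:

```python
# Toy model only: a commit graph as a mapping from commit id to parent ids.
Graph = dict[str, list[str]]


def reachable(graph: Graph, tip: str) -> set[str]:
    """Every commit reachable from tip by following parents (tip included)."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def reachable_from_refs(graph: Graph, tips: list[str]) -> set[str]:
    # Fetching several refs yields exactly the union of what each ref
    # reaches individually, so an extra ref never makes a commit reachable
    # that no single ref already reaches.
    result: set[str] = set()
    for tip in tips:
        result |= reachable(graph, tip)
    return result


def parse_vcs_git_ref(value: str) -> str:
    """Validate a value for the proposed singular Vcs-Git-Ref field."""
    parts = value.split()
    if len(parts) != 1 or not parts[0].startswith("refs/"):
        raise ValueError(f"expected exactly one full ref name, got {value!r}")
    return parts[0]
```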

The spec should probably also say that the commit in question isn't
guaranteed to be reachable from that ref in the long term, but only by
the git-ubuntu importer in the short term (however exactly you define
that). Since the repository may be owned by the uploader, it doesn't
really seem practical to impose stronger lifetime constraints.

What happens if the repository is temporarily unreachable? Presumably
you back off and retry later, but that does mean you need a reasonably
robust way to distinguish temporary failures from permanent ones.
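
One possible shape for this, sketched in Python under loud assumptions: the failure "kinds" below are made-up labels, and mapping real git or HTTP errors onto them robustly is exactly the open problem. The sketch only shows the policy side: classify a failure, then retry transient ones on a capped exponential backoff schedule, presumably falling back to synthesized history otherwise.

```python
from enum import Enum


class Outcome(Enum):
    TEMPORARY = "temporary"  # back off and retry later
    PERMANENT = "permanent"  # stop retrying; fall back to synthesized history


# Illustrative failure kinds only: deciding which real git/HTTP errors map
# onto each kind is precisely the hard part, and is not attempted here.
TEMPORARY_KINDS = frozenset({"timeout", "connection_refused", "dns_failure", "http_503"})


def classify(kind: str) -> Outcome:
    # Anything not known to be transient is treated as permanent, so that an
    # unrecognised failure cannot keep an upload in the retry queue forever.
    return Outcome.TEMPORARY if kind in TEMPORARY_KINDS else Outcome.PERMANENT


def retry_delays(base: float = 60.0, factor: float = 2.0,
                 cap: float = 3600.0, attempts: int = 6) -> list[float]:
    """Exponential backoff schedule in seconds, with each delay capped."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def should_retry(kind: str, attempt: int, max_attempts: int = 6) -> bool:
    """attempt is the number of retries already made (starting at 0)."""
    return classify(kind) is Outcome.TEMPORARY and attempt < max_attempts
```

Defaulting unknown failures to permanent trades occasionally losing the rich history for never retrying indefinitely; the opposite default is equally defensible.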

What happens if the repository is renamed before git-ubuntu gets to it?
For example, the uploader might choose to change their Launchpad
username, which would invalidate the original repository URL. (In
practice this will be rare, partly because renames are fairly rare to
start with, and partly because users with PPAs currently can't change
their username due to technical constraints in Launchpad. On principle,
though, I believe that users should generally be free to change their
usernames with minimal bureaucracy, and I would like to avoid adding
further technical constraints that we would need to solve before
allowing them to do so.)

What happens if the repository is private, as might be the case if the
upload is a security upload whose contents are embargoed before it hits
the archive?


dgit avoids all these problems by using a push model rather than a pull
model: "dgit push-source" pushes the appropriate commit to a specialized
git server, which can then make sure not to lose it. At present in
Debian this has the side-effect of restricting the set of people who can
push, and it's certainly not obvious how we would do the same thing in
Ubuntu with Launchpad. (I think we should avoid any design that requires
setting up another git server that needs to know about developer
identities, permissions, and so on.) Still, I think it's at least worth
thinking about before committing ourselves to a pull model.

What would we need in Launchpad if we were going to try to do this on a
push basis? Brainstorming a bit, these are some approaches that came to
mind, bearing in mind that some of these ideas may be terrible:

* Might it be practical to tell Launchpad to reserve some kind of token
corresponding to the commit in question guaranteeing that that commit
would be reachable until the token is consumed, which git-ubuntu
could then pass in the .changes file and the importer could consume?

* Perhaps the upload could include a git bundle relative to some other
version already in the archive? (This could be large, though.)

* Could we work out a way to allow any contributor to push to some kind
of holding area associated with the importer-owned repository, and
then the importer would only point a ref at that once the upload has
been processed? (I'm not sure how we would prevent an attacker from
being able to force such a repository to grow without bound, though.)

* Could we use merge proposals for this somehow? An upload is in some
sense a proposal to merge some changes into the primary archive, and
I know "git ubuntu submit" already integrates with merge proposals on
an experimental basis. That might allow Launchpad to know that a
given commit is interesting and should be made available - indeed we
already have plans to expose virtual refs that correspond to merge
proposals, although I don't think those are quite done yet. If we
were to take this approach, then the ref that we point to could be
made to appear in the target repository instead, which avoids the
collection of issues around the source repository disappearing or
moving.
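
To make the first idea a little more concrete, here is a hypothetical in-memory model in Python of the guarantee it asks for; nothing like this class exists in Launchpad, and the method names are invented for illustration. A reserved commit stays pinned (for example, as an extra root for garbage collection) until the importer consumes the token. A token that is never consumed pins its commit forever, which is the same unbounded-growth concern as the holding-area idea, so a real design would presumably also need some expiry.

```python
import secrets


class ReservationRegistry:
    """Hypothetical model of "keep this commit reachable until consumed"."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}  # token -> commit id

    def reserve(self, commit: str) -> str:
        # The uploader (via git-ubuntu) would put this token in the .changes.
        token = secrets.token_hex(16)
        self._pins[token] = commit
        return token

    def pinned_commits(self) -> set[str]:
        # A garbage collector would treat these as extra roots, so they stay
        # reachable even if the uploader deletes or renames their branch.
        return set(self._pins.values())

    def consume(self, token: str) -> str:
        # Single use: once the importer redeems the token, the commit is no
        # longer protected by it.
        try:
            return self._pins.pop(token)
        except KeyError:
            raise ValueError("unknown or already-consumed token") from None
```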

Of these, albeit with only half an hour's thought, I think my favourite
is the last one: using merge proposals feels quite elegant, and is
perhaps only a change in how your spec would be used rather than a
format-level change. I may have missed something, though. What do you
think?

> Notably there's a Dgit field defined by Debian Policy against dsc files,
> which is used for a very similar purpose[2].

I'm only a dgit user rather than an expert in its implementation, but I
believe that the Dgit field is used by dgit to retrospectively work out
the commit that represents a given source package version in the archive
as part of preparing a newer version that ought to be a descendant of
that commit, rather than as part of an upload instruction. That's why
it lives in the .dsc file rather than the .changes: after an upload has
been processed, the .changes is not stored in any authenticatable way.
But it's true that the current specification of Dgit explicitly relies
on the repository being at a well-known and persistent location.
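
For reference, Debian Policy describes the Dgit field as a single full git commit hash, optionally followed by whitespace and other data reserved for future extensions. A minimal Python sketch of extracting the commit under that description (the function name is hypothetical; it assumes a 40-character lowercase SHA-1 hash and treats anything after it as opaque):

```python
import re

HASH_RE = re.compile(r"[0-9a-f]{40}")  # assumes a full SHA-1 object name


def dgit_commit(field_value: str) -> str:
    """Return the commit hash from the value of a .dsc Dgit field.

    Only the first whitespace-separated token is interpreted; the rest is
    left alone as extension data.
    """
    tokens = field_value.split()
    if not tokens or not HASH_RE.fullmatch(tokens[0]):
        raise ValueError(f"not a full commit hash: {field_value!r}")
    return tokens[0]
```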

--
Colin Watson (he/him) [cjwatson@ubuntu.com]

--
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com