Wednesday 3 April 2019

Maintaining language-specific module package stacks

I'd like to talk about addressing the difficulty of maintaining
long-tail language-specific stacks in Ubuntu. For example, right now
`src:rails` is stuck in disco-proposed[1]. It seems to me that we spend
a disproportionate amount of effort trying to get this class of package
migrated to the release pocket compared to the number of Ubuntu users
who actually care about and use them.

I suggest that:

1. If a language-specific package, or stack of packages, is stuck in
proposed, and nobody is volunteering to get it migrated, then we should
be more willing to delete it from the release pocket and release
without that stack.

2. We recommend, as a project, that users who wish to use these
language stacks directly do so via the language-specific packaging
tooling.

Background
----------

It'd be great to get input from language-specific communities who may
not be intimately familiar with distribution development process, so
here's a quick summary of what I'm talking about. Those already familiar
with distribution development workflow can skip to the next section.

You're presumably aware that language-specific communities generally
have their own package repositories and package managers, such as
PyPI/pip, RubyGems.org/gem, npm Registry/npm and so on. My understanding
is that these communities generally advise that users consume from these
repositories directly.

Debian often packages a subset of these repositories - usually for the
purpose of fulfilling the dependency requirement of some higher level
component, such as Rails in my example. Ubuntu then autosyncs these.
Some users prefer these higher level components to come through their
distribution, rather than via a language-specific package manager -
presumably for the release management consolidation that this provides.
However, it's my understanding that in practice the majority of users do
not consume the distribution packaging for these components, instead
using the language-specific package managers directly as generally
recommended by those upstreams. It also seems quite common for language
communities to specifically recommend against consuming language module
packages through the distribution.

In general, distributions include only one version of each component in
a given distribution release. Bringing the entire dependency web in line
to make this possible can take considerable effort on the part of
distribution developers, particularly when upstreams tend to use a large
number of small dependencies and also require specific dependency
versions due to frequent API breaks.

In practice, the dependency web is more complicated than this, and
entire swathes of packages get held up at once until all the
dependencies can be resolved together; this takes work in figuring out
what the holdups are, making decisions about which versions to ship, and
possibly patching code to change what dependency versions are actually
required, in order to make everything work together. Think of it like
trying to come up with a single `requirements.txt` (as generated by `pip
freeze`), `Gemfile.lock` or `package-lock.json` file that works for all
packages across the entire distribution release.
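
To make the lock file analogy concrete, here is a minimal sketch in
Python. The package names and version ranges are invented for
illustration, and a real dependency solver deals with far more than a
single shared module.

```python
# Toy illustration: every package name and version range is hypothetical.
# Each package declares the versions of one shared module that it can
# work with, as an inclusive (minimum, maximum) pair.
REQUIREMENTS = {
    "webapp-a": (4, 6),
    "webapp-b": (5, 7),
    "tool-c": (6, 8),
}

# Versions of the shared module that exist upstream (1 to 9 here).
AVAILABLE_VERSIONS = range(1, 10)


def shared_versions(requirements, available):
    """Return the versions that satisfy every package at once."""
    return [
        version
        for version in available
        if all(low <= version <= high for low, high in requirements.values())
    ]


def blockers(requirements, version):
    """Return the packages that cannot work with the given version."""
    return sorted(
        name
        for name, (low, high) in requirements.items()
        if not low <= version <= high
    )
```

With these made-up ranges only one version suits every package. Adding
a single package that needs an older range leaves no common version at
all, and that is the point at which somebody has to patch or drop
something.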

Until this is resolved, distribution package updates are held in a
staging area, and won't be part of the next distribution release. In
Ubuntu we say that our packages are "stuck in proposed". Once
resolved, the packages "migrate" to the "release pocket".

The details of this process in Ubuntu are documented here:
https://wiki.ubuntu.com/ProposedMigration

For example, we are blocked from shipping a newer version of the PHP
interpreter until the `wordpress` package (which is written in PHP)
works with the proposed newer PHP version. Now consider that there are
hundreds of these reverse dependency packages like `wordpress`,
including many PHP language modules. The addition of each one makes it a
little harder for Ubuntu to move on updating PHP itself. My suggestion
is that we more readily remove these reverse dependencies from the
distribution release to free up the update of PHP itself in this
example, rather than spend what seems like a disproportionate amount of
effort fixing things that we suspect very few users actually care about.

Discussion
----------

Part of the purpose of my suggestion is to make it far easier to
transition to new language interpreters without worrying too much about
the very long tail of barely used reverse dependency language modules
that usually hold up these transitions. My understanding is that far
more users rely on the distribution supplying the language interpreter
and language-specific package manager than the modules themselves. When
a transition is held up like this, I'm proposing to simply
permit the deletion of the long tail from the release pocket, get the
newer interpreter stack migrated into the release pocket, and consider
it done. The long tail will then migrate if maintained actively by
others, and if it isn't, we'll ship without it.

Right now for example, I'm suggesting that we simply delete `src:rails`
from the release pocket, including its reverse dependencies, unless the
reverse dependency list contains something outside the Rails stack that
is unacceptable to us to delete. Rails users who use `gem install
rails`, as recommended by Rails upstream[2], will not be affected.
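
The check described above could be sketched roughly as follows, in
Python. The dependency data and the set of packages we would refuse to
delete are both made up here; in reality they would come from the
archive and from our own judgement.

```python
from collections import deque

# Hypothetical data: maps each package to the packages that depend on it.
REVERSE_DEPENDS = {
    "rails": ["ruby-rails-plugin", "some-rails-app"],
    "ruby-rails-plugin": ["some-rails-app"],
    "some-rails-app": [],
}


def removal_set(package, reverse_depends):
    """Return the package plus everything that transitively depends on it."""
    seen = {package}
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dependent in reverse_depends.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def unacceptable_removals(package, reverse_depends, must_keep):
    """Return the packages in the removal set that we are not willing to lose."""
    return sorted(removal_set(package, reverse_depends) & set(must_keep))
```

An empty result from `unacceptable_removals` would mean the whole set
can be deleted from the release pocket; anything else names the
packages that need a closer look first.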

My suggestion deliberately leaves the door open for able volunteers to
maintain these packages in Ubuntu if they wish.

One downside to my suggestion is that availability of particular
language module packages may become unreliable between Ubuntu releases
from a user's perspective (if a particular package skips an Ubuntu
release before being restored, for example). I suggest that this can be
tackled later if it becomes a problem, for example by blacklisting such
packages for longer unless a team is prepared to commit to preventing
this from happening.

Another potential downside is in packages that are generally useful to
users outside their own language ecosystems, yet depend on these
language-specific dependency stacks. For example, take Vagrant. Vagrant
is written in Ruby, so needs packages originating from RubyGems.org. I'm
sure there are plenty of users who aren't deploying a Ruby-based stack
but who do use Vagrant. I think that a significant number of these users
probably prefer to consume Vagrant from the distribution package (`apt
install vagrant`) rather than from RubyGems.org (`gem install
vagrant`)[3]. Because distributions include all of their dependencies in
their own repositories, this means that to ship Vagrant as a package in
Ubuntu, we also need to ship a package for everything in RubyGems.org
that Vagrant requires. I suggest that we don't apply my deletion policy
to such packages and their dependencies - that we continue trying to
maintain them on a best effort basis as we do today.

If we do change anything in this regard, I think an important part of
this is for us to decide as a project what affected users can expect,
what we recommend that they do, and that we communicate this clearly. I
suggest that, in the absence of Ubuntu development teams volunteering to
maintain this class of packages in Ubuntu, we generally follow
upstream's recommendations in using their language-specific package
management stack rather than apt/dpkg. This isn't all that different
from the situation today, where users can't rely on us having packaged
the specific modules that they need at particular versions anyway.

We might want to be more specific in our recommendations to users. For
example, it is generally better for system stability, particularly when
installing software from third party sources, for that software to be
well confined. Where such a confinement mechanism is available, we could
specifically recommend its use, and recommend against installing to the
"system"; virtualenv for Python stacks is one example. I
note that Ruby is available from upstream as a snap, so if suitable for
general deployments that might be the gold standard for recommended
confinement; if not, then at least rvm.
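
As a small illustration of the Python case, here is a sketch that
creates an isolated environment. Note that it uses the standard
library's `venv` module rather than the separate virtualenv tool
mentioned above, and the function name is my own invention.

```python
import tempfile
import venv
from pathlib import Path


def create_confined_environment(path):
    """Create an isolated Python environment and return its config file path."""
    # with_pip=False keeps this sketch quick and offline; pass
    # with_pip=True to bootstrap pip inside the environment instead.
    venv.create(path, with_pip=False)
    return Path(path) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        config = create_confined_environment(Path(directory) / "env")
        print(config.read_text())
```

Modules installed with pip from inside such an environment stay under
that directory rather than landing in the system's own Python paths.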

Please discuss. I intend to draw this thread to the attention of
language-specific communities too, to try and get their input. In
particular it'd be great to agree on the same recommendations for Ubuntu
users. Note that this list is moderated; I propose to permit emails from
language-specific communities in response to this thread for a few weeks
to avoid fragmenting discussion between here and the unmoderated list.
I'll make sure replies get through moderation promptly. Don't worry
about not being subscribed: just replying to the list will be fine.

Thanks,

Robie


[1] http://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses.html#rails

[2] https://guides.rubyonrails.org/getting_started.html#installing-rails

[3] I note that Vagrant upstream specifically advises against `apt
install vagrant` from distributions[4]. Contrary to my understanding of
typical _direct_ use of language-specific stacks with which I'm
justifying my suggestion, however, my understanding is that many of
those who use Vagrant but aren't Ruby-based developers prefer the
distribution package regardless. It makes sense to me that a
non-Ruby-developer wouldn't want to learn, use and maintain an entirely
different package manager or add an additional third party software
source just for one top level package that the distribution ships
anyway, unless they specifically really need a version newer than one
provided by the distribution release.

[4] https://www.vagrantup.com/docs/installation/