Thursday 6 June 2013

Re: Systems with invalid locales

Robie Basak [2013-06-06 12:42 +0100]:
> I'd like to distinguish three different use cases of locales.
>
> 1) The locale that a sysadmin sees for commands that he types under
> his own login (whether local or over ssh). I'll call this the user
> locale.

This actually needs to be split in two: (1host) the locale as it is
configured on your host (from where you run ssh), and (1remote) the
locale that your user has configured on the remote host (where you ssh
into). These are usually defined in the respective ~/.profile or
~/.pam_environment.

> 2) The locale that system services run under (eg. logging). I'll call
> this the system locale.

... which is usually specified in /etc/default/locale, or on older
systems, /etc/environment.
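
For illustration, here is a minimal Python sketch of how such a file
could be read. It assumes the simple KEY=value layout that
/etc/default/locale normally has; the function name and the sample
content are made up for this example, and it is not what any existing
tool does internally.

```python
def parse_locale_file(text):
    """Parse KEY=value lines as found in /etc/default/locale.

    Blank lines and '#' comments are skipped; quotes around the
    value are dropped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Illustrative file content, not taken from a real system.
sample = '# system-wide default\nLANG="en_GB.UTF-8"\n'
print(parse_locale_file(sample))
```

An empty result would correspond to the "no (2) configured" case
discussed further down.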

> 3) The locale served to users accessing services that the server
> provides. Example: what a user using a web application gets as a
> collation order when he sorts some listing by name. I'll call this the
> service locale.

My feeling is that this will usually be equal to the system locale, or
you have to configure the particular process for that specific locale,
overriding /etc/default/locale and the environment.

Note that due to the default behaviour of our ssh, (1host) becomes
(1remote) iff the remote system has no (2) or (1remote) configured
(that's the precise case I'm picking on, as this is just broken), and
due to our default behaviour of sudo, (1remote) even becomes (2).
I. e., due to these two, these concepts can easily become mixed up.

> Example:
>
> I (English) work with a French sysadmin on a server which
> provides services to Polish customers. I'd want all three locales
> configured on my server. We might settle on C or en_GB for the system
> locale. My French co-worker may use LANG=fr(?) and expects to see
> messages in French when he uses ssh to diagnose issues.

You said the server has en_GB.UTF-8 in /etc/default/locale. In that
case, ssh'ing in would give your French sysadmin en_GB.UTF-8 if he
didn't configure anything in his remote ~/.profile; to get French, he
can put LANG=fr_FR.UTF-8 into his remote profile.

(1host) only comes into play if the remote server does not have an
/etc/default/locale. In that case ssh would transfer his (1host) locale
(presumably fr_FR.UTF-8) to (1remote), but that wouldn't actually work
unless the remote server has fr_FR.UTF-8 available (which wasn't the
case in said bug reports).

> I need to see English, since I don't understand French well enough.
> The server runs a web application which uses Postgres, so Postgres
> should use a Polish collation order.

PostgreSQL can define the collation and locales per-database and thus
individually per webapp, but for the sake of argument let's pretend it
can't, and it would always use the locale of template1 (i. e. the
default database created at installation).

> Problem 1: does postgresql-9.1's postinst do a different thing depending
> on whether me or my colleague installs it? Why? Now that we have
> /etc/default/locale, wouldn't it be better to use this?

If you have a (2) set in /etc/default/locale, then it will use that.
ssh maintains the right fallback here: (1remote) wins over (2) wins
over (1host). (And again, I claim that the latter is bad behaviour).
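
To make that fallback order concrete, here is a small Python sketch.
It only models the precedence described above; the function and
argument names are invented for this example, and falling back to "C"
when nothing is set is my assumption about the end of the chain.

```python
def effective_locale(remote_user=None, system=None, host=None):
    """Return (source, locale) following (1remote) > (2) > (1host)."""
    candidates = (
        ("1remote", remote_user),  # user's own setting on the server
        ("2", system),             # e.g. from /etc/default/locale
        ("1host", host),           # whatever ssh passed along
    )
    for source, value in candidates:
        if value:
            return source, value
    # Assumption: nothing configured anywhere means plain C.
    return "none", "C"


print(effective_locale(system="en_GB.UTF-8", host="fr_FR.UTF-8"))
print(effective_locale(host="fr_FR.UTF-8"))
```

The second call is the problematic case: with neither (1remote) nor
(2), the workstation's locale ends up being used on the server.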

If you have neither (1remote) nor (2), then yes, it will depend on
what your host locale is due to how ssh and sudo combined behave, and
either fail because (1host) doesn't exist on the server, or create the
default db with just (1host).

But in that case there is nothing else that PostgreSQL could use,
except perhaps for saying that "if I have a locale defined by the
environment, but it's not in /etc/default/locale, /etc/environment,
/etc/postgresql/version/cluster/environment, ~/.profile,
~/.pam_environment, or wherever else you could set environment variables,
then use C", but that would make things even less predictable and
robust IMHO.

> Problem 2: but I wanted Postgres configured with a Polish collation
> order, which doesn't happen either way. Maybe the postinst should use
> debconf to ask the user, defaulting to what /etc/default/locale says?

That would be overkill IMHO. Except in the pathological case above, it
works just fine, and then you often want different locales for
different DBs/applications anyway. You can create more clusters or
databases, and in each case (pg_createcluster/createdb) specify an
individual locale combination.
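
As a rough sketch of what that looks like, the following Python
snippet only assembles the two command lines (it does not run them).
The cluster name "polish" and database name "webapp_pl" are made up;
the version is the 9.1 from this thread. I use template0 for createdb
on the assumption that the new locale differs from template1's, in
which case PostgreSQL refuses to copy template1.

```python
def createcluster_cmd(version, name, locale):
    """argv for a new cluster whose default locale is `locale`."""
    return ["pg_createcluster", "--locale", locale, version, name]


def createdb_cmd(dbname, locale):
    """argv for a new database with its own locale.

    template0 is used because a database copied from template1 has
    to keep template1's collation.
    """
    return ["createdb", "--template=template0", "--locale=" + locale, dbname]


# "polish" and "webapp_pl" are hypothetical names for illustration.
print(" ".join(createcluster_cmd("9.1", "polish", "pl_PL.UTF-8")))
print(" ".join(createdb_cmd("webapp_pl", "pl_PL.UTF-8")))
```

Either way the chosen locale has to exist on the server, and for
createdb it has to be compatible with the database encoding.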

> > I have always considered this default behaviour of ssh unexpected and
> > wrong. It blindly applies the host locale to the remote ssh session
> > without any checks whether that locale is actually valid. In
> > particular because it only seems to do that if the remote server does
> > not have any default locale from /etc/default/locale,
> > .pam_environment, or otherwise, which usually only occurs in servers
> > where locales have not been installed and configured at all (this
> > might be the case in our cloud instances, something we ought to fix).
>
> I hope I've presented the case for passing the locale setting through in
> my use case above. Two different sysadmins want two different locales
> available when they log in. How should we cater for this?

There can be just one (2), so they need to fight out which of the two
should become it. The other can then set a different locale in their
~/.profile. This is independent of ssh's behaviour, but of course the
sudo behaviour still applies here: if the other sysadmin uses sudo,
all operations will be done under his locale.

> > So in this situation it is very likely that the locale that ssh passes
> > from the host to the remote shell will not work.
>
> IMHO, we should make it work, or drop to C if a locale isn't configured.

My preferred fix would be for ssh to simply stop passing (1host), at
least if the remote system does not have that locale available.
Otherwise, the very first time you press <tab> in the remote shell
you'll get screen clutter from error messages, apt/dpkg will spit out a
plethora of errors from perl, and the locale will not work and will
behave like C anyway.
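
The check I have in mind could look roughly like this Python sketch.
It is not how ssh works today and the function names are invented; the
list of available locales is hard-coded here, whereas in practice it
would come from something like the output of `locale -a` on the
server. The normalization step is there because `locale -a` typically
spells the codeset as "utf8" while LANG usually says "UTF-8".

```python
def normalize(name):
    """Normalize codeset spelling: 'fr_FR.UTF-8' -> 'fr_FR.utf8'."""
    base, dot, rest = name.partition(".")
    if not dot:
        return name  # no codeset part, e.g. "C" or "POSIX"
    codeset, at, modifier = rest.partition("@")
    return base + "." + codeset.lower().replace("-", "") + at + modifier


def accept_or_fallback(requested, available):
    """Keep `requested` if the server has it, otherwise use C."""
    known = {normalize(a) for a in available}
    return requested if normalize(requested) in known else "C"


# Hypothetical server with only English locales generated.
available = ["C", "C.UTF-8", "POSIX", "en_GB.utf8"]
print(accept_or_fallback("fr_FR.UTF-8", available))
print(accept_or_fallback("en_GB.UTF-8", available))
```

With this, an unknown (1host) would silently become C instead of
producing a broken session.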

It's somewhat debatable if we want to pass (1host) if it is available
on the server; I'd personally prefer not to because of our sudo
behaviour, but I have no strong opinion about it, and good arguments
can be made either way.

> > I don't understand this -- if you run your entire server without
> > locales, then a lot of stuff will not work; e. g. your cited
>
> No, perhaps I haven't been clear. I want multiple locales configured, so
> that on a multi-user system users see their own locales when they shell
> in.

That's fine. But in the bug you cited the problem was that none of the
locales involved were configured on the server.

> But because we don't know what these locales might be, we can't easily
> configure them in advance. So a warning of an unconfigured locale is
> great, but that doesn't stop it being broken. That's why I'm proposing
> dropping the user down to C until the misconfiguration is fixed.

As I said in my reply to Steve, I'm happy to make the package
installation succeed, but not create any cluster in that case (and
print a warning instead, which might or might not be actually seen).
That's somewhat cleaner from a package installation POV, but again
leads to bug reports like "I installed postgresql and did not get a
default DB", which are rather hard to debug.

As for predicting which locales you might need on a server: If we keep
the current behaviour of ssh, then you could just generate _all_
locales on the server to be prepared for anything?

> Alternatively, if we conclude that locales shouldn't be passed through
> ssh, and that dropping my multi-locale multi-user use case is fine

Those are not the same thing as far as I can see. Working with
multiple locales on the server is fine; what's not fine is to throw an
arbitrary LANG= setting into a remote shell and expect it to work
without checking whether it's valid.

> then sticking to the system locale (/etc/default/locale or
> whatever) would be fine for the original bug that prompted this
> thread.

That's what is already happening, as far as I can see.

> > I'd think that on a server you ought to set the system locale to
> > what you actually want, and then have your services use that, not
> > some random locale from outside that someone happens to use on
> > their workstation?
>
> Right - but the locale the user installing Postgres on is using isn't
> necessarily the system locale. So how about using /etc/default/locale
> instead of the environment-defined one?

Again, the referenced bug occurs when there is no /etc/default/locale.
Canonically, locales are defined by environment variables, and
/etc/default/locale, /etc/environment,
/etc/postgresql/version/cluster/environment, /root/.profile, and what
not are means to set them. We can probably add some heuristics there,
but if we limit it to only /etc/default/locale, we'd break other cases
which are legitimately working right now.

Thanks,

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
