Bug #18495
Closed
`LC_ALL=C.UTF-8` sets `Encoding.default_external` to `Encoding::US_ASCII`
Description
Original bug report on Bootsnap: https://github.com/Shopify/bootsnap/issues/395#issuecomment-1014421271
$ env LC_ALL=en_US.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>
$ env LC_ALL=C.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>
I'm not particularly familiar with LC_ALL, but from what I gathered online, C.UTF-8 is supposed to mean "no internationalization, but UTF-8 support".
Updated by byroot (Jean Boussier) over 2 years ago
So I dug into the code a bit, and Ruby seems to delegate most of that to the system via nl_langinfo.
Reading more into it, it seems that C.UTF-8, while common, isn't POSIX, and my system seems to behave the same way:
$ env LC_ALL=C.UTF-8 locale charmap
US-ASCII
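The same comparison can be made from inside Ruby. The following is a minimal sketch (the helper name `locale_encoding_report` is made up for illustration) that prints the charmap name reported by the C library next to the encodings Ruby derives from it. It assumes a Unix-like system, where `Encoding.locale_charmap` is, as far as I understand, backed by nl_langinfo(CODESET).

```ruby
# Hypothetical helper: collect what Ruby knows about the locale's character set.
def locale_encoding_report
  {
    lc_all: ENV["LC_ALL"],                        # raw environment value, may be nil
    locale_charmap: Encoding.locale_charmap,      # charmap name from the C library
    locale_encoding: Encoding.find("locale"),     # Encoding that Ruby maps that name to
    default_external: Encoding.default_external   # what IO uses by default
  }
end

locale_encoding_report.each do |key, value|
  puts "#{key}: #{value.inspect}"
end
```

Running this once with LC_ALL=en_US.UTF-8 and once with LC_ALL=C.UTF-8 should show whether the charmap name itself differs, or only Ruby's interpretation of it.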
Updated by byroot (Jean Boussier) over 2 years ago
- Status changed from Open to Closed
Closing since I now believe it's up to the system to properly define locales. Apologies for the noise.
Updated by Eregon (Benoit Daloze) over 2 years ago
This sounds like a bug of the operating system.
On Fedora 33:
$ env LC_ALL=C.UTF-8 locale
LANG=en_US.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
$ env LC_ALL=C.UTF-8 locale charmap
UTF-8
On debian:buster-slim in Docker (podman actually):
# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8
On Ubuntu 20.04:
# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8
Which seems much more sensible.
Maybe the C.UTF-8 "locale" is not generated on the system you tested?
BTW, I noticed C.UTF-8 is available in Debian & Ubuntu in Docker, but en_US.UTF-8 is not by default; it warns with:
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
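To check whether a given locale is actually generated on a system, one option is to look for it in the output of `locale -a`. The sketch below is only an illustration: the helper names are hypothetical, the sample listing is invented, and the normalization assumes the glibc habit of listing the codeset as `utf8` rather than `UTF-8`.

```ruby
# Hypothetical helper: normalize a locale name so that "C.UTF-8" and
# "C.utf8" compare equal (the codeset is lowercased and hyphens are dropped).
def normalize_locale(name)
  base, codeset = name.strip.split(".", 2)
  return base if codeset.nil?
  "#{base}.#{codeset.downcase.delete('-')}"
end

# Hypothetical helper: `listing` is text in the format printed by
# `locale -a`, one locale name per line.
def locale_listed?(name, listing)
  wanted = normalize_locale(name)
  listing.each_line.any? { |line| normalize_locale(line) == wanted }
end

# Invented sample listing, standing in for a minimal container image.
sample = "C\nC.utf8\nPOSIX\n"
puts locale_listed?("C.UTF-8", sample)
puts locale_listed?("en_US.UTF-8", sample)
```

On a real system the listing would come from the `locale -a` command itself, for example via backticks, assuming that command is installed.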
FWIW, TruffleRuby has some docs on how to properly set an en_US.UTF-8 locale on various OSes: https://github.com/oracle/truffleruby/blob/master/doc/user/utf8-locale.md (this seems to be one of the most frequent issues when using Docker)
Updated by byroot (Jean Boussier) over 2 years ago
> This sounds like a bug of the operating system.
Yes, that's what I figured after digging into the code.
> Maybe the C.UTF-8 "locale" is not generated on the system you tested?
Indeed, it doesn't exist on OSX as far as I can tell, and it likely wasn't present on the initial reporter's system (Fedora, but maybe more stripped down than yours?).
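For scripts that have to run on systems where C.UTF-8 is missing, one possible workaround is to override the external encoding inside Ruby when the locale resolved to US-ASCII. This is only a sketch under that assumption. The helper name is hypothetical, and it would also override a deliberate LC_ALL=C, which may not be what every program wants.

```ruby
# Hypothetical helper: if the locale resolved to US-ASCII, prefer UTF-8
# as the default external encoding; otherwise leave the setting alone.
def prefer_utf8_external!
  if Encoding.default_external == Encoding::US_ASCII
    Encoding.default_external = Encoding::UTF_8
  end
  Encoding.default_external
end

prefer_utf8_external!
puts Encoding.default_external
```

The same effect can be had without code changes by passing `-E UTF-8` to the ruby command, which sets the default external encoding for that run.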
Updated by Eregon (Benoit Daloze) over 2 years ago
byroot (Jean Boussier) wrote in #note-4:
> Fedora, but maybe more stripped down than yours?
Probably a much older (and unsupported by upstream) Fedora then.
https://bugzilla.redhat.com/show_bug.cgi?id=902094 says it was backported to Fedora 22 & 23, which are pretty old.
It works at least on fedora:28 and fedora:33 docker images (which I have locally).