Bug #18495

`LC_ALL=C.UTF-8` sets `Encoding.default_external` to `Encoding::US_ASCII`

Added by byroot (Jean Boussier) over 2 years ago. Updated over 2 years ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-darwin21]
[ruby-core:107158]

Description

Original bug report on Bootsnap: https://github.com/Shopify/bootsnap/issues/395#issuecomment-1014421271

$ env LC_ALL=en_US.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>
$ env LC_ALL=C.UTF-8 ruby -e 'p Encoding.default_external'
#<Encoding:US-ASCII>

I'm not particularly familiar with LC_ALL, but from what I gathered online, C.UTF-8 is supposed to mean "no internationalization, but UTF-8 support".

Updated by byroot (Jean Boussier) over 2 years ago

So I dug into the code a bit, and Ruby seems to delegate most of that to the system via nl_langinfo.

And reading more into it, it seems that C.UTF-8, while common, isn't POSIX, and my system seems to behave the same way:

$ env LC_ALL=C.UTF-8 locale charmap
US-ASCII
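
To compare several locale settings in one go, the one-liners from the description can be wrapped in a small script. This is only a sketch: default_external_for is a hypothetical helper name, and clearing RUBYOPT is a precaution I added so that a stray -E option does not mask the locale. What it prints depends entirely on which locales the system defines.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: start a fresh Ruby with LC_ALL set to the given
# value and return the name of the Encoding.default_external it picked.
def default_external_for(lc_all)
  # A nil value removes the variable from the child's environment.
  env = { "LC_ALL" => lc_all, "RUBYOPT" => nil }
  out, status = Open3.capture2(
    env, RbConfig.ruby, "-e", "print Encoding.default_external.name"
  )
  raise "child ruby exited with #{status.exitstatus}" unless status.success?
  out
end

%w[C C.UTF-8 en_US.UTF-8].each do |locale|
  puts "LC_ALL=#{locale}: #{default_external_for(locale)}"
end
```

If LC_ALL=C.UTF-8 reports the same encoding as LC_ALL=C, that suggests the system does not define C.UTF-8, and `locale charmap` should agree.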

Updated by byroot (Jean Boussier) over 2 years ago

  • Status changed from Open to Closed

Closing since I now believe it's up to the system to properly define locales. Apologies for the noise.

Updated by Eregon (Benoit Daloze) over 2 years ago

This sounds like a bug of the operating system.

On Fedora 33:

$ env LC_ALL=C.UTF-8 locale
LANG=en_US.UTF-8
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
$ env LC_ALL=C.UTF-8 locale charmap
UTF-8

On debian:buster-slim in Docker (podman actually):

# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8

On Ubuntu 20.04:

# env LC_ALL=C.UTF-8 locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8
# env LC_ALL=C.UTF-8 locale charmap
UTF-8

Which seems much more sensible.

Maybe the C.UTF-8 "locale" is not generated on the system you tested?

BTW I noticed C.UTF-8 is available in the Debian & Ubuntu Docker images, but en_US.UTF-8 is not by default; using it makes locale warn with:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

FWIW TruffleRuby has some docs on how to properly set an en_US.UTF-8 locale on various OSes: https://github.com/oracle/truffleruby/blob/master/doc/user/utf8-locale.md (this seems to be one of the most frequent issues when using Docker)
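
One rough way to check whether a locale such as C.UTF-8 or en_US.UTF-8 is generated on a given system is to look for it in the output of `locale -a`. The sketch below assumes that command is available. The helper names normalize_locale and locale_listed? are made up for illustration, and the codeset normalization is an assumption to cope with systems that list the name as C.utf8 rather than C.UTF-8. Treat the result as a hint, not as proof that setlocale will succeed.

```ruby
# Hypothetical helper: make "C.UTF-8" and "C.utf8" compare equal by
# lowercasing the codeset (the part after the dot) and dropping hyphens.
def normalize_locale(name)
  base, codeset = name.strip.split(".", 2)
  return base if codeset.nil?
  "#{base}.#{codeset.downcase.delete('-')}"
end

# Hypothetical helper: true if `name` appears in `listing`, which is
# expected to hold one locale name per line, as `locale -a` prints them.
def locale_listed?(name, listing)
  wanted = normalize_locale(name)
  # scrub replaces invalid bytes, since some listings are not valid UTF-8.
  listing.scrub.each_line.any? { |line| normalize_locale(line) == wanted }
end

# Fall back to an empty listing if the command cannot be run.
listing = begin
  `locale -a 2>/dev/null`
rescue SystemCallError
  ""
end

%w[C.UTF-8 en_US.UTF-8].each do |name|
  state = locale_listed?(name, listing) ? "listed" : "not listed"
  puts "#{name}: #{state}"
end
```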

Updated by byroot (Jean Boussier) over 2 years ago

> This sounds like a bug of the operating system.

Yes that's what I figured after digging into the code.

> Maybe the C.UTF-8 "locale" is not generated on the system you tested?

Indeed, it doesn't exist on OSX as far as I can tell, and it likely wasn't present on the initial reporter's system (Fedora, but maybe more stripped down than yours?).

Updated by Eregon (Benoit Daloze) over 2 years ago

byroot (Jean Boussier) wrote in #note-4:

> Fedora, but maybe more stripped down than yours?

Probably a much older (and unsupported by upstream) Fedora then.
https://bugzilla.redhat.com/show_bug.cgi?id=902094 says it was backported to Fedora 22 & 23, which are pretty old.
It works at least on fedora:28 and fedora:33 docker images (which I have locally).
