Bug #15044
Closed
ENV encoding not UTF-8 by default
Description
$ irb
2.5.1 :001 > 'secret'.encoding
=> #<Encoding:UTF-8>
2.5.1 :002 > ENV['PASS'] = 'secret'; ENV['PASS'].encoding
=> #<Encoding:US-ASCII>
2.5.1 :009 > ENV['PASS'] = 'Ł'
=> "\u0141"
2.5.1 :010 > ENV['PASS'].encoding
=> #<Encoding:ASCII-8BIT>
I would expect all encodings to be UTF-8 at all times
Updated by shevegen (Robert A. Heiler) over 6 years ago
If I put this into a .rb file:
puts 'secret'.encoding
ENV['PASS'] = 'secret'
puts ENV['PASS'].encoding
On my system I get these two Strings output:
UTF-8
ISO-8859-1
My environment, i.e. my current locale, is ISO-8859-1, so the results that
I get seem correct. I can change the default UTF-8 encoding by putting an
encoding comment (the shebang-style line) at the top of the .rb file, which
I normally do, so all my encodings are the same (ISO-8859-1; regexes used
to behave a bit oddly sometimes, but I am not sure whether that has changed).
I think ENV behaves a little bit differently upon assignment.
If I use a shebang line in a .rb file that includes the above unicode
character (this weird L), then all string encodings in that .rb file
are also ISO-8859-1, so I am not sure if there is any bug at all.
It may be more related to IRB perhaps? I skipped testing on IRB mostly
because .rb files have a "higher weight" than code put through IRB.
The documentation does not mention what happens with encodings when
these are assigned to an ENV key, though:
https://ruby-doc.org/core-2.5.1/ENV.html
Perhaps it has more to do with IRB, in which case it could be added
there:
http://ruby-doc.org/stdlib-2.5.1/libdoc/irb/rdoc/IRB.html
And of course it may be that there is indeed a bug. You can try to
test with a standalone .rb file though and, if necessary, with a
specific shebang comment.
Updated by mame (Yusuke Endoh) about 5 years ago
- Status changed from Open to Closed
It is intentional according to naruse. The encoding of ENV depends on the environment variable LANG.
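One way to observe this dependence is to start child Ruby processes under different locale settings and print the encoding of the same ENV value. The following is only a sketch: the helper name env_encoding_under is made up for illustration, the locale names are examples, and the result depends on which locales are installed on the machine (an unavailable locale typically falls back to the C locale). Platforms other than Unix may behave differently.

```ruby
require "rbconfig"
require "open3"

# Hypothetical helper: run a child Ruby with LANG and LC_ALL set to the
# given locale and return the name of the encoding that ENV["PASS"]
# carries in that child, or nil if the child process failed.
def env_encoding_under(locale)
  child_env = { "LANG" => locale, "LC_ALL" => locale, "PASS" => "secret" }
  script = 'print ENV["PASS"].encoding.name'
  out, status = Open3.capture2(child_env, RbConfig.ruby, "-e", script)
  status.success? ? out : nil
end

# Example locale names; adjust them to locales available on your system.
%w[C en_US.UTF-8].each do |locale|
  puts "#{locale}: #{env_encoding_under(locale)}"
end
```

LC_ALL is set together with LANG here because LC_ALL takes precedence when both are present.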
Updated by naruse (Yui NARUSE) about 5 years ago
The value assigned to ENV is stored in the process's environment variables. The encoding of ENV[key] is set to the locale encoding. You can get the locale encoding with Encoding.find("locale"), which is decided based on Encoding.locale_charmap, which in turn is affected by ENV["LANG"] and ENV["LC_ALL"].
Note that ENV["PATH"] is returned in the filesystem encoding, but that is the same as the locale encoding on Unix.
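For anyone who needs UTF-8 strings regardless of the locale, the relationship described above can be inspected, and the value re-tagged, roughly as follows. This is a minimal sketch, not an official recommendation: env_utf8 is a hypothetical helper, and it assumes the bytes stored in the environment really are UTF-8 (it returns the original string otherwise).

```ruby
# Inspect the locale-derived encoding that ENV values are tagged with.
puts "locale charmap:  #{Encoding.locale_charmap}"
puts "locale encoding: #{Encoding.find('locale')}"

ENV["PASS"] = "\u0141"   # the "Ł" from the report; key name also from the report
puts "ENV value encoding: #{ENV['PASS'].encoding}"

# Hypothetical helper: return ENV[key] re-tagged as UTF-8 when its bytes
# form valid UTF-8, otherwise return the value unchanged. The dup is
# needed because force_encoding modifies its receiver.
def env_utf8(key)
  raw = ENV[key]
  return nil if raw.nil?
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : raw
end

puts "after re-tagging:   #{env_utf8('PASS').encoding}"
```

force_encoding only changes the encoding label and leaves the bytes untouched, which is why the validity check is there: if the environment holds bytes in some other encoding, String#encode with an explicit source encoding would be the appropriate tool instead.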