Bug #8822
closed
Incorrect encoding for ENV in Windows
Description
When reading ENV, a value that contains non-ASCII characters does not come back with the correct encoding.
In Ruby 2.0 we can force it to UTF-8 (regardless of the Windows or console encoding) and it will be correct, but in Ruby 1.9 there is no way to read it correctly.
Writing a non-ASCII string to ENV is not possible at all in either version.
Also, Ruby 1.9 fails to read an ENV variable whose name contains non-ASCII characters.
Here's the test.rb script (basically, set an environment variable outside of Ruby and print it out in Ruby)
Output on Ruby2.0 and on Ruby1.9
It seems this wasn't properly fixed in #5570.
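The Ruby 2.0 workaround mentioned above (forcing the value to UTF-8) could look roughly like the sketch below. This is not the test.rb script from the report; the helper name `env_utf8` is hypothetical, and it assumes the bytes returned by ENV are already UTF-8 and only carry the wrong encoding tag.

```ruby
# Hypothetical helper: return ENV[name] re-tagged as UTF-8.
# Assumption: the bytes are already UTF-8 and only the encoding tag is wrong.
# Nothing is transcoded here.
def env_utf8(name)
  value = ENV[name]
  return nil if value.nil?

  # dup so the string returned by ENV is left untouched
  utf8 = value.dup.force_encoding(Encoding::UTF_8)
  # If the bytes are not valid UTF-8, the assumption failed; keep the original.
  utf8.valid_encoding? ? utf8 : value
end
```

Re-tagging only helps when the bytes survived intact. If the value was already converted lossily before Ruby saw it, no change of tag can recover it.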
Updated by davispuh (Dāvis Mosāns) over 11 years ago
In Ruby 2.0, assigning to ENV seems to double-encode the value.
But in Ruby 1.9, interestingly, after assignment the value shows the console's encoding, which shouldn't be needed at all...
Updated by zzak (zzak _) over 11 years ago
- Status changed from Open to Assigned
- Assignee set to windows
Updated by usa (Usaku NAKAMURA) over 11 years ago
Since Ruby 1.8 assumes the encoding of ENV is the locale encoding (or the encoding specified with -K),
Ruby 1.9 also treats it as locale for compatibility.
It was an intentional decision, not a bug.
We could have broken compatibility in Ruby 2.0, but the work was not done.
BTW, to be sure, the present behavior of Ruby 2.0 is wrong.
It should be corrected.
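To see which encodings are in play on a given machine, something like the following sketch may help. The method name `env_encoding_report` is hypothetical, and PATH is used only as a default variable that is usually set.

```ruby
# Hypothetical helper: collect the encodings involved when reading
# an environment variable.
def env_encoding_report(name = "PATH")
  value = ENV[name]
  {
    locale: Encoding.find("locale"),             # encoding derived from the locale
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal, # nil unless configured
    value_encoding: value && value.encoding      # nil if the variable is unset
  }
end

p env_encoding_report
```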
Updated by nobu (Nobuyoshi Nakada) over 10 years ago
- Description updated (diff)
Updated by nobu (Nobuyoshi Nakada) about 9 years ago
- Status changed from Assigned to Closed
Applied in changeset r52896.
hash.c: env encoding fallback on Windows
- hash.c (env_str_new, env_path_str_new): make default string
  UTF-8 for the case conversion is not possible. [Bug #8822]
- hash.c (get_env_cstr): convert non-ASCII string to UTF-8 string.
- hash.c (ruby_setenv): use wide char version to put environment
  variable to deal with non-ASCII value.
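A quick way to check whether writing a non-ASCII value now works is a round trip through ENV, sketched below. The helper `env_round_trip` and the variable name in the usage line are hypothetical; the check compares bytes so that it does not depend on how the returned string is tagged.

```ruby
# Hypothetical check: write a non-ASCII value to ENV and read it back.
# Returns a hash describing what came back.
def env_round_trip(name, value)
  previous = ENV[name]
  ENV[name] = value
  read = ENV[name]
  {
    same_bytes: read.bytes == value.bytes, # were the bytes preserved?
    encoding: read.encoding                # how Ruby tagged the result
  }
ensure
  ENV[name] = previous # restore (assigning nil deletes the variable)
end

p env_round_trip("RT_CHECK", "task\u16A0\u16C7\u16BB")
```

If the change behaves as the commit message describes, one would expect `same_bytes` to be true and the encoding to be UTF-8 on Windows; on other platforms the tag follows the locale.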
Updated by Iristyle (Ethan Brown) over 8 years ago
- Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)
I don't believe this is properly fixed.
I just left a comment at https://bugs.ruby-lang.org/issues/9715#note-5, and will leave the same comment here:
The expectation is that regardless of current locale / codepage, I should get UTF-8 strings when using ENV on Windows. Here is a simple reproduction of the failure on 2.3.0:
C:\Users\Administrator> $env:unicode = 'taskᚠᛇᚻ'
C:\Users\Administrator> dir Env:\unicode
Name Value
---- -----
unicode taskᚠᛇᚻ
C:\Users\Administrator> ruby --version
ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32]
C:\Users\Administrator> chcp
Active code page: 437
C:\Users\Administrator> irb
irb(main):001:0> RUBY_VERSION
=> "2.3.0"
irb(main):002:0> Encoding.default_internal
=> nil
irb(main):003:0> Encoding.default_external
=> #<Encoding:IBM437>
irb(main):004:0> str = ENV['unicode']
=> "task???"
irb(main):005:0> str.encoding
=> #<Encoding:IBM437>
Again, when I access ENV on Windows, I should receive a UTF-8 string with the correct data, not an IBM437 string. The expected string in this case is:
irb(main):036:0> str2 = "task\u16A0\u16C7\u16BB"
=> "task\u16A0\u16C7\u16BB"
irb(main):037:0> str2.encoding
=> #<Encoding:UTF-8>
Note that some browsers, like Chrome on OSX, may fail to render the runic characters correctly, but if you copy into a proper editor or use another browser you should see the characters fine.
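One plausible explanation for the "task???" result is that the value passes through the active code page somewhere before it reaches Ruby, which is a lossy step for characters that IBM437 cannot represent. This is an assumption, not something confirmed in this thread. The sketch below reproduces the same loss in pure Ruby; the helper name `lossy_ibm437` is hypothetical.

```ruby
# Hypothetical helper: transcode a string to IBM437, substituting "?"
# for every character that IBM437 cannot represent.
def lossy_ibm437(str)
  str.encode(Encoding::IBM437, undef: :replace, replace: "?")
end

expected = "task\u16A0\u16C7\u16BB" # the value set in PowerShell
lossy = lossy_ibm437(expected)

puts lossy          # task???
puts lossy.encoding # IBM437
```

Once the characters have been replaced this way, `force_encoding` or `encode` on the Ruby side cannot bring them back, which is why the fix has to happen where ENV is read.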