Bug #8822


Incorrect encoding for ENV in Windows

Added by davispuh (Dāvis Mosāns) over 9 years ago. Updated almost 7 years ago.

ruby -v:
ruby 2.0.0p247 (2013-06-27) [x64-mingw32]


When reading ENV, a value that contains non-ASCII characters is returned without the correct encoding.
In Ruby 2.0 we can force it to UTF-8 (regardless of the Windows or console encoding) and it will be correct, but in Ruby 1.9 there is no way to read it correctly.
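
The Ruby 2.0 workaround mentioned above can be sketched roughly as follows. This is only an illustration: `retag_as_utf8` is a hypothetical helper name, and the "raw" value is simulated under the assumption (taken from the report) that the bytes are already UTF-8 and only the encoding tag is wrong.

```ruby
# Hypothetical helper, not part of Ruby: relabel a string as UTF-8.
# force_encoding changes only the encoding tag; the bytes are left untouched.
def retag_as_utf8(value)
  value.dup.force_encoding(Encoding::UTF_8) # dup, because ENV values are frozen
end

# Simulated ENV value (assumption): UTF-8 bytes tagged with the console code page.
raw = "D\u0101vis".b.force_encoding(Encoding::IBM437).freeze

fixed = retag_as_utf8(raw)
puts fixed.encoding         # UTF-8
puts fixed.valid_encoding?  # true
puts fixed == "D\u0101vis"  # true
```

With a real variable this would be called as `retag_as_utf8(ENV['NAME'])` after checking that the value is not nil. It only helps when the underlying bytes really are UTF-8, which is what the report describes for Ruby 2.0 but not for Ruby 1.9.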

Writing a non-ASCII string to ENV is not possible at all in either version.

Also, Ruby 1.9 fails to read an ENV variable whose name contains non-ASCII characters.

Here's the test.rb script (basically, set an environment variable outside of Ruby and print it out from Ruby)

Output on Ruby 2.0 and on Ruby 1.9

It seems this wasn't properly fixed in #5570.

Updated by davispuh (Dāvis Mosāns) over 9 years ago

In Ruby 2.0, assigning to ENV seems to double-encode the value.
In Ruby 1.9, interestingly, the value reports the console's encoding after assignment, although that shouldn't be needed at all.

Updated by zzak (Zak Scott) over 9 years ago

  • Status changed from Open to Assigned
  • Assignee set to windows

Updated by usa (Usaku NAKAMURA) over 9 years ago

Since Ruby 1.8 assumes the encoding of ENV is the locale encoding (or the encoding specified with -K),
Ruby 1.9 also treats it as locale for compatibility.
It was an intentional decision, not a bug.
We could have broken compatibility in Ruby 2.0, but the work was not done.

BTW, to be sure, the present behavior of Ruby 2.0 is wrong.
It should be corrected.

Updated by nobu (Nobuyoshi Nakada) almost 9 years ago

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Status changed from Assigned to Closed

Applied in changeset r52896.

hash.c: env encoding fallback on Windows

  • hash.c (env_str_new, env_path_str_new): make default string
    UTF-8 for the case conversion is not possible. [Bug #8822]
  • hash.c (get_env_cstr): convert non-ASCII string to UTF-8 string.
  • hash.c (ruby_setenv): use wide char version to put environment
    variable to deal with non-ASCII value.
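
One possible reading of the first changelog entry, sketched in Ruby. The real change is C code in hash.c; the method name `env_str_sketch` and the exact fallback rule here are assumptions made for illustration, not a transcription of the changeset.

```ruby
# Hypothetical sketch: return the value in the locale encoding when it can be
# converted, otherwise fall back to the UTF-8 string.
def env_str_sketch(utf8_value, locale)
  utf8_value.encode(locale)            # every character representable in locale
rescue Encoding::UndefinedConversionError
  utf8_value                           # conversion not possible: keep UTF-8
end

ascii = env_str_sketch("task", Encoding::IBM437)
runes = env_str_sketch("task\u16A0\u16C7\u16BB", Encoding::IBM437)

puts ascii.encoding  # IBM437
puts runes.encoding  # UTF-8
```

Under this reading, a caller sees the locale encoding for values the code page can represent and UTF-8 only for values it cannot.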

Updated by Iristyle (Ethan Brown) almost 7 years ago

  • Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)

I don't believe this is properly fixed.

I just left a comment elsewhere, and will leave the same comment here:

The expectation is that regardless of current locale / codepage, I should get UTF-8 strings when using ENV on Windows. Here is a simple reproduction of the failure on 2.3.0:

C:\Users\Administrator> $env:unicode = 'taskᚠᛇᚻ'
C:\Users\Administrator> dir Env:\unicode

Name                           Value
----                           -----
unicode                        taskᚠᛇᚻ

C:\Users\Administrator> ruby --version
ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32]
C:\Users\Administrator> chcp
Active code page: 437

C:\Users\Administrator> irb
irb(main):001:0> RUBY_VERSION
=> "2.3.0"
irb(main):002:0> Encoding.default_internal
=> nil
irb(main):003:0> Encoding.default_external
=> #<Encoding:IBM437>
irb(main):004:0> str = ENV['unicode']
=> "task???"
irb(main):005:0> str.encoding
=> #<Encoding:IBM437>
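
The `task???` value is what a lossy conversion to a single-byte code page would produce. The snippet below reproduces that effect in pure Ruby; it is an illustration only and assumes nothing about where in Windows or Ruby the conversion actually happens.

```ruby
# The intended value, written with escapes so the source stays ASCII-only.
expected = "task\u16A0\u16C7\u16BB"

# IBM437 has no runic characters, so each one is replaced with "?".
lossy = expected.encode(Encoding::IBM437, undef: :replace, replace: "?")

p lossy           # "task???"
p lossy.encoding  # #<Encoding:IBM437>
```

Once the characters have been replaced, the original data is gone; calling force_encoding on such a string cannot recover it.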

Again, when I access ENV on Windows, I should receive a UTF-8 string with the correct data, not an IBM437 string. The expected string in this case is:

irb(main):036:0> str2 = "task\u16A0\u16C7\u16BB"
=> "task\u16A0\u16C7\u16BB"
irb(main):037:0> str2.encoding
=> #<Encoding:UTF-8>

Note that some browsers, like Chrome on OS X, may fail to render the runic characters correctly, but if you copy them into a proper editor or use another browser you should see them fine.

