Bug #8822

Incorrect encoding for ENV in Windows

Added by Dāvis Mosāns over 3 years ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee: cruby-windows
ruby -v: ruby 2.0.0p247 (2013-06-27) [x64-mingw32]
Backport:
[ruby-core:56822]

Description

When ENV contains non-ASCII data, the strings read from it do not have the correct encoding.
In Ruby 2.0 we can force the encoding to UTF-8 (regardless of the Windows or console encoding) and the result is correct, but in Ruby 1.9 there is no way to read it correctly.
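
The Ruby 2.0 workaround described above can be sketched roughly as follows. This is only an illustration: `env_utf8` is a hypothetical helper name, and the sketch assumes the bytes Ruby returns are already UTF-8 and merely carry the wrong encoding tag, which is what the report describes for Ruby 2.0 on Windows.

```ruby
# Hypothetical helper (not part of Ruby): re-tag an ENV value as UTF-8
# without transcoding its bytes. Assumes the underlying bytes are UTF-8.
def env_utf8(name, env = ENV)
  raw = env[name]
  return nil if raw.nil?

  # Work on a copy so the original string is left untouched.
  str = raw.dup.force_encoding(Encoding::UTF_8)

  # If the bytes are not valid UTF-8, the assumption does not hold;
  # hand back the original string unchanged.
  str.valid_encoding? ? str : raw
end

value = env_utf8("unicode")
puts value.inspect
puts value.encoding.inspect if value
```

Because `force_encoding` only changes the tag, this cannot repair a value whose characters were already replaced during a lossy code page conversion.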

Writing a non-ASCII string to ENV is not possible at all in either version.

Also, Ruby 1.9 fails to read an ENV variable whose name contains non-ASCII characters.

Here's the test.rb script (basically, set an environment variable outside of Ruby and print it out in Ruby)

Output on Ruby 2.0 and on Ruby 1.9

It seems this wasn't properly fixed in #5570.

Associated revisions

Revision 52896
Added by Nobuyoshi Nakada 12 months ago

hash.c: env encoding fallback on Windows

  • hash.c (env_str_new, env_path_str_new): make default string UTF-8 for the case conversion is not possible. [Bug #8822]
  • hash.c (get_env_cstr): convert non-ASCII string to UTF-8 string.
  • hash.c (ruby_setenv): use wide char version to put environment variable to deal with non-ASCII value.
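
As a rough Ruby-level analogy of the fallback named in the commit message (not the C code in hash.c), the idea of "use UTF-8 when conversion is not possible" might look like the sketch below. The method name `convert_or_keep_utf8` and the choice of target encoding are assumptions made for illustration only.

```ruby
# Hypothetical illustration of an encoding fallback: try to convert a
# UTF-8 string to a target encoding, and keep the UTF-8 string when the
# conversion cannot be done.
def convert_or_keep_utf8(utf8_value, target = Encoding.find("locale"))
  utf8_value.encode(target)
rescue EncodingError
  # Covers undefined conversions, invalid byte sequences and
  # missing converters.
  utf8_value
end

plain = convert_or_keep_utf8("task", Encoding::IBM437)
runic = convert_or_keep_utf8("task\u16A0\u16C7\u16BB", Encoding::IBM437)

puts plain.encoding.inspect
puts runic.encoding.inspect
```

Here the ASCII-only value converts cleanly, while the value containing runic characters has no IBM437 representation and therefore stays UTF-8.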


History

#1 [ruby-core:56823] Updated by Dāvis Mosāns over 3 years ago

In Ruby 2.0, assigning to ENV seems to double-encode the value.
In Ruby 1.9, interestingly, the value shows the console's encoding after assignment, but that shouldn't be needed at all...

#2 [ruby-core:56976] Updated by Zachary Scott about 3 years ago

  • Status changed from Open to Assigned
  • Assignee set to cruby-windows

#3 [ruby-core:56984] Updated by Usaku NAKAMURA about 3 years ago

Since Ruby 1.8 assumes that the encoding of ENV is the locale encoding (or the encoding specified with -K),
Ruby 1.9 also treats it as the locale encoding for compatibility.
This was an intentional decision, not a bug.
We could have broken compatibility in Ruby 2.0, but the work was not done.

BTW, to be clear, the present behavior of Ruby 2.0 is wrong.
It should be corrected.
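
To see which encoding a given Ruby build actually assigns to ENV values, a small diagnostic along these lines may help. It is a sketch only; `env_encoding_report` is a hypothetical name, and the output depends on the platform, the Ruby version, and the current locale or code page.

```ruby
# Hypothetical diagnostic: list the encoding tag Ruby put on each
# environment value, so it can be compared with the locale encoding.
def env_encoding_report(env = ENV)
  env.map do |name, value|
    {
      name: name,
      encoding: value.encoding,
      valid: value.valid_encoding?,
      ascii_only: value.ascii_only?
    }
  end
end

puts "locale encoding:   #{Encoding.find('locale')}"
puts "default_external:  #{Encoding.default_external}"

# ASCII-only values look the same in most encodings, so only the
# non-ASCII ones are interesting here.
env_encoding_report.reject { |row| row[:ascii_only] }.each do |row|
  puts "#{row[:name]}: #{row[:encoding]} (valid: #{row[:valid]})"
end
```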

#4 [ruby-core:63320] Updated by Nobuyoshi Nakada over 2 years ago

  • Description updated (diff)

#5 Updated by Nobuyoshi Nakada 12 months ago

  • Status changed from Assigned to Closed

Applied in changeset r52896.


hash.c: env encoding fallback on Windows

  • hash.c (env_str_new, env_path_str_new): make default string UTF-8 for the case conversion is not possible. [Bug #8822]
  • hash.c (get_env_cstr): convert non-ASCII string to UTF-8 string.
  • hash.c (ruby_setenv): use wide char version to put environment variable to deal with non-ASCII value.

#6 [ruby-core:76105] Updated by Ethan Brown 5 months ago

  • Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)

I don't believe this is properly fixed.

I just left a comment at https://bugs.ruby-lang.org/issues/9715#note-5, and will leave the same comment here:

The expectation is that regardless of current locale / codepage, I should get UTF-8 strings when using ENV on Windows. Here is a simple reproduction of the failure on 2.3.0:

C:\Users\Administrator> $env:unicode = 'taskᚠᛇᚻ'
C:\Users\Administrator> dir Env:\unicode

Name                           Value
----                           -----
unicode                        taskᚠᛇᚻ


C:\Users\Administrator> ruby --version
ruby 2.3.0p0 (2015-12-25 revision 53290) [x64-mingw32]
C:\Users\Administrator> chcp
Active code page: 437

C:\Users\Administrator> irb
irb(main):001:0> RUBY_VERSION
=> "2.3.0"
irb(main):002:0> Encoding.default_internal
=> nil
irb(main):003:0> Encoding.default_external
=> #<Encoding:IBM437>
irb(main):004:0> str = ENV['unicode']
=> "task???"
irb(main):005:0> str.encoding
=> #<Encoding:IBM437>

Again, when I access ENV on Windows, I should receive a UTF-8 string with the correct data, not an IBM437 string. The expected string in this case is:

irb(main):036:0> str2 = "task\u16A0\u16C7\u16BB"
=> "task\u16A0\u16C7\u16BB"
irb(main):037:0> str2.encoding
=> #<Encoding:UTF-8>
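
The difference between the observed and expected values can be reproduced on any platform with the sketch below. It assumes, as one plausible explanation rather than a confirmed cause, that the observed "task???" comes from a lossy conversion to the active code page in which unrepresentable characters are replaced with "?".

```ruby
# Expected value: the original text as a UTF-8 string.
expected = "task\u16A0\u16C7\u16BB"

# Simulate a lossy conversion to code page 437: characters that have no
# IBM437 representation are replaced with "?".
lossy = expected.encode(Encoding::IBM437, undef: :replace, replace: "?")

puts lossy                     # prints "task???"
puts lossy.encoding.inspect    # prints #<Encoding:IBM437>

# The replaced characters are gone for good: converting back to UTF-8
# does not restore the original text.
round_trip = lossy.encode(Encoding::UTF_8)
puts round_trip == expected    # prints false
```

This is why re-tagging or transcoding the string after the fact cannot help in this case; the correct data has to be obtained before the lossy step.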

Note that some browsers, like Chrome on OS X, may fail to render the runic characters correctly, but if you copy them into a proper editor or use another browser, you should see the characters fine.
