Feature #12650
closed
Use UTF-8 encoding for ENV on Windows
Description
Windows environment variables support Unicode (via the same wide-character WinAPI), so there's no reason to limit ourselves to any codepage.
Currently ENV uses the locale's encoding (the console's codepage), which obviously won't work correctly for characters outside that codepage.
I've attached a patch which implements this and fixes bug #9715.
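To see which encoding Ruby has assigned to the values read from the environment, something like the following sketch can help. The helper name `env_encoding_report` is made up for illustration and is not part of Ruby or of the attached patch; it only groups variable names by the encoding of their values.

```ruby
# Hypothetical helper (not part of Ruby): group environment variable
# names by the encoding Ruby assigned to their values.
def env_encoding_report(env = ENV)
  env.to_h
     .group_by { |_name, value| value.encoding }
     .transform_values { |pairs| pairs.map(&:first) }
end

# Print one line per encoding with the number of variables using it.
env_encoding_report.each do |encoding, names|
  puts "#{encoding}: #{names.size} variable(s)"
end
```

On a Windows build without the patch, the values would be expected to show up under the locale's codepage rather than UTF-8; the exact result depends on the Ruby version and the system configuration.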
Updated by usa (Usaku NAKAMURA) over 8 years ago
We don't want to break compatibility.
Wait Ruby3.
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
- Tracker changed from Bug to Feature
Updated by spatulasnout (B Kelly) over 8 years ago
Hi,
Usaku NAKAMURA wrote:
We don't want to break compatibility.
Wait Ruby3.
We always invoke ruby with -EUTF-8:UTF-8.
Would it make sense to enable this patch in ruby 2.x in such situations
where UTF-8 behavior has been requested explicitly?
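For context, here is a small sketch of what the -E option changes. It starts a child Ruby with -EUTF-8:UTF-8 and prints the resulting default external and internal encodings. The helper name `default_encodings_with` is hypothetical; the sketch says nothing about how ENV itself is decoded, which is the open question in this issue.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: start a child Ruby with the given -E option and
# return the names of its default external and internal encodings.
def default_encodings_with(option)
  script = "puts Encoding.default_external.name; " \
           "puts Encoding.default_internal ? Encoding.default_internal.name : 'nil'"
  output, status = Open3.capture2(RbConfig.ruby, option, "-e", script)
  raise "child ruby failed" unless status.success?
  output.lines.map(&:chomp)
end

external, internal = default_encodings_with("-EUTF-8:UTF-8")
puts "default_external=#{external} default_internal=#{internal}"
```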
Updated by naruse (Yui NARUSE) over 8 years ago
- Related to Bug #9715: ENV data yield ASCII-8BIT encoded strings under Windows with unicode username added
Updated by Iristyle (Ethan Brown) about 8 years ago
If you could rethink the plan to wait until Ruby 3, that would be great.
I would expect Ruby to normalize on UTF-8 strings everywhere internally, and only convert to local codepage on the boundary (such as writing to console, file, etc).
We are tracking a number of issues in Puppet that we believe are caused by the current behavior:
Updated by thomthom (Thomas Thomassen) almost 8 years ago
B Kelly wrote:
Hi,
Usaku NAKAMURA wrote:
We don't want to break compatibility.
Wait Ruby3.
We always invoke ruby with -EUTF-8:UTF-8.
Would it make sense to enable this patch in ruby 2.x in such situations
where UTF-8 behavior has been requested explicitly?
I would like to second this request. We are also troubled by the encoding issues under Windows. I'm not sure when Ruby 3 is planned to be released, but we would prefer a more immediate solution.
Updated by shyouhei (Shyouhei Urabe) almost 8 years ago
We looked at this issue in today's developer meeting.
First off, the attendees' understanding: ENV in Windows is managed by its kernel, and is provided to a userland process as an array of wide characters. Tell me if that's wrong. Also, we already support writing UTF_8 strings into ENV because that has no backwards compatibility problem. The problem is reading from it.
Now, because of our long tradition of using the OEM codepage on Windows, it has been difficult to change the encoding of ENV to UTF_8. The tragedy is that Windows does have chcp 65001, but it is practically not used anywhere, so Windows users are left with their code pages.
I understand you want to use UTF_8. Changing the default encoding is not practically possible now because of backwards compatibility. I advise you to propose other ways, for instance some sort of "UTF_8 mode". Maybe it would make sense for you to set the default_internal encoding (which is nil by default)?
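As a rough illustration of the default_internal suggestion, the sketch below sets Encoding.default_internal for the duration of a block and then restores the previous value. The helper name `with_default_internal` is made up for this example. Whether values read from ENV are actually transcoded under this setting on a given Windows build is an assumption that would need to be checked, not something the sketch demonstrates.

```ruby
# Hypothetical helper: temporarily set Encoding.default_internal,
# run the block, and restore the previous value afterwards.
def with_default_internal(encoding)
  previous = Encoding.default_internal
  Encoding.default_internal = encoding
  yield
ensure
  Encoding.default_internal = previous
end

# Inside the block, default_internal is whatever encoding was passed in.
inside = with_default_internal(Encoding::UTF_8) { Encoding.default_internal }
puts inside
```

In practice default_internal is usually set once at startup (for example through the -E option) rather than toggled at runtime; the block form here only keeps the example free of side effects.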
Updated by thomthom (Thomas Thomassen) almost 8 years ago
I would be ok with it not being default, as long as it can be configured for the whole interpreter and not some magic comment that would have to be in each source file.
In our particular scenario we are embedding Ruby into our application and we would like to configure the Ruby interpreter to use this "UTF-8 mode".
People who write Ruby extensions for our application already have to use hacks such as force_encoding to correct this, and it's a constant source of bugs and problems. If we could force ENV strings to be UTF-8 by default for the embedded environment we provide, that would be a great relief for us.
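The kind of force_encoding workaround mentioned above might look roughly like this. It is only a sketch under assumptions: `env_utf8` is a hypothetical helper, and relabelling the bytes as UTF-8 is only correct when they really are UTF-8. If they are not, the sketch falls back to transcoding from whatever encoding Ruby declared for the value, replacing characters it cannot convert.

```ruby
# Hypothetical helper: return the value of an environment variable as a
# UTF-8 string, or nil if the variable is not set.
def env_utf8(name, env = ENV)
  value = env[name]
  return nil if value.nil?

  # First try to reinterpret the raw bytes as UTF-8 without changing them.
  relabelled = value.dup.force_encoding(Encoding::UTF_8)
  return relabelled if relabelled.valid_encoding?

  # Otherwise transcode from the declared encoding, replacing anything
  # that cannot be represented.
  value.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

path = env_utf8("PATH")
puts path.encoding if path
```

Having to guess like this at every call site is exactly why it is fragile: the two branches can give different results for the same bytes depending on which encoding Ruby attached to the string.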
shyouhei (Shyouhei Urabe) wrote:
We looked at this issue in today's developer meeting.
First off, the attendees' understanding: ENV in Windows is managed by its kernel, and is provided to a userland process as an array of wide characters. Tell me if that's wrong. Also, we already support writing UTF_8 strings into ENV because that has no backwards compatibility problem. The problem is reading from it.
Now, because of our long tradition of using the OEM codepage on Windows, it has been difficult to change the encoding of ENV to UTF_8. The tragedy is that Windows does have chcp 65001, but it is practically not used anywhere, so Windows users are left with their code pages.
I understand you want to use UTF_8. Changing the default encoding is not practically possible now because of backwards compatibility. I advise you to propose other ways, for instance some sort of "UTF_8 mode". Maybe it would make sense for you to set the default_internal encoding (which is nil by default)?
Updated by naruse (Yui NARUSE) about 5 years ago
- Assignee set to windows
- Target version set to 3.0
Updated by larskanis (Lars Kanis) about 4 years ago
A patch for ruby-3.0 is here: https://github.com/ruby/ruby/pull/3818
Updated by naruse (Yui NARUSE) about 4 years ago
- Related to Feature #16604: Set default for Encoding.default_external to UTF-8 on Windows added
Updated by larskanis (Lars Kanis) about 4 years ago
This issue can be closed. It's merged in ca76337a00244635faa331afd04f4b75161ce6fb
Updated by duerst (Martin Dürst) about 4 years ago
- Status changed from Open to Closed