Bug #877
[win32] Ruby Standard Library (maybe smth else): Wrong Encoding in Files, Directories and Environment Variables
| Status: | Rejected | Start date: | 12/15/2008 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | core | | |
| Target version: | Ruby 1.8.6 | | |
| ruby -v: | - | | |
Description
I am from Russia, and my system language is set to Russian. When I tried to create a directory with the Dir.mkdir method:

irb(main):002:0> Dir.mkdir "c:/ruby/проверка"
=> 0

The word "проверка" means "test" in Russian. The directory name appears in the wrong charset (see the screenshot for details).

irb(main):003:0> File.exists? "c:/ruby/проверка"
=> true

This is the root of many problems, for example when a program tries to create a directory in %USERPROFILE%/Application Data:

Microsoft Windows XP [Версия 5.1.2600]
(С) Корпорация Майкрософт, 1985-2001.

C:\Documents and Settings\Администратор>irb
irb(main):001:0> $KCODE = 'utf8'
=> "utf8"
irb(main):002:0> ENV['userprofile']
=> "C:\\Documents and Settings\\\200\244\254\250\255\250стр\240тор"
irb(main):003:0> $KCODE = ''
=> ""
irb(main):004:0> ENV['userprofile']
=> "C:\\Documents and Settings\\\200\244\254\250\255\250\341\342\340\240\342\256\340"
irb(main):005:0> File.exists? ENV['userprofile']
=> false

The word "Администратор" means "Administrator" in Russian.

Microsoft Windows XP [Version 5.1.2600].

C:\>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

Ruby is installed from http://rubyinstaller.rubyforge.org/ .
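For readers who want to check what the escaped bytes in the ENV['userprofile'] output stand for, here is a minimal sketch. It assumes the bytes are in the OEM code page cp866 (named IBM866 in Ruby) and uses String#force_encoding and String#encode, which exist only in Ruby 1.9 and later, so it does not run on the 1.8.6 interpreter from the report. The variable names are illustrative only.

```ruby
# The thirteen non-ASCII bytes from the irb output, written as octal integers.
raw_bytes = [0200, 0244, 0254, 0250, 0255, 0250,
             0341, 0342, 0340, 0240, 0342, 0256, 0340]

# Build a byte string and label it as cp866.
# Assumption: the console's OEM code page is cp866 (Ruby name: IBM866).
oem_name = raw_bytes.pack("C*").force_encoding("IBM866")

# Transcode to UTF-8 so the text can be compared with the expected word.
utf8_name = oem_name.encode("UTF-8")

puts utf8_name  # Администратор
```

Under that assumption the bytes decode cleanly, which suggests the value is intact but in a different code page than the one the file functions expect.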
History
Updated by luislavena (Luis Lavena) over 3 years ago
I noticed issues with other things, like puts, print and such. Most of the File and IO functions for Windows are ANSI, not Wide, which limits the options for properly processing paths, filenames and even output of strings containing UTF/Unicode characters. Also, the console code page affects Ruby. The default is 437, but 1252 is needed to get accented strings to work. Further review of the Windows API calls used is needed to find these issues.
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
Is this a bug or a feature? :) I hope that this behavior on Windows will be corrected in new versions of Ruby. Is there a workaround for this bug? Also, the code page for Russian is cp1251, not cp1252.
Updated by antares (Michael Klishin) over 3 years ago
Both cp1251 and cp1252 are ASCII extensions from Microsoft, and Ruby 1.8 assumes strings are all ASCII unless you use the multibyte gem or activesupport. Try that, and if you can get mkdir working in the console (the Windows console, not irb), you can try using Kernel#system.
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
- File multibyte-cant-help.png added
I've tried this workaround, but it doesn't seem to work.
Updated by antares (Michael Klishin) over 3 years ago
Well, since for Ruby Cyrillic characters are integers (just like any others), it uses the integer values, and Windows does not normalize them (OS X does, for instance). I see no way to fix this in the 1.8.x branch, and in 1.9 you already have encoding-aware strings, IO objects and so forth. But I am by no means an M17N expert and may be wrong.
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
Okay, thanks. Perhaps I should just take this behaviour on Windows into account until Ruby 1.9 (or 2.0?) becomes stable.
Updated by antares (Michael Klishin) over 3 years ago
The 1.9.1 branch is stable for day-to-day use. I do not know of any available builds for Windows, though, and some libraries you may want to use still need to catch up. The v1_8_0 tag of Ruby is from 2003 and Windows XP is from 2001 or so. At some point, people should consider moving on or just accept what is missing in older versions. Ruby is not unique in this regard.
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
It is impossible for me to jump to 1.9.x, because I use Shoes, which is compiled by _why with Ruby 1.8.x. I cannot build Shoes with a new Ruby for every operating system that Shoes supports.
Updated by antares (Michael Klishin) over 3 years ago
Well, maybe others can help. Why don't you ask on the Shoes mailing list?
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
Because the bug described here does not apply specifically to Shoes: many other applications that run on Win32 and work with environment variables and the file system have the same encoding problem. I found a dirty workaround: the application should place its own folder in %CommonProgramFiles%/AppName instead of %USERPROFILE%/Application Data/AppName. This method has one disadvantage: the data stored by the application is available to everybody. I'm sorry for the off-topic. The issue should be closed; thanks for your time, comments and tips.
Updated by luislavena (Luis Lavena) over 3 years ago
A similar situation with printing encoded characters happened to the Cucumber developers: http://rspec.lighthouseapp.com/projects/16211-cucumber/tickets/81 They ended up using chcp and Iconv to do the character conversion back and forth.
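The general idea of recoding text for the console can be sketched as follows. This is not the Cucumber code: it swaps in String#encode (Ruby 1.9 and later) for Iconv, the helper name to_console is hypothetical, and the target code page cp866 (IBM866 in Ruby) is an assumption about a Russian console. The code page would normally be whatever chcp reports.

```ruby
# Hypothetical helper: recode a UTF-8 string into the console code page.
# Assumption: the console uses cp866; pass another name if chcp says otherwise.
def to_console(text, codepage = "IBM866")
  # Characters the code page cannot represent are replaced with "?"
  # instead of raising Encoding::UndefinedConversionError.
  text.encode(codepage, invalid: :replace, undef: :replace, replace: "?")
end

encoded = to_console("проверка")

# Show the resulting bytes; these are what would be written to the console.
p encoded.bytes.to_a
```

Replacing unrepresentable characters keeps output routines from failing on mixed-language text, at the cost of losing those characters.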
Updated by eveel (Dmitry A. Ustalov) over 3 years ago
- File cucumber-method-is-also-wrong.png added
I'm sorry for the long delay in answering. Their solution is described at http://codesnippets.joyent.com/posts/show/414, and implemented in http://github.com/aslakhellesoy/cucumber/tree/master/lib/cucumber/formatters/unicode.rb This method applies to output routines only, so it is useless here. The actual problem is in the Windows-specific implementation of some Ruby libraries. Ruby reads environment variables in an awfully wrong encoding, and works with file system objects in an awfully wrong encoding, too. My attempts to iconv() ENV['userprofile'] to an adequate charset (cp1251, cp1252, cp866) were unsuccessful. Perhaps this bug is unresolvable (as Michael Klishin noted above), and despite some limitations, I'll use my workaround until Ruby 1.8 is replaced by 1.9.
Updated by shyouhei (Shyouhei Urabe) over 3 years ago
- Assignee set to usa (Usaku NAKAMURA)
Updated by usa (Usaku NAKAMURA) about 3 years ago
- Category set to core
- Status changed from Open to Rejected
- ruby -v set to -
There is no plan to resolve the original problem on 1.8. You must pass the path to Ruby in the encoding used by the Win32 file API. I know it's VERY inconvenient for users in Europe, but we cannot break compatibility of command-line/path handling in the 1.8 branch.
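As an illustration of what "the encoding used by the Win32 file API" could mean for the path from the description, here is a minimal sketch. It assumes the ANSI code page on a Russian system is Windows-1251. The constant ANSI_CODEPAGE and the helper ansi_path are hypothetical names. String#encode exists only in Ruby 1.9 and later; on 1.8 the Iconv standard library would be the usual tool, so this shows only the byte-level conversion, not a tested 1.8 workaround.

```ruby
# Assumption: the system ANSI code page is Windows-1251 (Russian locale).
ANSI_CODEPAGE = "Windows-1251"

# Hypothetical helper: recode a UTF-8 path into the ANSI code page.
def ansi_path(utf8_path)
  utf8_path.encode(ANSI_CODEPAGE)
end

path = ansi_path("c:/ruby/проверка")

# Inspect the bytes an ANSI file function would receive for this path.
p path.bytes.to_a
```

The ASCII part of the path is unchanged, while each Cyrillic letter becomes a single byte in the 0xC0..0xFF range.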