Bug #877
[win32] Ruby Standard Library (maybe something else): Wrong Encoding in Files, Directories and Environment Variables
Status: Closed
Description
=begin
I am from Russia, and my system language is set to Russian.
When I tried to create a directory with the Dir.mkdir method:
irb(main):002:0> Dir.mkdir "c:/ruby/проверка"
=> 0
Word "проверка" means "test" in Russian.
The directory name appears in the wrong charset (see the screenshot for details).
irb(main):003:0> File.exists? "c:/ruby/проверка"
=> true
This is the root of many problems, for example when a program
tries to create a directory in %USERPROFILE%/Application Data, as shown here:
Microsoft Windows XP [Версия 5.1.2600]
(С) Корпорация Майкрософт, 1985-2001.
C:\Documents and Settings\Администратор>irb
irb(main):001:0> $KCODE = 'utf8'
=> "utf8"
irb(main):002:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250стр\240тор"
irb(main):003:0> $KCODE = ''
=> ""
irb(main):004:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250\341\342\340\240\342\256\340"
irb(main):005:0> File.exists? ENV['userprofile']
=> false
Word "Администратор" means "Administrator" in Russian.
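For reference, the octal escapes in the $KCODE = '' result above are not random bytes: they decode cleanly as code page 866 (the OEM/console code page on Russian Windows) rather than as 1251 (the ANSI code page there). A minimal sketch using String#encode, which only exists in Ruby 1.9 and later, so it illustrates the mismatch rather than fixing the 1.8.6 setup in this report:

```ruby
# Raw bytes returned by ENV['userprofile'], copied from the octal escapes
# in the irb transcript above.
oem_bytes = "\200\244\254\250\255\250\341\342\340\240\342\256\340".b

# Assumption: the bytes are in OEM code page 866. Decode them to UTF-8.
name = oem_bytes.force_encoding(Encoding::IBM866).encode(Encoding::UTF_8)
puts name  # Администратор

# The same name in Windows-1251, the ANSI code page of Russian Windows.
ansi = name.encode(Encoding::Windows_1251)
puts ansi.bytes.map { |b| format("%02X", b) }.join(" ")
# C0 E4 EC E8 ED E8 F1 F2 F0 E0 F2 EE F0
```

Whether handing the Windows-1251 form back to File.exists? on 1.8 would succeed is not something this sketch verifies.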
Microsoft Windows XP [Version 5.1.2600].
C:>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]
Ruby was installed from http://rubyinstaller.rubyforge.org/.
=end
Updated by luislavena (Luis Lavena) almost 16 years ago
=begin
I noticed issues with other things, like puts, print and such.
Most of the File and IO functions Ruby uses on Windows are the ANSI variants, not the Wide (Unicode) ones, which limits how well paths, filenames and even string output containing UTF/Unicode characters can be processed.
Also, the console code page affects Ruby. By default it is 437, but 1252 is needed to get accented strings to work.
A further review of the Windows API calls Ruby uses is needed to track down these issues.
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
=begin
Is this a bug or a feature? :)
I hope this behavior on Windows will be corrected in
new versions of Ruby.
Is there a workaround for this bug?
Also, Russian uses cp1251, not cp1252.
=end
Updated by antares (Michael Klishin) almost 16 years ago
=begin
Both cp1251 and cp1252 are ASCII extensions from Microsoft, and Ruby 1.8 assumes strings are all ASCII unless you use the multibyte gem or ActiveSupport. So try that, and if you can get it working in the console (the Windows console, not irb) with mkdir, you can try calling that through Kernel#system.
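A minimal sketch of the Kernel#system suggestion, written in modern Ruby. mkdir_via_shell is a hypothetical helper name; on Windows mkdir is a cmd.exe built-in, so it is run through cmd /c. Whether the command line reaches cmd.exe in the right code page on 1.8 is exactly what would need testing.

```ruby
require "rbconfig"
require "tmpdir"

# Hypothetical helper: create a directory by shelling out via Kernel#system
# instead of calling Dir.mkdir directly. Returns true on success.
def mkdir_via_shell(path)
  if RbConfig::CONFIG["host_os"] =~ /mswin|mingw/
    # mkdir is built into cmd.exe, so it must be invoked through cmd /c.
    system("cmd", "/c", "mkdir", path)
  else
    # Non-Windows branch only so the sketch also runs elsewhere.
    system("mkdir", "-p", path)
  end
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "проверка")
  puts(mkdir_via_shell(target) && File.directory?(target))
end
```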
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
- File multibyte-cant-help.png added
=begin
I've tried this workaround, but it doesn't seem to work.
=end
Updated by antares (Michael Klishin) almost 16 years ago
=begin
Well, since to Ruby Cyrillic characters are just integers (byte values, like any other characters), it passes those values through as-is, and Windows does not normalize them (OS X does, for instance). I see no way to fix this in the 1.8.x branch, and in 1.9 you already have encoding-aware strings, IO objects and so forth. But I am by no means an M17N expert and may be wrong.
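To illustrate the difference being described: in Ruby 1.9 and later each String carries its encoding, while Ruby 1.8 only saw a sequence of byte values. A small sketch, assuming a UTF-8 source file (the default since Ruby 2.0):

```ruby
s = "проверка"

puts s.encoding                         # UTF-8
puts s.length                           # 8 characters
puts s.bytesize                         # 16 bytes (what 1.8's String#length counted)
p s.bytes.first(2)                      # [208, 191], the two UTF-8 bytes of "п"
puts s.encode("Windows-1251").bytesize  # 8: one byte per letter in cp1251
```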
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
=begin
Okay, thanks.
Perhaps I should just take this behaviour on Windows into account until
Ruby 1.9 (or 2.0?) becomes stable.
=end
Updated by antares (Michael Klishin) almost 16 years ago
=begin
The 1.9.1 branch is stable for day-to-day use, though I do not know of any available builds for Windows, and some libraries you may want to use still need to catch up. The v1_8_0 tag of Ruby is from 2003, and Windows XP is from 2001 or so. At some point, people should consider moving on or just accept what is missing in older versions. Ruby is not unique in this regard.
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
=begin
It is impossible for me to move to 1.9.x, because I use Shoes,
which _why builds with Ruby 1.8.x.
And I cannot rebuild Shoes with a new Ruby for every operating
system that Shoes supports.
=end
Updated by antares (Michael Klishin) almost 16 years ago
=begin
Well, maybe others can help; why don't you ask on the Shoes mailing list?
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
=begin
Because the bug described here doesn't apply specifically to Shoes:
many other applications that run on Win32 and work with environment
variables and the file system have the same encoding problem.
I found a dirty workaround: the application places its own folder
in %CommonProgramFiles%/AppName instead of
%USERPROFILE%/Application Data/AppName.
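A minimal sketch of this workaround. app_data_dir is a hypothetical helper, and its fallback logic (keep the per-user location when Ruby can see the profile directory, otherwise use %CommonProgramFiles%) is an assumption layered on top of what is described here, not the author's exact code. The environment is passed in as a parameter so the choice can be tested without touching the real ENV.

```ruby
require "tmpdir"

# Hypothetical helper: prefer %USERPROFILE%\Application Data\AppName, but fall
# back to %CommonProgramFiles%\AppName when Ruby cannot see the profile
# directory (as in the report, where File.exists? ENV['userprofile'] was false).
# Note: the fallback directory is shared by all users.
def app_data_dir(app_name, env = ENV)
  profile = env["USERPROFILE"]
  if profile && File.directory?(profile)
    File.join(profile, "Application Data", app_name)
  else
    # Raises KeyError if CommonProgramFiles is not set either.
    File.join(env.fetch("CommonProgramFiles"), app_name)
  end
end
```

With a real ENV on Windows, app_data_dir("AppName") returns the per-user path whenever the profile directory is visible to Ruby.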
This method has one disadvantage: data stored by the application
is available to every user. Sorry for the off-topic.
The issue can be closed; thanks for your time, comments and tips.
=end
Updated by luislavena (Luis Lavena) almost 16 years ago
=begin
A similar problem with printing encoded characters happened to the Cucumber developers:
http://rspec.lighthouseapp.com/projects/16211-cucumber/tickets/81
They ended up using chcp and Iconv to convert characters back and forth.
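A rough modern equivalent of the Iconv half of that approach, as a sketch: to_console is a hypothetical helper, the target code page is passed in rather than read from chcp, and String#encode (Ruby 1.9+) stands in for Iconv. Characters the code page cannot represent become "?".

```ruby
# Hypothetical helper: convert a UTF-8 string into the console's code page
# before printing it; unrepresentable characters are replaced with "?".
def to_console(str, console_encoding = Encoding::IBM437)
  str.encode(console_encoding, invalid: :replace, undef: :replace)
end

# Code page 437 has "é" but no Cyrillic; 866 is the Russian OEM code page.
p to_console("café").bytes                        # => [99, 97, 102, 130]
p to_console("проверка", Encoding::IBM866).bytes
```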
=end
Updated by eveel (Dmitry A. Ustalov) almost 16 years ago
=begin
Sorry for the long delay in answering.
Their solution is described at http://codesnippets.joyent.com/posts/show/414,
and implemented in
http://github.com/aslakhellesoy/cucumber/tree/master/lib/cucumber/formatters/unicode.rb
This approach applies only to output routines, so it is of no use here.
The actual problem is in the Windows-specific implementation of some Ruby
libraries: Ruby reads environment variables in a badly wrong encoding, and
works with file system objects in the wrong encoding, too.
My attempts to iconv() ENV['userprofile'] into a suitable charset
were unsuccessful (cp1251, cp1252, cp866).
Perhaps this bug is unresolvable (as Michael Klishin noted above), so
despite some limitations, I'll use my workaround until Ruby 1.8
is replaced by 1.9.
=end
Updated by shyouhei (Shyouhei Urabe) almost 16 years ago
- Assignee set to usa (Usaku NAKAMURA)
Updated by usa (Usaku NAKAMURA) over 15 years ago
- Category set to core
- Status changed from Open to Rejected
- ruby -v set to -
=begin
There is no plan to resolve the original problem in 1.8.
You must pass paths to Ruby in the encoding the Win32 file API expects.
I know it's VERY inconvenient for users in Europe, but we cannot break compatibility of command-line/path handling in the 1.8 branch.
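What passing the path in the Win32 file API's encoding means in practice, sketched with modern String#encode (on 1.8 itself the same conversion would have to go through Iconv): the path is re-encoded from UTF-8 into the ANSI code page before it reaches Dir.mkdir and similar calls. to_ansi_path is a hypothetical helper, and Windows-1251 is assumed because it is the ANSI code page of Russian Windows; other locales use other code pages.

```ruby
# Hypothetical helper: re-encode a UTF-8 path into the ANSI code page used by
# the Win32 "...A" file functions (Windows-1251 assumed for Russian Windows).
def to_ansi_path(path, ansi_encoding = Encoding::Windows_1251)
  path.encode(ansi_encoding)
end

path = to_ansi_path("c:/ruby/проверка")
p path.encoding       # => #<Encoding:Windows-1251>
p path.bytes.last(8)  # one byte per Cyrillic letter
# On Windows with 1.8-style path handling, this form is what Dir.mkdir(path) expects.
```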
=end