Bug #877

[win32] Ruby Standard Library (maybe something else): Wrong Encoding in Files, Directories and Environment Variables

Added by Dmitry A. Ustalov about 3 years ago. Updated 10 months ago.

[ruby-core:20557]
Status: Rejected
Start date: 12/15/2008
Priority: High
Due date:
Assignee: Usaku NAKAMURA
% Done: 0%
Category: core
Target version: Ruby 1.8.6
ruby -v: -

Description

I am from Russia, and my system language is set to Russian.

I tried to create a directory with the Dir.mkdir method:

irb(main):002:0> Dir.mkdir "c:/ruby/проверка"
=> 0

Word "проверка" means "test" in Russian.

The directory name appears in the wrong character set (see the attached
screenshot).

irb(main):003:0> File.exists? "c:/ruby/проверка"
=> true
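One plausible reading of the screenshot, offered as an assumption rather than a diagnosis: the irb console on a Russian Windows XP uses the OEM code page CP866, while the ANSI file functions interpret byte strings in the ANSI code page CP1251. The sketch below (Ruby 1.9 or later, since String#encode does not exist in 1.8.6) shows what the CP866 bytes of "проверка" look like when read as CP1251. The same assumption would explain why File.exists? still returns true: Ruby hands the identical bytes back, so the lookup matches the garbled name.

```ruby
# encoding: utf-8
# Sketch only. Assumes the console delivered the name in CP866 (IBM866)
# and the ANSI file API decoded those bytes as CP1251 (Windows-1251).
word = "проверка"

oem_bytes  = word.encode("IBM866")        # bytes a CP866 console would pass on
ansi_bytes = word.encode("Windows-1251")  # bytes an ANSI API would need for the same name

puts oem_bytes.bytes.map { |b| format("%02X", b) }.join(" ")
puts ansi_bytes.bytes.map { |b| format("%02X", b) }.join(" ")

# Reinterpret the CP866 bytes as CP1251 without converting them,
# then show the result as UTF-8 for display.
misread = oem_bytes.dup.force_encoding("Windows-1251").encode("UTF-8")
puts misread
```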

This is the root of many problems, for example when a program
tries to create a directory in %USERPROFILE%/Application Data:

Microsoft Windows XP [Версия 5.1.2600]
(С) Корпорация Майкрософт, 1985-2001.

C:\Documents and Settings\Администратор>irb
irb(main):001:0> $KCODE = 'utf8'
=> "utf8"
irb(main):002:0> ENV['userprofile']
=> "C:\\Documents and Settings\\\200\244\254\250\255\250стр\240тор"
irb(main):003:0> $KCODE = ''
=> ""
irb(main):004:0> ENV['userprofile']
=> "C:\\Documents and Settings\\\200\244\254\250\255\250\341\342\340\240\342\256\340"
irb(main):005:0> File.exists? ENV['userprofile']
=> false

Word "Администратор" means "Administrator" in Russian.
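The escaped bytes in the ENV['userprofile'] output above are consistent with the folder name being returned in the OEM code page CP866, not the ANSI code page CP1251. A quick check, assuming Ruby 1.9 or later; only the byte string is copied from the irb session, the rest is illustrative:

```ruby
# encoding: utf-8
# The bytes after "C:\Documents and Settings\" in the irb output above,
# tagged as CP866 (Ruby's name for it is IBM866).
raw = "\200\244\254\250\255\250\341\342\340\240\342\256\340".dup.force_encoding("IBM866")

# Transcode to UTF-8 to see which name those bytes spell in CP866.
decoded = raw.encode("UTF-8")
puts decoded
```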

Microsoft Windows XP [Version 5.1.2600].

C:\>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

Ruby was installed from http://rubyinstaller.rubyforge.org/.

ruby.png - screenshot (103.4 kB) Dmitry A. Ustalov, 12/15/2008 07:10 am

multibyte-cant-help.png - another screenshot (42.5 kB) Dmitry A. Ustalov, 12/18/2008 10:18 pm

cucumber-method-is-also-wrong.png - another wrong method (5.3 kB) Dmitry A. Ustalov, 12/20/2008 11:46 pm

History

Updated by Luis Lavena about 3 years ago

I have noticed similar issues elsewhere, for example with puts and print.

Most of the Windows functions used for File and IO are the ANSI variants, not the Wide ones, which limits the ability to properly process paths, filenames and even the output of strings containing UTF/Unicode characters.
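To make the ANSI/Wide distinction concrete: Win32 functions come in ANSI (for example CreateDirectoryA) and Wide (CreateDirectoryW) variants. The ANSI ones take bytes in the system's ANSI code page, so whether a name is representable at all depends on that code page; the Wide ones take UTF-16LE, which covers every character. The sketch below (Ruby 1.9 or later) calls no Windows API; it only compares the encodings for the directory name from the report.

```ruby
# encoding: utf-8
# Compare how one path component would have to be encoded for ANSI vs Wide APIs.
name = "проверка"

# Wide APIs take UTF-16LE: every character is representable.
wide = name.encode("UTF-16LE")

# ANSI APIs take the system ANSI code page; on Russian systems that is CP1251.
ansi_1251 = name.encode("Windows-1251")

# Western European systems use CP1252, which has no Cyrillic letters.
representable_in_1252 =
  begin
    name.encode("Windows-1252")
    true
  rescue Encoding::UndefinedConversionError
    false
  end

puts "UTF-16LE: #{wide.bytesize} bytes"
puts "CP1251:   #{ansi_1251.bytesize} bytes"
puts "CP1252:   #{representable_in_1252 ? 'ok' : 'not representable'}"
```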

Also, the console code page affects Ruby. The default is 437, but 1252 is needed for accented strings to work.
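The code page point can be checked the same way. Assuming the accented string is held as CP1252 bytes (as on a Western European ANSI system), this sketch shows why a console left at code page 437 prints the wrong glyph: the byte for "é" differs between the two code pages. Requires Ruby 1.9 or later.

```ruby
# encoding: utf-8
text = "café"

in_1252 = text.encode("Windows-1252")  # ANSI code page bytes
in_437  = text.encode("IBM437")        # OEM console code page bytes

# What a CP437 console would display if handed the CP1252 bytes unchanged.
shown_by_437 = in_1252.dup.force_encoding("IBM437").encode("UTF-8")

puts format("é in CP1252: %02X, in CP437: %02X", in_1252.bytes.last, in_437.bytes.last)
puts shown_by_437
```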

The Windows API calls that Ruby uses need further review to track down these issues.

Updated by Dmitry A. Ustalov about 3 years ago

Is this a bug or a feature? :)

I hope that this behavior on Windows will be corrected in
new versions of Ruby.

Is there a workaround for this bug?

Also, Russian uses cp1251, not cp1252.

Updated by Michael Klishin about 3 years ago

Both cp1251 and cp1252 are ASCII extensions from Microsoft, and Ruby 1.8 assumes strings are all ASCII unless you use the multibyte gem or ActiveSupport. So try that, and if you can get mkdir working in the console (the Windows console, not irb), you can try calling it via Kernel#system.

Updated by Dmitry A. Ustalov about 3 years ago

I've tried this workaround, but it doesn't seem to work.

Updated by Michael Klishin about 3 years ago

Well, to Ruby, Cyrillic characters are just integers (like any others), so it passes those integer values through, and Windows does not normalize them (OS X does, for instance). I see no way to fix this in the 1.8.x branch, and in 1.9 you already have encoding-aware strings, IO objects and so forth. But I am by no means an M17N expert and may be wrong.
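For contrast, this is roughly what the encoding-aware strings of 1.9 look like: each String carries an Encoding, and length counts characters rather than bytes, whereas a 1.8 String is a plain byte sequence. A small illustration, assuming Ruby 1.9 or later:

```ruby
# encoding: utf-8
s = "проверка"

p s.encoding         # the encoding the literal was read in (UTF-8 here)
p s.length           # number of characters
p s.bytesize         # number of bytes; Cyrillic takes two bytes each in UTF-8
p s.valid_encoding?  # whether the bytes are valid for that encoding

# Re-tagging does not change the bytes, only how they are interpreted:
# as raw binary, every byte counts as one "character".
bytes_view = s.dup.force_encoding("ASCII-8BIT")
p bytes_view.length
```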

Updated by Dmitry A. Ustalov about 3 years ago

Okay, thanks.

Perhaps I should just take this behaviour on Windows into account until
Ruby 1.9 (or 2.0?) becomes stable.

Updated by Michael Klishin about 3 years ago

The 1.9.1 branch is stable for day-to-day use, though I do not know of any available builds for Windows, and some libraries you may want to use still need to catch up. The v1_8_0 tag of Ruby is from 2003, and Windows XP is from 2001 or so. At some point people should consider moving on, or just accept what is missing in older versions. Ruby is not unique in this regard.

Updated by Dmitry A. Ustalov about 3 years ago

I can't move to 1.9.x, because I use Shoes,
which _why compiles against Ruby 1.8.x.

And I cannot build Shoes with a new Ruby for every operating
system that Shoes supports.

Updated by Michael Klishin about 3 years ago

Well, maybe others can help. Why don't you ask on the Shoes mailing list?

Updated by Dmitry A. Ustalov about 3 years ago

Because the bug described here doesn't apply specifically to Shoes:
many other applications that run on Win32 and work with environment
variables and the file system have the same encoding
problem.

I found a dirty workaround: the application should place its own folder
in %CommonProgramFiles%/AppName instead of
%USERPROFILE%/Application Data/AppName.

This method has one disadvantage: data stored by the application
is available to everybody. Sorry for going off-topic.

The issue can be closed. Thanks for your time, comments and tips.

Updated by Luis Lavena about 3 years ago

The Cucumber developers ran into a similar problem when printing encoded characters:

http://rspec.lighthouseapp.com/projects/16211-cucumber/tickets/81

They ended up using chcp and Iconv to convert the characters back and forth.
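A rough sketch of that idea in modern Ruby, not Cucumber's actual code: take the console's code page number (as reported by chcp) and transcode UTF-8 strings to it before printing, replacing characters the code page cannot represent. The helper name `to_console` and the code-page table are hypothetical; under 1.8 the conversion step would go through Iconv rather than String#encode, which requires 1.9 or later.

```ruby
# encoding: utf-8
# Hypothetical mapping from chcp code page numbers to Ruby encoding names.
CONSOLE_ENCODINGS = {
  437  => "IBM437",
  866  => "IBM866",
  1251 => "Windows-1251",
  1252 => "Windows-1252",
}.freeze

# Transcode a UTF-8 string for a console using the given code page.
# Characters with no mapping become "?" instead of raising;
# an unknown code page number raises KeyError.
def to_console(str, codepage)
  str.encode(CONSOLE_ENCODINGS.fetch(codepage), invalid: :replace, undef: :replace)
end
```

Usage would be along the lines of `print to_console("проверка", 866)` in a console whose chcp reports 866; the result is only meaningful when printed to a console set to that same code page.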

Updated by Dmitry A. Ustalov about 3 years ago

I'm sorry for the long delay in answering.

Their solution is described at http://codesnippets.joyent.com/posts/show/414,
and implemented in
http://github.com/aslakhellesoy/cucumber/tree/master/lib/cucumber/formatters/unicode.rb

This method applies to output routines only, so it is of no use here.
The actual problem is in the Windows-specific implementation of some Ruby
libraries. Ruby reads environment variables in a badly wrong encoding, and
works with file system objects in a badly wrong encoding, too.

My attempts to iconv() ENV['userprofile'] into a suitable charset
(cp1251, cp1252, cp866) were unsuccessful.

Perhaps this bug is unresolvable (as Michael Klishin noted above), and
despite some limitations, I'll use my workaround until Ruby 1.8
is replaced by 1.9.

Updated by Shyouhei Urabe about 3 years ago

  • Assignee set to Usaku NAKAMURA

Updated by Usaku NAKAMURA almost 3 years ago

  • Category set to core
  • Status changed from Open to Rejected
  • ruby -v set to -

There is no plan to resolve the original problem in 1.8.
You must pass paths to Ruby in the Win32 file API's encoding.

I know it's VERY inconvenient for users in Europe, but we cannot break compatibility of command-line/path handling in the 1.8 branch.
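Following that advice in practice means converting each path to the ANSI code page before it reaches File or Dir. A hedged sketch for Ruby 1.9 or later, assuming a Russian system whose ANSI code page is CP1251; `ansi_path` is a hypothetical helper, and the encoding the bytes are actually in (UTF-8 for literals in a UTF-8 script, CP866 for console input or, judging by the bytes in this report, ENV) has to be known rather than guessed. Under 1.8.6 the equivalent conversion would be Iconv.conv('CP1251', 'CP866', str); that variant is untested here.

```ruby
# encoding: utf-8
ANSI_CODEPAGE = "Windows-1251"  # assumption: Russian Windows, ANSI code page 1251

# Re-encode a path into the ANSI code page expected by the ANSI file functions.
# `from` names the encoding the bytes are really in; it is not detected.
def ansi_path(path, from)
  path.dup.force_encoding(from).encode(ANSI_CODEPAGE)
end

# A UTF-8 literal, as in the Dir.mkdir example.
literal = ansi_path("c:/ruby/проверка", "UTF-8")

# The ENV['userprofile'] bytes from the irb session, assumed to be CP866.
from_env = ansi_path("C:\\Documents and Settings\\\200\244\254\250\255\250\341\342\340\240\342\256\340", "IBM866")
```

On the affected system the converted strings would then be passed to Dir.mkdir or File.exists?; that step is not shown because its behavior is specific to Windows.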
