Bug #877

[win32] Ruby Standard Library (maybe something else): Wrong Encoding in Files, Directories and Environment Variables

Added by eveel (Dmitry A. Ustalov) almost 11 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Target version:
ruby -v: -
[ruby-core:20557]

Description

=begin
I am from Russia, and my system language is set to Russian.

When I tried to create a directory with the Dir.mkdir method:

irb(main):002:0> Dir.mkdir "c:/ruby/проверка"
=> 0

The word "проверка" means "test" in Russian.

The directory name appears in the wrong charset (see the attached
screenshot for details).

irb(main):003:0> File.exists? "c:/ruby/проверка"
=> true

This is the root of many problems, for example when a program
tries to create a directory in %USERPROFILE%/Application Data:

Microsoft Windows XP [Version 5.1.2600], Microsoft Corporation, 1985-2001.

C:\Documents and Settings\Администратор>irb
irb(main):001:0> $KCODE = 'utf8'
=> "utf8"
irb(main):002:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250стр\240тор"
irb(main):003:0> $KCODE = ''
=> ""
irb(main):004:0> ENV['userprofile']
=> "C:\Documents and Settings\\200\244\254\250\255\250\341\342\340\240\342\256\340"
irb(main):005:0> File.exists? ENV['userprofile']
=> false

The word "Администратор" means "Administrator" in Russian.

Microsoft Windows XP [Version 5.1.2600].

C:\>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

Ruby was installed from http://rubyinstaller.rubyforge.org/.
=end
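The escaped bytes in the `$KCODE = ''` transcript can be checked with the Ruby 1.9+ methods String#force_encoding and String#encode (neither exists in 1.8). A minimal sketch, assuming that CP866 is the console (OEM) code page and Windows-1251 the ANSI code page on a Russian Windows XP system: read as CP866, the bytes spell "Администратор"; read as Windows-1251, which is how the ANSI ("A") Win32 file functions interpret bytes by default, they spell a different, meaningless name. This is one plausible explanation for `File.exists?` returning false; the transcript itself does not show which API Ruby 1.8 called.

```ruby
# Bytes of the folder name from the $KCODE = '' transcript above,
# written as octal escapes and tagged as raw binary.
raw = "\200\244\254\250\255\250\341\342\340\240\342\256\340".b

# Read as CP866 (assumed OEM/console code page on Russian Windows).
as_oem = raw.dup.force_encoding("IBM866").encode("UTF-8")
puts as_oem             # Администратор

# Read as Windows-1251 (assumed ANSI code page used by the "A" file APIs),
# the very same bytes decode to a different name.
as_ansi = raw.dup.force_encoding("Windows-1251").encode("UTF-8")
puts as_ansi == as_oem  # false
```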


Files

ruby.png (103 KB): screenshot, eveel (Dmitry A. Ustalov), 12/15/2008 07:10 AM
multibyte-cant-help.png (42.5 KB): another screenshot, eveel (Dmitry A. Ustalov), 12/18/2008 10:18 PM
cucumber-method-is-also-wrong.png (5.28 KB): another wrong method, eveel (Dmitry A. Ustalov), 12/20/2008 11:46 PM

History

#1

Updated by luislavena (Luis Lavena) almost 11 years ago

=begin
I have noticed similar issues with other things, such as puts and print.

Most of the File and IO functions Ruby uses on Windows are the ANSI versions, not the Wide ones, which limits its ability to properly process paths and filenames, and even to output strings containing UTF/Unicode characters.

Also, the console code page affects Ruby. By default it is 437, but 1252 is needed for accented strings to work.

Further review of the Windows API calls Ruby uses is needed to track down these issues.

=end
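The ANSI-versus-Wide point above can be illustrated without calling any Win32 API, using the Ruby 1.9+ String#encode method. The sketch relies on two Win32 conventions: Wide ("W") functions take UTF-16LE strings, while ANSI ("A") functions take bytes in the active code page. The helper `representable?` is hypothetical and only checks whether a name fits in a given code page.

```ruby
name = "проверка"  # "test" in Russian, as a UTF-8 string

# Wide ("W") Win32 functions take UTF-16LE, which can represent any name.
wide = name.encode("UTF-16LE")
puts wide.bytesize                 # 16 (8 characters, 2 bytes each)
puts wide.encode("UTF-8") == name  # true: nothing is lost

# ANSI ("A") functions take bytes in the active code page, which covers
# only a limited character repertoire.
def representable?(str, code_page)
  str.encode(code_page)
  true
rescue Encoding::UndefinedConversionError
  false
end

puts representable?(name, "Windows-1251")  # true  (Cyrillic code page)
puts representable?(name, "Windows-1252")  # false (Western European)
```

With an ANSI API, a Cyrillic name simply cannot be passed on a system whose code page is 1252, whereas the UTF-16LE form works regardless of the system locale.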

#2

Updated by eveel (Dmitry A. Ustalov) almost 11 years ago

=begin
Is this a bug or a feature? :)

I hope this behavior on Windows will be corrected in
newer versions of Ruby.

Is there a workaround for this bug?

Also, Russian uses cp1251, not cp1252.
=end

#3

Updated by antares (Michael Klishin) almost 11 years ago

=begin
Both cp1251 and cp1252 are ASCII extensions from Microsoft, and Ruby 1.8 assumes strings are all ASCII unless you use the multibyte gem or ActiveSupport. So try that, and if you can get mkdir working in the console (the Windows console, not irb), you can try calling it via Kernel#system.
=end

#4

Updated by eveel (Dmitry A. Ustalov) over 10 years ago

=begin
I've tried this workaround, but it doesn't seem to work.
=end

#5

Updated by antares (Michael Klishin) over 10 years ago

=begin
Well, since to Ruby Cyrillic characters are just integers (like any other characters), it passes those integer values through as-is, and Windows does not normalize them (OS X does, for instance). I see no way to fix this in the 1.8.x branch, and in 1.9 you already have encoding-aware strings, IO objects and so forth. But I am by no means an M17N expert and may be wrong.
=end
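What "encoding-aware strings" means in 1.9 can be shown with a short sketch (Ruby 1.9+ String API; Windows-1251 is chosen only as an example target): a string carries an encoding tag, its length counts characters rather than bytes, and String#encode produces a different byte representation of the same characters.

```ruby
name = "проверка"                   # UTF-8 source literal: "test" in Russian

puts name.encoding                  # UTF-8
puts name.length                    # 8 characters
puts name.bytesize                  # 16 bytes (2 per Cyrillic letter in UTF-8)

ansi = name.encode("Windows-1251")  # transcode: new bytes, same characters
puts ansi.encoding                  # Windows-1251
puts ansi.bytesize                  # 8 bytes (1 per letter)
puts ansi.encode("UTF-8") == name   # true: converts back losslessly
```

In 1.8 the same literal is just a sequence of bytes with no encoding attached, which is why the bytes reach the file system without any conversion.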

#6

Updated by eveel (Dmitry A. Ustalov) over 10 years ago

=begin
Okay, thanks.

Perhaps I should just take this behaviour on Windows into account
until Ruby 1.9 (or 2.0?) becomes stable.
=end

#7

Updated by antares (Michael Klishin) over 10 years ago

=begin
The 1.9.1 branch is stable for day-to-day use. I do not know of any available builds for Windows, though, and some libraries you may want to use still need to catch up. The v1_8_0 tag of Ruby is from 2003, and Windows XP is from 2001 or so. At some point, people should consider moving on or simply accept what is missing in older versions. Ruby is not unique in this regard.
=end

#8

Updated by eveel (Dmitry A. Ustalov) over 10 years ago

=begin
I cannot switch to 1.9.x, because I use Shoes,
which _why compiles against Ruby 1.8.x.

And I cannot build Shoes with a newer Ruby for every operating
system that Shoes supports.
=end

#9

Updated by antares (Michael Klishin) over 10 years ago

=begin
Well, maybe others can help. Why don't you ask on the Shoes mailing list?
=end

#10

Updated by eveel (Dmitry A. Ustalov) over 10 years ago

=begin
Because the bug described here does not apply specifically to Shoes:
many other applications that run on Win32 and work with
environment variables and the file system have the same
encoding mismatch problem.

I found a dirty workaround: the application places its folder
in %CommonProgramFiles%/AppName instead of
%USERPROFILE%/Application Data/AppName.

This method has one disadvantage: the data stored by the application
is available to every user. Sorry for the off-topic.

The issue can be closed. Thanks for your time, comments and tips.
=end
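The workaround above can be sketched as a small helper. The name `data_dir`, the ASCII-only test and the fallback rule are assumptions for illustration: the author simply always used %CommonProgramFiles%/AppName, while this sketch keeps the per-user folder when its path is pure ASCII (so no encoding mismatch can occur) and falls back to the shared folder otherwise. String#ascii_only? requires Ruby 1.9+; `env` is any Hash-like object such as ENV.

```ruby
# Hypothetical helper illustrating the workaround above; "AppName" is the
# placeholder from the text.
def data_dir(env, app_name = "AppName")
  profile = env["USERPROFILE"].to_s
  if !profile.empty? && profile.ascii_only?
    # Per-user location: safe because the path contains only ASCII bytes.
    File.join(profile, "Application Data", app_name)
  else
    # Workaround: shared location whose path is usually ASCII-only,
    # at the cost of making the data visible to every user.
    File.join(env["CommonProgramFiles"].to_s, app_name)
  end
end

env = {
  "USERPROFILE"        => "C:\\Documents and Settings\\Администратор",
  "CommonProgramFiles" => "C:\\Program Files\\Common Files"
}
puts data_dir(env)  # falls back to the Common Files folder
```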

#11

Updated by luislavena (Luis Lavena) over 10 years ago

=begin
A similar situation with printing encoded characters happened to the Cucumber developers:

http://rspec.lighthouseapp.com/projects/16211-cucumber/tickets/81

They ended up using chcp and Iconv to convert the characters back and forth.

=end
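The output-side conversion mentioned above can be sketched with String#encode in place of Iconv (Iconv was removed from the standard library in Ruby 2.0, so this is a swapped-in technique, not the Cucumber code). The helper name `for_console` and the default IBM866 (CP866) console code page are assumptions; on a console switched with `chcp 1251`, "Windows-1251" would be passed instead. Characters the code page lacks are replaced with "?" rather than raising an error.

```ruby
# Hypothetical helper: convert UTF-8 text to the console's code page before
# printing. IBM866 is assumed as the default Russian console code page.
def for_console(text, code_page = "IBM866")
  text.encode(code_page, invalid: :replace, undef: :replace, replace: "?")
end

out = for_console("проверка ✓")  # the check mark has no CP866 mapping
$stdout.puts(out)
```

As noted in the reply below, this only fixes output; it does not change how paths or environment variables are passed to the file system.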

#12

Updated by eveel (Dmitry A. Ustalov) over 10 years ago

=begin
Sorry for the long delay in answering.

Their solution is described at http://codesnippets.joyent.com/posts/show/414,
and implemented in
http://github.com/aslakhellesoy/cucumber/tree/master/lib/cucumber/formatters/unicode.rb

This method applies only to output routines, so it does not help here.
The actual problem is in the Windows-specific implementation of some Ruby
libraries: Ruby reads environment variables in the wrong encoding, and
works with file system objects in the wrong encoding, too.

My attempts to iconv() ENV['userprofile'] into a suitable charset
(cp1251, cp1252, cp866) were unsuccessful.

Perhaps this bug is unresolvable (as Michael Klishin noted above), and
despite some limitations, I'll use my workaround until Ruby 1.8
is replaced by 1.9.
=end

#13

Updated by shyouhei (Shyouhei Urabe) over 10 years ago

  • Assignee set to usa (Usaku NAKAMURA)

=begin

=end

#14

Updated by usa (Usaku NAKAMURA) over 10 years ago

  • Category set to core
  • Status changed from Open to Rejected
  • ruby -v set to -

=begin
There is no plan to resolve the original problem in 1.8.
You must pass the path to Ruby in the encoding used by the Win32 file API.

I know it's VERY inconvenient for users in Europe, but we cannot break compatibility of command-line/path handling in the 1.8 branch.
=end
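In terms of bytes, "pass the path in the encoding used by the Win32 file API" can be illustrated as below. This is a sketch using the Ruby 1.9+ String#encode method, assuming Windows-1251 as the ANSI code page of a Russian system and a name that arrived as CP866 console bytes; the helper name `to_ansi_path` is hypothetical. It is not offered as a fix for 1.8, where the reporter says his iconv-based attempts did not work.

```ruby
# Hypothetical helper: re-encode a path into the ANSI code page that the
# "A" Win32 file functions are assumed to expect (Windows-1251 on Russian
# Windows). String#encode raises if a character has no mapping.
def to_ansi_path(path, ansi_code_page = "Windows-1251")
  path.encode(ansi_code_page)
end

# A folder name that arrived as CP866 (console/OEM) bytes...
oem_name = "Администратор".encode("IBM866")
# ...re-encoded to the ANSI code page before it reaches the file API.
ansi_name = to_ansi_path(oem_name)
p ansi_name.bytes.map { |b| b.to_s(16) }
```

Ruby transcodes between the two single-byte code pages through Unicode internally, so the characters are preserved while every byte changes.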
