Bug #5297
Either File.expand_path or File.join is corrupting string encoding
| Field | Value |
|---|---|
| Status | Closed |
| Start date | 09/08/2011 |
| Priority | Normal |
| Due date | |
| Assignee | |
| % Done | 0% |
| Category | core |
| Target version | 2.0.0 |
| ruby -v | ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32] |
Description
Hello,
While working on some API improvements for Windows, I found the following issue:
https://gist.github.com/1202366
<pre>
V:\fóñè>ruby -v
ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]
V:\fóñè>chcp 1252
Active code page: 1252
V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252
V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*").first
=> "../fóñè"
irb(main):004:0> b.encoding
=> #<Encoding:Windows-1252>
irb(main):005:0> File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):007:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):008:0> d = File.join(a, "foo")
=> "V:/f\xF3\xF1\xE8/foo"
irb(main):009:0> d.encoding
=> #<Encoding:ASCII-8BIT> # <= FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
irb(main):010:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):011:0> e.encoding
=> #<Encoding:Windows-1252>
irb(main):012:0> File.open(d, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F.????
from (irb):12:in `initialize'
from (irb):12:in `open'
from (irb):12
from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):013:0> File.open(e, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F. * 20!
from (irb):13:in `initialize'
from (irb):13:in `open'
from (irb):13
from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):014:0>
</pre>
It is not clear why File.expand_path and string interpolation both kept the Windows-1252 encoding while File.join returned an ASCII-8BIT string.
Even worse, File.open failed for both the joined and the interpolated path.
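As a possible stopgap for the File.join part only, the result can be re-tagged with the encoding of the first component when it comes back as ASCII-8BIT. This is a minimal sketch, not something taken from this report: `join_keeping_encoding` is a hypothetical helper name, and it assumes the bytes File.join produced are still valid in the base path's encoding. It does nothing about the File.open failure, which in the transcript above also hit the correctly tagged interpolated string.

```ruby
# Hypothetical helper (not part of Ruby): join path components and make sure
# the result carries the same encoding tag as the first component.
def join_keeping_encoding(base, *parts)
  joined = File.join(base, *parts)
  # Only re-tag when File.join dropped the encoding to binary (ASCII-8BIT).
  if joined.encoding == Encoding::ASCII_8BIT && base.encoding != Encoding::ASCII_8BIT
    joined = joined.dup.force_encoding(base.encoding)
  end
  joined
end

# Example path built with escapes so the source file encoding does not matter.
base = "V:/f\u00F3\u00F1\u00E8".encode("Windows-1252")
puts join_keeping_encoding(base, "foo").encoding
```

`force_encoding` only changes the tag and leaves the bytes alone, so this is safe only if File.join did not alter the bytes themselves.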
I'm working on a replacement function for expand_path that relies on MultiByteToWideChar + GetFullPathNameW + WideCharToMultiByte and then uses rb_filesystem_str_new_cstr to return the string.
The funny thing is that the replacement works properly:
<pre>
C:\Users\Luis\Projects\oss\me\fenix>ripl -Ilib
>> require "fenix"
=> true
>> Dir.chdir "V:"
=> 0
>> Dir.pwd
=> "V:/fóñè"
>> c = Fenix::File.expand_path "."
=> "V:/fóñè"
>> c.encoding
=> #<Encoding:Windows-1252>
>> File.join(c, "foo").encoding
=> #<Encoding:Windows-1252>
>> d = "#{c}/foo"
=> "V:/fóñè/foo"
>> d.encoding
=> #<Encoding:Windows-1252>
>> File.open(d, "w") { |f| f.puts "hi" }
=> nil
</pre>
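The encoding round trip that the replacement function goes through (code page to wide characters and back) can be mirrored in pure Ruby to check that a path survives it unchanged. This is only an illustration: the real replacement is C code calling the Win32 API, no Win32 function is called here, and `round_trip_via_wide` is a hypothetical name.

```ruby
# Hypothetical pure-Ruby mirror of the encoding round trip; no Win32 calls.
def round_trip_via_wide(path)
  original = path.encoding
  # Plays the role of MultiByteToWideChar: code page bytes -> UTF-16LE.
  wide = path.encode(Encoding::UTF_16LE)
  # GetFullPathNameW would operate on the wide string at this point.
  # Plays the role of WideCharToMultiByte: UTF-16LE -> original code page.
  wide.encode(original)
end

path = "V:/f\u00F3\u00F1\u00E8".encode("Windows-1252")
result = round_trip_via_wide(path)
puts result.encoding
puts result == path
```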
History
Updated by luislavena (Luis Lavena) 7 months ago
- Status changed from Open to Closed
This has already been solved as part of another bug report.
Updated by patrickb (Patrick Bennett) 3 months ago
Which other issue is this associated with?
Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125
Updated by luislavena (Luis Lavena) 3 months ago
Patrick Bennett wrote:
> Which other issue is this associated with?
> Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125
Sorry, but with released patchlevel 125 I can no longer reproduce this:
<pre>
V:\fóñè>ruby -v
ruby 1.9.3p125 (2012-02-16) [i386-mingw32]
V:\fóñè>date /T
29/02/2012
V:\fóñè>time /T
02:46 p.m.
V:\fóñè>chcp
Active code page: 1252
V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252
V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../fóñè"
irb(main):004:0> b.encoding
=> #<Encoding:Windows-1252>
irb(main):005:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "V:/fóñè/foo"
irb(main):008:0> d.encoding
=> #<Encoding:Windows-1252>
irb(main):009:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):010:0> e.encoding
=> #<Encoding:Windows-1252>
irb(main):011:0> File.open(d, "w+") { |f| f.puts "hi" }
=> nil
irb(main):012:0> File.open(e, "w+") { |f| f.puts "hi" }
=> nil
irb(main):013:0> exit
</pre>
Updated by patrickb (Patrick Bennett) 3 months ago
With ruby 1.9.3p125 (2012-02-16) [i386-mingw32] File.join always converts to ASCII-8BIT for me no matter the encoding passed to it.
So, using your irb example up through the File.join call:
<pre>
irb(main):001:0> a = File.expand_path "."
=> "d:/test-streams"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../2dot4DSTree.reg"
irb(main):004:0> b.encoding
=> #<Encoding:IBM437>
irb(main):005:0> c = File.expand_path b
=> "d:/2dot4DSTree.reg"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "d:/test-streams/foo"
irb(main):008:0> d.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> File.join('foo','bar').encoding
=> #<Encoding:ASCII-8BIT>
</pre>
The result is the same regardless of my default external codepage - if I change it to 1252 as you have it then b's encoding returns as 1252 instead of 437 (my default) but File.join still returns as ASCII-8BIT. The fact that we're apparently using the same ruby version is a little troubling though.
Updated by luislavena (Luis Lavena) 3 months ago
Patrick Bennett wrote:
> With ruby 1.9.3p125 (2012-02-16) [i386-mingw32] File.join always converts to ASCII-8BIT for me no matter the encoding passed to it.
>
> => #<Encoding:ASCII-8BIT>
>
> The result is the same regardless of my default external codepage - if I change it to 1252 as you have it then b's encoding returns as 1252 instead of 437 (my default) but File.join still returns as ASCII-8BIT. The fact that we're apparently using the same ruby version is a little troubling though.
The problem is your system encoding.
For some reason the conversion from IBM437 to Windows-1252 in Dir.glob is not working.
Please open a separate issue.
The issue described here is about File.join messing with encoding and causing File.open to fail.
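For a separate report, it may help to capture the encodings involved in one place. The following is a minimal sketch under that assumption; `encoding_report` is a hypothetical helper, and it only uses the calls already shown in this thread plus `Encoding.find("filesystem")`.

```ruby
# Hypothetical diagnostic helper: collect the encodings reported by the
# calls discussed in this thread so they can be pasted into a bug report.
def encoding_report(dir = ".")
  expanded = File.expand_path(dir)
  {
    default_external: Encoding.default_external,
    filesystem: Encoding.find("filesystem"),
    expand_path: expanded.encoding,
    join_literals: File.join("foo", "bar").encoding,
    join_expanded: File.join(expanded, "foo").encoding,
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc}" }
```

Running it once per code page (after `chcp`) shows whether File.join's result follows the encoding of its arguments or always comes back as ASCII-8BIT.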