Bug #5297
Either File.expand_path or File.join is corrupting string encoding
Status: Closed
Description
Hello,
While working on some API improvements for Windows, I found the following issue:
https://gist.github.com/1202366
V:\fóñè>ruby -v
ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]
V:\fóñè>chcp 1252
Active code page: 1252
V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252
V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #
irb(main):003:0> b = Dir.glob("../*").first
=> "../fóñè"
irb(main):004:0> b.encoding
=> #
irb(main):005:0> File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):007:0> c.encoding
=> #
irb(main):008:0> d = File.join(a, "foo")
=> "V:/f\xF3\xF1\xE8/foo"
irb(main):009:0> d.encoding
=> # # <= FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
irb(main):010:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):011:0> e.encoding
=> #
irb(main):012:0> File.open(d, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F.????
        from (irb):12:in `initialize'
        from (irb):12:in `open'
        from (irb):12
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `'
irb(main):013:0> File.open(e, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F. * 20!
        from (irb):13:in `initialize'
        from (irb):13:in `open'
        from (irb):13
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `'
irb(main):014:0>
It is not clear why File.expand_path and string interpolation both worked while File.join broke the encoding.
Even worse, File.open failed for both paths, including the one built with interpolation.
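To make the comparison repeatable outside irb, a small helper can report the encoding each construction produces. This is a sketch, not part of the original report: `path_encodings` is a hypothetical name, and it only reports what File.join returns on the running Ruby; it does not change that behavior.

```ruby
# Hypothetical diagnostic (not part of Ruby): build "<base>/foo" the ways used
# in the session above and return the encoding of each result, keyed by label.
def path_encodings(base)
  candidates = {
    "base"          => base,
    "File.join"     => File.join(base, "foo"),
    "interpolation" => "#{base}/foo"
  }
  encodings = {}
  candidates.each { |label, str| encodings[label] = str.encoding }
  encodings
end

path_encodings(File.expand_path(".")).each do |label, enc|
  puts "#{label}: #{enc}"
end
```

Run from a directory with a non-ASCII name, a File.join entry that differs from the other two reproduces the report.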
I'm working on a replacement function for expand_path that relies on MultiByteToWideChar + GetFullPathNameW + WideCharToMultiByte and then uses rb_filesystem_str_new_cstr to return the string.
The funny thing is that the replacement works properly:
C:\Users\Luis\Projects\oss\me\fenix>ripl -Ilib
>> require "fenix"
=> true
>> Dir.chdir "V:"
=> 0
>> Dir.pwd
=> "V:/fóñè"
>> c = Fenix::File.expand_path "."
=> "V:/fóñè"
>> c.encoding
=> #
>> File.join(c, "foo").encoding
=> #
>> d = "#{c}/foo"
=> "V:/fóñè/foo"
>> d.encoding
=> #
>> File.open(d, "w") { |f| f.puts "hi" }
=> nil
Updated by luislavena (Luis Lavena) about 13 years ago
- Status changed from Open to Closed
This has already been solved as part of another bug report.
Updated by patrickb (Patrick Bennett) over 12 years ago
Which other issue is this associated with?
Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125
Updated by luislavena (Luis Lavena) over 12 years ago
Patrick Bennett wrote:
Which other issue is this associated with?
Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125
Sorry, but with the released patchlevel 125 I can no longer reproduce this:
V:\fóñè>ruby -v
ruby 1.9.3p125 (2012-02-16) [i386-mingw32]
V:\fóñè>date /T
29/02/2012
V:\fóñè>time /T
02:46 p.m.
V:\fóñè>chcp
Active code page: 1252
V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252
V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../fóñè"
irb(main):004:0> b.encoding
=> #
irb(main):005:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c.encoding
=> #
irb(main):007:0> d = File.join(a, "foo")
=> "V:/fóñè/foo"
irb(main):008:0> d.encoding
=> #
irb(main):009:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):010:0> e.encoding
=> #
irb(main):011:0> File.open(d, "w+") { |f| f.puts "hi" }
=> nil
irb(main):012:0> File.open(e, "w+") { |f| f.puts "hi" }
=> nil
irb(main):013:0> exit
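The File.open part of both sessions can also be scripted so it is easy to rerun on each patchlevel. A minimal sketch, assuming a writable temporary directory and a file system that accepts the name fóñè; `open_via_both_paths` is a hypothetical helper, and it records `:ok` or the exception class for each path instead of raising.

```ruby
require "tmpdir"

# Hypothetical helper: recreate the last step of the sessions as a script.
# It creates a directory with a non-ASCII name inside a temporary directory,
# builds the path to "foo" with File.join and with interpolation, and records
# whether File.open can write through each path (:ok or the exception class).
def open_via_both_paths(dir_name = "fóñè")
  Dir.mktmpdir do |tmp|
    # expand_path with a base dir avoids using File.join for the setup itself
    base = File.expand_path(dir_name, tmp)
    Dir.mkdir(base)
    paths = { "File.join" => File.join(base, "foo"), "interpolation" => "#{base}/foo" }
    outcome = {}
    paths.each do |label, path|
      begin
        File.open(path, "w+") { |f| f.puts "hi" }
        outcome[label] = :ok
      rescue SystemCallError => e
        outcome[label] = e.class
      end
    end
    outcome # Dir.mktmpdir returns the block's value and removes the directory
  end
end

p open_via_both_paths
```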
Updated by patrickb (Patrick Bennett) over 12 years ago
With ruby 1.9.3p125 (2012-02-16) [i386-mingw32], File.join always returns an ASCII-8BIT string for me, no matter the encoding of the arguments passed to it.
Using your irb example up through the File.join call:
irb(main):001:0> a = File.expand_path "."
=> "d:/test-streams"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../2dot4DSTree.reg"
irb(main):004:0> b.encoding
=> #<Encoding:IBM437>
irb(main):005:0> c = File.expand_path b
=> "d:/2dot4DSTree.reg"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "d:/test-streams/foo"
irb(main):008:0> d.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> File.join('foo','bar').encoding
=> #<Encoding:ASCII-8BIT>
The result is the same regardless of my default external code page: if I change it to 1252 as you have it, then b's encoding becomes Windows-1252 instead of IBM437 (my default), but File.join still returns ASCII-8BIT. The fact that we're apparently using the same Ruby version is a little troubling, though.
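Until File.join behaves on a given build, one possible workaround is to re-tag its result when it comes back as ASCII-8BIT. This is a hypothetical helper (`join_keeping_encoding` is not a Ruby API), and it assumes all components already share the first component's encoding; it only relabels bytes and does not transcode.

```ruby
# Hypothetical workaround (not a Ruby API): if File.join hands back an
# ASCII-8BIT string while the first component carried a real encoding,
# relabel the joined bytes with that encoding. Assumes every component uses
# the same encoding as the first one; nothing is transcoded.
def join_keeping_encoding(*parts)
  joined = File.join(*parts)
  first = parts.first
  if first.is_a?(String) &&
     joined.encoding == Encoding::ASCII_8BIT &&
     first.encoding != Encoding::ASCII_8BIT
    joined.force_encoding(first.encoding)
  end
  joined
end
```

If the components come from different sources, such as Dir.glob results in IBM437 and expand_path results in Windows-1252, they would need to be transcoded with String#encode before joining; relabeling alone would produce the wrong characters.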
Updated by luislavena (Luis Lavena) over 12 years ago
Patrick Bennett wrote:
With ruby 1.9.3p125 (2012-02-16) [i386-mingw32], File.join always returns an ASCII-8BIT string for me, no matter the encoding of the arguments passed to it.
The result is the same regardless of my default external code page: if I change it to 1252 as you have it, then b's encoding becomes Windows-1252 instead of IBM437 (my default), but File.join still returns ASCII-8BIT. The fact that we're apparently using the same Ruby version is a little troubling, though.
The problem is your system encoding.
For some reason, the conversion from IBM437 to Windows-1252 in Dir.glob is not working.
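For the mismatch described here, one option on the caller's side is to transcode glob results into the encoding the rest of the program uses. A sketch, assuming the glob results really are IBM437 bytes correctly tagged as such; `to_external` is a hypothetical helper name.

```ruby
# Hypothetical helper: transcode a path string (for example a Dir.glob result)
# into the target encoding, which defaults to Encoding.default_external.
def to_external(path, target = Encoding.default_external)
  return path if path.encoding == target
  path.encode(target)
end

# 0x82 is "é" in IBM437 (code page 437); in Windows-1252 "é" is 0xE9.
name = "caf\x82".b.force_encoding(Encoding::IBM437)
fixed = to_external(name, Encoding::Windows_1252)
p fixed.encoding # => #<Encoding:Windows-1252>
```

String#encode raises Encoding::UndefinedConversionError for characters with no Windows-1252 equivalent (IBM437 box-drawing characters, for instance), so names like that would still need separate handling.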
Please open a separate issue.
The issue described here is about File.join messing with encoding and causing File.open to fail.