Bug #5297

Either File.expand_path or File.join is corrupting string encoding

Added by luislavena (Luis Lavena) 9 months ago. Updated 3 months ago.

[ruby-core:39355]
Status: Closed
Start date: 09/08/2011
Priority: Normal
Due date:
Assignee: usa (Usaku NAKAMURA)
% Done: 0%
Category: core
Target version: 2.0.0
ruby -v: ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]

Description

Hello,

While working on some API improvements for Windows, I found the following issue:

https://gist.github.com/1202366

<pre>
V:\fóñè>ruby -v
ruby 1.9.4dev (2011-09-07 trunk 33212) [i386-mingw32]

V:\fóñè>chcp 1252
Active code page: 1252

V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252

V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*").first
=> "../fóñè"
irb(main):004:0> b.encoding
=> #<Encoding:Windows-1252>
irb(main):005:0> File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):007:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):008:0> d = File.join(a, "foo")
=> "V:/f\xF3\xF1\xE8/foo"
irb(main):009:0> d.encoding
=> #<Encoding:ASCII-8BIT> # <= FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
irb(main):010:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):011:0> e.encoding
=> #<Encoding:Windows-1252>
irb(main):012:0> File.open(d, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F.????
        from (irb):12:in `initialize'
        from (irb):12:in `open'
        from (irb):12
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):013:0> File.open(e, "w+") { |f| f.puts "hi" }
Errno::ENOENT: No such file or directory - V:/fóñè/foo # <= W.T.F. * 20!
        from (irb):13:in `initialize'
        from (irb):13:in `open'
        from (irb):13
        from C:/Users/Luis/Tools/Ruby/ruby-head-i386-mingw32/bin/irb:12:in `<main>'
irb(main):014:0>
</pre>

It is not clear why File.expand_path and string interpolation kept the encoding while File.join changed it to ASCII-8BIT. Even worse, File.open failed for both the joined and the interpolated path.

I'm working on a replacement function for expand_path that relies on MultiByteToWideChar + GetFullPathNameW + WideCharToMultiByte and then uses rb_filesystem_str_new_cstr to return the string.

The funny thing is that the replacement works properly:

<pre>
C:\Users\Luis\Projects\oss\me\fenix>ripl -Ilib
>> require "fenix"
=> true
>> Dir.chdir "V:"
=> 0
>> Dir.pwd
=> "V:/fóñè"
>> c = Fenix::File.expand_path "."
=> "V:/fóñè"
>> c.encoding
=> #<Encoding:Windows-1252>
>> File.join(c, "foo").encoding
=> #<Encoding:Windows-1252>
>> d = "#{c}/foo"
=> "V:/fóñè/foo"
>> d.encoding
=> #<Encoding:Windows-1252>
>> File.open(d, "w") { |f| f.puts "hi" }
=> nil
</pre>
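
For anyone who hits the ASCII-8BIT result from File.join shown above, here is a minimal workaround sketch. It assumes the joined bytes are still correct and only the encoding label was lost. `join_keeping_encoding` is a hypothetical helper name made up for illustration; it is not part of Ruby or fenix, and the sample path bytes are an assumption mirroring the session above.

```ruby
# Hypothetical helper (not part of Ruby or fenix): join path segments and,
# if File.join handed back a binary (ASCII-8BIT) string, restore the
# encoding label of the first segment.
def join_keeping_encoding(base, *parts)
  joined = File.join(base, *parts)
  if joined.encoding == Encoding::ASCII_8BIT && base.encoding != Encoding::ASCII_8BIT
    # force_encoding only relabels the bytes; it does not transcode them.
    joined = joined.dup.force_encoding(base.encoding)
  end
  joined
end

# Assumed sample data: bytes F3 F1 E8 are "óñè" in Windows-1252.
base = "V:/f\xF3\xF1\xE8".b.force_encoding("Windows-1252")
path = join_keeping_encoding(base, "foo")

puts path.encoding
puts path.valid_encoding?
```

On a Ruby where File.join already preserves the encoding, the helper simply returns the joined string untouched, so it should be safe to leave in place after upgrading.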

History

Updated by luislavena (Luis Lavena) 7 months ago

  • Status changed from Open to Closed
This has already been solved as part of another bug report.

Updated by patrickb (Patrick Bennett) 3 months ago

Which other issue is this associated with? Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125

Updated by luislavena (Luis Lavena) 3 months ago

Patrick Bennett wrote:
> Which other issue is this associated with?
> Is this going to be patched back to 1.9.3? It's still present in 1.9.3p125

Sorry, but with released patchlevel 125 I can no longer reproduce this:

<pre>
V:\fóñè>ruby -v
ruby 1.9.3p125 (2012-02-16) [i386-mingw32]

V:\fóñè>date /T
29/02/2012

V:\fóñè>time /T
02:46 p.m.

V:\fóñè>chcp
Active code page: 1252

V:\fóñè>ruby -e "puts Encoding.default_external"
Windows-1252

V:\fóñè>irb
irb(main):001:0> a = File.expand_path "."
=> "V:/fóñè"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../fóñè"
irb(main):004:0> b.encoding
=> #<Encoding:Windows-1252>
irb(main):005:0> c = File.expand_path b
=> "V:/fóñè"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "V:/fóñè/foo"
irb(main):008:0> d.encoding
=> #<Encoding:Windows-1252>
irb(main):009:0> e = "#{a}/foo"
=> "V:/fóñè/foo"
irb(main):010:0> e.encoding
=> #<Encoding:Windows-1252>
irb(main):011:0> File.open(d, "w+") { |f| f.puts "hi" }
=> nil
irb(main):012:0> File.open(e, "w+") { |f| f.puts "hi" }
=> nil
irb(main):013:0> exit
</pre>

Updated by patrickb (Patrick Bennett) 3 months ago

With ruby 1.9.3p125 (2012-02-16) [i386-mingw32], File.join always converts to ASCII-8BIT for me, no matter the encoding passed to it. So, using your irb example up through the File.join:

<pre>
irb(main):001:0> a = File.expand_path "."
=> "d:/test-streams"
irb(main):002:0> a.encoding
=> #<Encoding:Windows-1252>
irb(main):003:0> b = Dir.glob("../*")[1]
=> "../2dot4DSTree.reg"
irb(main):004:0> b.encoding
=> #<Encoding:IBM437>
irb(main):005:0> c = File.expand_path b
=> "d:/2dot4DSTree.reg"
irb(main):006:0> c.encoding
=> #<Encoding:Windows-1252>
irb(main):007:0> d = File.join(a, "foo")
=> "d:/test-streams/foo"
irb(main):008:0> d.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):009:0> File.join('foo','bar').encoding
=> #<Encoding:ASCII-8BIT>
</pre>

The result is the same regardless of my default external code page. If I change it to 1252 as you have it, then b's encoding comes back as Windows-1252 instead of IBM437 (my default), but File.join still returns ASCII-8BIT. The fact that we're apparently using the same ruby version is a little troubling, though.

Updated by luislavena (Luis Lavena) 3 months ago

Patrick Bennett wrote:
> With ruby 1.9.3p125 (2012-02-16) [i386-mingw32] File.join always converts to ASCII-8BIT for me no matter the encoding passed to it.
>
> => #<Encoding:ASCII-8BIT>
>
> The result is the same regardless of my default external codepage - if I change it to 1252 as you have it then b's encoding returns as 1252 instead of 437 (my default) but File.join still returns as ascii-8bit. The fact that we're apparently using the same ruby version is a little troubling though.

The problem is your system encoding. For some reason, the conversion from IBM437 to Windows-1252 in Dir.glob is not working. Please open a separate issue.

The issue described here is about File.join messing with the encoding and causing File.open to fail.
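
As background for the IBM437 versus Windows-1252 mismatch discussed above, the following sketch contrasts transcoding a string with merely relabeling it. It is only an illustration of the general String API, not a diagnosis of what Dir.glob does internally; the sample bytes are an assumption chosen to match the "fóñè" directory name used earlier in this report.

```ruby
# Assumed sample data: "f\xA2\xA4\x8A" is "fóñè" in IBM437 (the OEM code page).
oem = "f\xA2\xA4\x8A".b.force_encoding("IBM437")

# encode transcodes: the bytes change so that the same characters are
# represented in Windows-1252.
ansi = oem.encode("Windows-1252")

# force_encoding keeps the bytes and only changes the label, so the
# IBM437 bytes are now interpreted as different Windows-1252 characters.
relabeled = oem.dup.force_encoding("Windows-1252")

puts ansi.bytes.map { |b| format("%02X", b) }.join(" ")
puts relabeled.bytes.map { |b| format("%02X", b) }.join(" ")
puts ansi.encode("UTF-8") == relabeled.encode("UTF-8")
```

A path string that carries one code page's bytes under another code page's label would name a different file, which is one way a lookup could fail even though the printed encoding looks right.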
