Bug #5486

rb_stat() doesn’t respect input encoding

Added by Nikolai Weibull over 2 years ago. Updated about 2 years ago.

[ruby-core:40412]
Status:Closed
Priority:Low
Assignee:Nobuyoshi Nakada
Category:M17N
Target version:-
ruby -v:- Backport:

Description

rb_stat() overrides the input string’s encoding and applies one of various encodings through rb_str_encode_ospath(). This may be convenient for certain kinds of user input or input from a source file in a different encoding, but it isn’t good for other kinds of user input or input from other functions, such as Dir.entries.

If Ruby wants us to be explicit about encodings, then Ruby shouldn’t change it behind our backs.

I suspect that this is an issue that may appear in various other functions as well.

History

#1 Updated by Usaku NAKAMURA over 2 years ago

Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

#2 Updated by Nikolai Weibull over 2 years ago

On Fri, Oct 28, 2011 at 07:28, Usaku NAKAMURA redmine@ruby-lang.org wrote:

Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

That’s hard to do, but name a file in an encoding other than
'filesystem' on an NTFS filesystem. What I did was accidentally
create a file whose name was encoded in UTF-16. Then, do
Dir['dir'].entries.each{ |e| printf "%p: %s\n", e, File.file? e },
where 'dir' is the directory containing this file. e.file? will
return false for this file, even though it’s a file. The problem is,
as explained, in rb_stat(), as it re-encodes its argument in the
'filesystem' encoding.

#3 Updated by Nikolai Weibull over 2 years ago

On Fri, Oct 28, 2011 at 08:14, Nikolai Weibull now@bitwi.se wrote:

On Fri, Oct 28, 2011 at 07:28, Usaku NAKAMURA redmine@ruby-lang.org wrote:

Sorry, I can't understand your point.
If you think there is a bug, would you show us the bug by code?

That’s hard to do, but name a file in an encoding other than
'filesystem' on an NTFS filesystem.  What I did was accidentally
create a file whose name was encoded in UTF-16.  Then, do
Dir['dir'].entries.each{ |e| printf "%p: %s\n", e, File.file? e },
where 'dir' is the directory containing this file.  e.file? will
return false for this file, even though it’s a file.  The problem is,
as explained, in rb_stat(), as it re-encodes its argument in the
'filesystem' encoding.

Actually, it’s probably easier than that. It can be done on a HFS+
filesystem (and probably any other, as well) just as easily

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D
% ruby --version
ruby 2.0.0dev (2011-10-26 trunk 33526) [x86_64-darwin10.8.0]
% ruby a.rb
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, false

I guess the problem is that Ruby assumes that it can apply an encoding
to something that it gets from the filesystem when it would probably
be better to not do so. It should probably be BINARY or ASCII-8BIT
instead of UTF-8.
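
The idea of leaving directory entries as raw bytes can be tried explicitly. The following is only a sketch: it assumes a Ruby release in which Dir.entries accepts an encoding: keyword (not confirmed for the 2011 build discussed here), and raw_entries is a hypothetical helper name, not part of Ruby.

```ruby
require "tmpdir"

# Hypothetical helper: list a directory without assigning a text encoding
# to the names, so they come back tagged as raw bytes (ASCII-8BIT / BINARY).
# Assumes Dir.entries supports the encoding: keyword.
def raw_entries(dir)
  Dir.entries(dir, encoding: Encoding::BINARY)
end

# Demonstrate on a throwaway directory so nothing on disk is touched.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "sample.txt"), "")
  raw_entries(dir).each do |name|
    printf "%p, %p\n", name, name.encoding
  end
end
```

Names obtained this way carry no claim about their text encoding, so any transcoding has to be requested by the caller.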

(It turns out that this example gave the same results in 1.8.7 (minus
the e.encoding), so perhaps I’m doing something else wrong.)

Trying to do

p File.file?('t/å'.encode('UTF-16LE'))

results in

in `file?': path name must be ASCII-compatible (UTF-16LE): "t/\u00E5"
(Encoding::CompatibilityError)

I give up.
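
The CompatibilityError above arises because UTF-16LE is not an ASCII-compatible encoding. A minimal sketch of one way to avoid it is to transcode the path before passing it to File.file?. The helper stat_safe_path is hypothetical, and the UTF-8 default target is an assumption; the appropriate target depends on the filesystem.

```ruby
# Hypothetical helper, not part of Ruby: return a path that is safe to
# hand to File predicates by transcoding it when its encoding is not
# ASCII-compatible. The UTF-8 default target is an assumption.
def stat_safe_path(path, target = Encoding::UTF_8)
  return path if path.encoding.ascii_compatible?
  path.encode(target)
end

wide = "t/\u00E5".encode("UTF-16LE")
p wide.encoding.ascii_compatible?      # whether File.* would accept it as-is
p stat_safe_path(wide).encoding        # encoding after the helper runs
p File.file?(stat_safe_path(wide))     # no CompatibilityError once transcoded
```

This only sidesteps the exception; whether the transcoded bytes match the name stored on disk is a separate question.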

#4 Updated by Nobuyoshi Nakada over 2 years ago

  • ruby -v changed from ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32] to -

Hi,

(11/10/28 15:35), Nikolai Weibull wrote:

Actually, it’s probably easier than that. It can be done on a HFS+
filesystem (and probably any other, as well) just as easily

It's not true.

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D

`e' doesn't have directory prefix, "t/". It can't stat.

$ ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e, e.encoding, File.file?(e)}'
ruby 2.0.0dev (2011-10-25 trunk 33523) [universal.x86_64-darwin11.2.0]
".", #<Encoding:UTF-8>, false
"..", #<Encoding:UTF-8>, false
"å", #<Encoding:UTF-8>, true
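
Instead of changing the working directory with -C, the missing prefix can also be supplied by joining each entry with its directory. This is a sketch only; file_flags is a hypothetical helper name, and the temporary directory stands in for "t".

```ruby
require "tmpdir"

# Hypothetical helper: pair each entry of dir with File.file? evaluated on
# the joined path, since a bare entry name is resolved against the current
# directory and usually cannot be stat'ed.
def file_flags(dir)
  Dir.entries(dir).sort.map do |name|
    [name, File.file?(File.join(dir, name))]
  end
end

# Demonstrate on a throwaway directory containing one regular file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.txt"), "")
  file_flags(dir).each { |name, flag| printf "%p, %p\n", name, flag }
end
```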

--
Nobu Nakada

#5 Updated by Nikolai Weibull over 2 years ago

On Fri, Oct 28, 2011 at 09:20, Nobuyoshi Nakada nobu@ruby-lang.org wrote:

(11/10/28 15:35), Nikolai Weibull wrote:

Actually, it’s probably easier than that.  It can be done on a HFS+
filesystem (and probably any other, as well) just as easily

It's not true.

% echo $LC_CTYPE
UTF-8
% mkdir t
% touch t/å
% cat > a.rb
# -*- coding: utf-8 -*-
Dir.new('t').entries.each{ |e| printf "%p, %p, %s\n", e, e.encoding,
File.file?(e) }
^D

`e' doesn't have directory prefix, "t/".  It can't stat.

Ouch, of course. How stupid of me. That explains why it didn’t work
under 1.8.7 either.

The point still remains valid on Windows, however:

% mkdir t
% touch t/→
% ruby -v -C t -e 'Dir.foreach(".") {|e| printf "%p, %p, %p\n", e,
e.encoding, File.file?(e)}'
ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]
".", #<Encoding:Windows-1252>, false
"..", #<Encoding:Windows-1252>, false
"?", #<Encoding:Windows-1252>, false

Hm, I guess here the result of Dir.foreach is broken.

Here’s another case:

% ruby -v -rfind -e 'Find.find("t").each{ |e| printf "%p, %s, %p,
%p\n", e, e.dump, e.encoding, File.file?(e)}'
"t", "t", #<Encoding:UTF-8>, false
"t/?", "t/?", #<Encoding:ASCII-8BIT>, false

Equally broken, I guess.

#6 Updated by Koichi Sasada about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Nobuyoshi Nakada

#7 Updated by Nobuyoshi Nakada about 2 years ago

  • Category changed from core to M17N
  • Status changed from Assigned to Feedback
  • Priority changed from High to Low

Does this issue still occur?

#8 Updated by Nikolai Weibull about 2 years ago

On Sun, Mar 11, 2012 at 22:41, Nobuyoshi Nakada nobu@ruby-lang.org wrote:

Issue #5486 has been updated by Nobuyoshi Nakada.

Category changed from core to M17N
Status changed from Assigned to Feedback
Priority changed from High to Low

Does this issue still occur?

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

#9 Updated by Nikolai Weibull about 2 years ago

2012/3/15 U.Nakamura usa@garbagecollect.jp:

Hello,

In message " Re: [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
   on Mar.13,2012 18:03:04, now@bitwi.se wrote:

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

It's not trunk...
It seems too old.

How can you say that? I just tested it and got the same results. I
showed you my version string above, that’s trunk.

#10 Updated by Anonymous about 2 years ago

On 3/14/12 11:24 PM, Nikolai Weibull wrote:

2012/3/15 U.Nakamura usa@garbagecollect.jp:

Hello,

In message " Re: [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
on Mar.13,2012 18:03:04, now@bitwi.se wrote:

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

It's not trunk...
It seems too old.

How can you say that? I just tested it and got the same results. I
showed you my version string above, that’s trunk.

hi Nikolai,

try compiling an updated version of trunk from this repository:
svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

your version indicates it's from last year. here's a version string from
a recent compilation on my system:
% ./ruby --version
ruby 1.9.3p163 (2012-03-14 revision 35012) [x86_64-darwin11.3.0]

does that help at all?

#11 Updated by Yui NARUSE about 2 years ago

2012/3/15 Trevor Wennblom trevor@well.com:

try compiling an updated version of trunk from this repository:
 svn co http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_9_3

It is ruby_1_9_3 branch, not trunk.
For trunk,
svn co http://svn.ruby-lang.org/repos/ruby/trunk

--
NARUSE, Yui  naruse@airemix.jp

#12 Updated by Yui NARUSE about 2 years ago

  • Status changed from Feedback to Closed

#13 Updated by Nikolai Weibull about 2 years ago

On Thu, Mar 15, 2012 at 05:24, Nikolai Weibull now@bitwi.se wrote:

2012/3/15 U.Nakamura usa@garbagecollect.jp:

Hello,

In message " Re: [ruby-trunk - Bug #5486][Feedback] rb_stat() doesn’t respect input encoding"
   on Mar.13,2012 18:03:04, now@bitwi.se wrote:

Yes, it still occurs against trunk:

ruby 1.9.3dev (2011-09-13 revision 33263) [i386-mingw32]

It's not trunk...
It seems too old.

How can you say that?  I just tested it and got the same results.  I
showed you my version string above, that’s trunk.

Argh, sorry. I ran the test with the incorrect PATH, after all. Yes,
this issue has been resolved. You can close it.
