Bug #16683

closed

Regression in ruby 2.7 File.realpath return ASCII-8BIT for string literal with invalid characters instead of UTF-8

Added by lamont (Lamont Granquist) about 4 years ago. Updated about 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]
[ruby-core:97427]

Description

Real simple repro case:

[1] pry(main)> fp = File.open("/tmp/chef-test-\xFDmlaut", "w+")
Errno::EILSEQ: Illegal byte sequence @ rb_sysopen - /tmp/chef-test-�mlaut
from (pry):1:in `initialize'

This used to work in ruby < 2.6. Setting Encoding.default_external and Encoding.default_internal to ASCII-8BIT does not help.

It seems to fail on every Linux-based OS we test on, and on macOS as well.

We need to support this use case: ruby has to run under UTF-8 but may need to manage files whose names were written in other encodings. The ruby application does not necessarily control or own these files or their names and cannot impose UTF-8 naming on the filesystem. The requirement is that the ruby application can read any arbitrary file on the filesystem, written by any other application on the system, with whatever encoding was set locally for that other application (the ruby process has no a priori information about which application wrote the file, it just has the file). Even if the filename is just binary garbage, the ruby application needs to be able to open it and analyze it.

Since this is getting thrown deep from rb_sysopen I'm not sure of any way to work around it from within ruby by finding an alternative API.
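
For illustration, here is a minimal sketch of what the literal in the repro looks like on the Ruby side, independent of any filesystem. It assumes a UTF-8 source file (the default); the variable name is just for the example.

```ruby
# The literal from the repro: tagged UTF-8, but \xFD followed by "m"
# is not a valid UTF-8 sequence.
name = "chef-test-\xFDmlaut"

puts name.encoding == Encoding::UTF_8  # true
puts name.valid_encoding?              # false

# For display or logging, replace the invalid byte.
puts name.scrub("?")                   # chef-test-?mlaut

# For I/O, the raw bytes are what matter; .b returns a binary-tagged copy.
puts name.b.bytes[10] == 0xFD          # true
```

The string stays usable as a path on filesystems that accept arbitrary bytes; only operations that interpret it as text need the scrubbed copy.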

Updated by jeremyevans0 (Jeremy Evans) about 4 years ago

I can't recreate this failure on OpenBSD with 2.7.0 or the master branch, with or without a UTF-8 locale:

$ ruby28 -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.8.0dev (2020-03-09) [x86_64-openbsd6.6]
#<Encoding:US-ASCII>

$ ruby28 -Eutf-8 -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.8.0dev (2020-03-09) [x86_64-openbsd6.6]
#<Encoding:UTF-8>

$ ruby27 -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-openbsd]
#<Encoding:US-ASCII>

$ ruby27 -Eutf-8 -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-openbsd]
#<Encoding:UTF-8>

Also no errors when testing Ruby 2.7.0 on Windows 10:

C:\>C:\Ruby27-x64\bin\ruby -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x64-mingw32]
#<Encoding:IBM437>

C:\>C:\Ruby27-x64\bin\ruby -Eutf-8 -ve 'p Encoding.default_external; File.open("chef-test-\xFDmlaut", "w+")'
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x64-mingw32]
#<Encoding:UTF-8>

I would guess this issue is caused by the open(2) system call returning -1 and setting errno to EILSEQ. I don't think Ruby has any control over that. Maybe the /tmp filesystem in use doesn't support that filename (seems odd)?
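
One way to check that guess on a given machine is to rescue the error and look at its class. The sketch below is only an illustration: `try_create` is a hypothetical helper, and it writes into a fresh directory from Dir.mktmpdir rather than into /tmp directly. If the filesystem accepts the name it returns :ok; if the system call fails, it returns the SystemCallError subclass (for example Errno::EILSEQ) that the OS reported.

```ruby
require "tmpdir"

# Hypothetical helper: try to create a file with the given name and report
# either :ok or the errno class raised by the underlying system call.
def try_create(dir, name)
  path = "#{dir}/#{name}"
  File.open(path, "w+") { |f| f.write("x") }
  :ok
rescue SystemCallError => e
  e.class
ensure
  # Clean up the test file if the filesystem let us create it.
  File.delete(path) if path && File.exist?(path)
end

Dir.mktmpdir do |dir|
  p try_create(dir, "chef-test-\xFDmlaut")
end
```

The result depends on the filesystem, so no expected output is shown here.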

Updated by lamont (Lamont Granquist) about 4 years ago

  • Subject changed from Regression in ruby 2.7 opening filenames that have non-UTF8 characters in them to Regression in ruby 2.7 File.realpath return ASCII-8BIT for string literal with invalid characters instead of UTF-8
  • ruby -v changed from ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-darwin18] to ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-linux]

on ruby 2.6.5 this is the behavior of File.realpath when fed a string with invalid bytes:

[2] pry(main)> File.realpath("/tmp/chef-test-\xFDmlaut").encoding
=> #<Encoding:UTF-8>

on ruby 2.7.0 this is the behavior:

[1] pry(main)> File.realpath("/tmp/chef-test-\xFDmlaut").encoding
=> #<Encoding:ASCII-8BIT>

This looks like it may be intentional?

Updated by lamont (Lamont Granquist) about 4 years ago

Hmm, I was trying to update the body of the bug report with that.

Turns out that File.open was a large wild goose chase and we appear to have other bugs that I wound up chasing down. The behavior on mac and linux also does appear to be different.

On Linux there seems to have been the above change in the behavior of File.realpath() which is the root of the real new behavior that we're observing.

Updated by jeremyevans0 (Jeremy Evans) about 4 years ago

  • Status changed from Open to Closed

lamont (Lamont Granquist) wrote in #note-2:

on ruby 2.6.5 this is the behavior of File.realpath when fed a string with invalid bytes:

[2] pry(main)> File.realpath("/tmp/chef-test-\xFDmlaut").encoding
=> #<Encoding:UTF-8>

on ruby 2.7.0 this is the behavior:

[1] pry(main)> File.realpath("/tmp/chef-test-\xFDmlaut").encoding
=> #<Encoding:ASCII-8BIT>

This looks like it may be intentional?

This is intentional. The native realpath(3) function is now used in most cases (#15797), and there is no guarantee what it returns is a valid UTF-8 string, so it is left as binary.
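
If calling code relied on the 2.6 behavior, one option is to re-tag the result as UTF-8 only when its bytes are actually valid UTF-8. This is a minimal sketch under that assumption: `utf8_if_valid` is a hypothetical helper, not a Ruby API, and the sample strings stand in for File.realpath results so the snippet does not need the files to exist.

```ruby
# Hypothetical helper: return a UTF-8 tagged copy when the bytes are valid
# UTF-8, otherwise return the path unchanged (still binary).
def utf8_if_valid(path)
  candidate = path.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : path
end

# Stand-ins for what File.realpath returns on 2.7 (binary-tagged strings).
plain   = "/tmp/plain".b
invalid = "/tmp/chef-test-\xFDmlaut".b

puts utf8_if_valid(plain).encoding == Encoding::UTF_8     # true
puts utf8_if_valid(invalid).encoding == Encoding::BINARY  # true
```

force_encoding only changes the tag, never the bytes, so the returned string still names the same file either way.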

As the reported issue doesn't appear to be a bug in Ruby, I'm going to close this now. If you come across an issue that you think is a bug in Ruby, please file a separate issue report. Thanks!

Updated by lamont (Lamont Granquist) about 4 years ago

Yeah, it looks like we've just got some very sloppy code as I dig into it more.
