Bug #15993

'require' doesn't work if there are Cyrillic chars in the path to Ruby dir

Added by inversion (Yura Babak) over 5 years ago. Updated over 3 years ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
3.0.1p64 (2021-04-05 revision 0fb782ee38) [x64-mingw32]
[ruby-core:93655]

Description

I’m trying to build a cross-platform portable application with Ruby onboard and there is a problem on Windows.
A user usually installs it to the Roaming folder, which sits inside the user folder, and that folder's name is often non-Latin or contains spaces.
When there is a Cyrillic character (or perhaps any non-Latin character) in the path, require of any gem doesn’t work:

D:\users\киї\Ruby\2.6\bin>ruby -v
ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32]

D:\users\киї\Ruby\2.6\bin>ruby -e "require 'logger'"
Traceback (most recent call last):
        1: from <internal:gem_prelude>:2:in `<internal:gem_prelude>'
<internal:gem_prelude>:2:in `require': No such file or directory -- D:/users/РєРёС—/Ruby/2.6/lib/ruby/2.6.0/rubygems.rb (LoadError)

D:\users\киї\Ruby\2.6\bin>ruby --disable=rubyopt -e "require 'logger'"
Traceback (most recent call last):
        1: from <internal:gem_prelude>:2:in `<internal:gem_prelude>'
<internal:gem_prelude>:2:in `require': No such file or directory -- D:/users/РєРёС—/Ruby/2.6/lib/ruby/2.6.0/rubygems.rb (LoadError)

D:\users\киї\Ruby\2.6\bin>gem list
Traceback (most recent call last):
        1: from <internal:gem_prelude>:2:in `<internal:gem_prelude>'
<internal:gem_prelude>:2:in `require': No such file or directory -- D:/users/РєРёС—/Ruby/2.6/lib/ruby/2.6.0/rubygems.rb (LoadError)

The error output shows the path's UTF-8 bytes displayed as Windows-1251:

киї (utf-8) == РєРёС— (win1251)
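
The transformation above can be reproduced in a few lines. This is a minimal sketch, assuming the garbled name is simply the UTF-8 bytes of the real directory name decoded as Windows-1251; the variable names are illustrative and not taken from the report.

```ruby
# Assumption: the console (or some layer in Ruby) decodes UTF-8 bytes as Windows-1251.
name = "\u043A\u0438\u0457"               # the directory name, written with escapes
utf8_bytes = name.encode("UTF-8").bytes   # the bytes actually stored in the string

# Relabel the same bytes as Windows-1251, then transcode to UTF-8 for display.
garbled = name.dup.force_encoding("Windows-1251").encode("UTF-8")

puts utf8_bytes.map { |b| format("%02X", b) }.join(" ")
puts garbled

# Reversing the two steps recovers the original name.
recovered = garbled.encode("Windows-1251").force_encoding("UTF-8")
puts recovered == name
```

If the assumption holds, the second printed line matches the garbled directory name in the error messages, which would mean the path bytes themselves were UTF-8 while something treated them as Windows-1251.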

I have an old Ruby installation that works fine:

D:\users\киї\Ruby\2.0\bin>ruby -e "require 'logger'"

D:\users\киї\Ruby\2.0\bin>ruby -v
ruby 2.0.0p451 (2014-02-24) [i386-mingw32]

The same holds for ruby 2.0.0p643 (2015-02-25) [i386-mingw32].

I also checked that require fails in the same way on
ruby 2.1.9p490 (2016-03-30 revision 54437) [i386-mingw32]


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #15655: Unable to handle Russian dirname on Windows (Closed)

Updated by inversion (Yura Babak) over 5 years ago

Looks like there is an ugly workaround.

  1. Run chcp 1251 in the current console session.
  2. Run Ruby with the option --disable=gems so that it does not fail at startup.
  3. Add the following code at the very beginning of the script:
if $:[0].encoding.name == 'Windows-1251'
	$:.each {|path| path.encode! 'UTF-8' }
	$:.push '.'    # somehow it helps, looks like a modification of array is needed
	require 'rubygems'
end

This helped me to overcome the problem and run my script from a folder with Cyrillic and spaces in the path.

But it definitely should be fixed.

Updated by duerst (Martin Dürst) over 5 years ago

@ko1 (Koichi Sasada): I can check whether this bug is reproducible. But I'm not too familiar with how Ruby deals with the Windows file system. So I'm not confident I will be able to find and fix this bug.

Updated by MSP-Greg (Greg L) over 5 years ago

On a US Windows system, I used a base Ruby folder of C:\Greg\Ruby киї (using a space and Cyrillic characters), and I could reproduce the issue.

Without any console chcp command, I did the following, which also solved the issue:

# start ruby with --disable=gems
$:.map! { |path| path.dup.force_encoding 'UTF-8' }
require 'rubygems'

require 'openssl'
puts OpenSSL::VERSION

I don't think spaces in Windows paths are an issue anymore, but I haven't rigorously checked...
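
The two workarounds so far use different calls: encode! in the first and force_encoding in this one. A minimal sketch of the difference, assuming (as the reports suggest but do not prove) that the load path entries were tagged Windows-1251 in one setup and held UTF-8 bytes tagged as binary in the other. The variable names are illustrative.

```ruby
text = "\u043A\u0438\u0457"   # the Cyrillic directory name as a UTF-8 string

# Case 1: the bytes really are Windows-1251 and are tagged as such.
# encode transcodes: the bytes change, the characters stay the same.
cp1251     = text.encode("Windows-1251")
transcoded = cp1251.encode("UTF-8")

# Case 2: the bytes are already UTF-8 but tagged ASCII-8BIT (binary).
# force_encoding only changes the tag; the bytes are untouched.
binary    = text.b
relabeled = binary.dup.force_encoding("UTF-8")

puts cp1251.bytesize
puts binary.bytesize
puts transcoded == relabeled

# Transcoding binary data fails, because binary has no character mapping.
conversion_error =
  begin
    binary.encode("UTF-8")
    nil
  rescue EncodingError => e
    e.class
  end
puts conversion_error
```

Which call is appropriate therefore depends on what the bytes actually are, not only on the encoding tag reported by the string.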

Updated by MSP-Greg (Greg L) over 5 years ago

While taking a break, I looked at this again. Below are the encodings of various items:

$LOAD_PATH
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/site_ruby/2.7.0
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/site_ruby/2.7.0/x64-msvcrt
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/site_ruby
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/vendor_ruby/2.7.0
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/vendor_ruby/2.7.0/x64-msvcrt
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/vendor_ruby
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/2.7.0
ASCII-8BIT      C:/Greg/Ruby киї/lib/ruby/2.7.0/x64-mingw32

IBM437          __FILE__
IBM437          __dir__
UTF-8           Dir.pwd

The encoding wasn't affected by using -E in RUBYOPT.

Tested using today's trunk.
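
A listing like the one above could be produced with a short script. This is only a sketch: the helper name encoding_lines is hypothetical, and the exact script used for the listing is not shown in this comment.

```ruby
# Hypothetical helper: one "ENCODING  value" line per string.
def encoding_lines(strings)
  strings.map do |s|
    str = s.to_s   # $LOAD_PATH may hold non-String objects such as Pathname
    "#{str.encoding.name.ljust(15)} #{str.inspect}"
  end
end

puts "$LOAD_PATH"
puts encoding_lines($LOAD_PATH)
puts

# __dir__ is nil under `ruby -e`, so nil values are skipped.
{ "__FILE__" => __FILE__, "__dir__" => __dir__, "Dir.pwd" => Dir.pwd }.each do |label, value|
  next if value.nil?
  puts "#{value.encoding.name.ljust(15)} #{label}"
end
```

Using inspect keeps non-ASCII bytes visible as escapes, so the output does not depend on the console code page.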

Actions #5

Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

  • Related to Bug #15655: Unable to handle Russian dirname on Windows added

Updated by tschoening (Thorsten Schöning) about 4 years ago

I think I have a similar problem, which I originally reported on GitHub:

https://github.com/rubygems/rubygems/issues/3853

I have a Ruby-based shell application which needs to require a library during startup. I'm using the following command line:

"..\ruby\bin\ruby.exe" "-I../runtime/lib" "../visualizer/bin/ksv" "--require=de/[...]/par_opp_dispatcher.rb" "--opaque-types=true" "../files_to_show/recs_clt.bin" "de/[...]/par_recs_clt.rb"

This results in the following error. The first line shows the current directory, which contains the German umlaut ü. With an ASCII-only path, things work as expected.

C:\[...]\Müller electronic\[...]\ks_ruby_visualizer>show.cmd
Traceback (most recent call last):
        1: from <internal:gem_prelude>:2:in `<internal:gem_prelude>'
<internal:gem_prelude>:2:in `require': No such file or directory -- C:/[...]/Müller electronic/[...]/rubygems.rb (LoadError)

The problem seems to be that at some point Ruby passes UTF-8 encoded bytes to the file system, and no such path exists. Interestingly, the same path is passed correctly several times before that, according to the following ProcMon log:

18:57:48,7938985	ruby.exe	15296	CreateFile	C:\[...]\Müller electronic\[...]\rubygems.rb	SUCCESS	Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
18:57:48,7940217	ruby.exe	15296	QueryBasicInformationFile	C:\[...]\Müller electronic\[...]\rubygems.rb	SUCCESS	CreationTime: 24.07.2020 14:48:44, LastAccessTime: 24.07.2020 14:48:44, LastWriteTime: 01.10.2019 23:01:05, ChangeTime: 04.02.2020 22:30:28, FileAttributes: A 0x80000
18:57:48,7940500	ruby.exe	15296	CloseFile	C:\[...]\Müller electronic\[...]\rubygems.rb	SUCCESS	
18:57:48,7942644	ruby.exe	15296	CreateFile	C:\[...]\Müller electronic\[...]\rubygems.rb	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
18:57:48,7943188	ruby.exe	15296	CloseFile	C:\[...]\Müller electronic\[...]\rubygems.rb	SUCCESS	
18:57:48,7945545	ruby.exe	15296	CreateFile	C:\[...]\Müller electronic\[...]\rubygems.rb	PATH NOT FOUND	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a

Here are my current environment details:

$ gem env version
3.0.3
  • Windows 10 1909 x86-64
  • default codepages Windows-1252 and CP-850
  • Ruby 2.6.5

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Status changed from Open to Closed

This appears to be fixed starting in Ruby 2.7 (also works in 3.0):

D:\Евгений>C:\Ruby26-x64\bin\ruby -I D:\Евгений -e "require 'logger'"
Traceback (most recent call last):
        2: from -e:1:in `<main>'
        1: from C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
C:/Ruby26-x64/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require': No such file or directory -- D:/Евгений/logger.rb (LoadError)

D:\Евгений>C:\Ruby27-x64\bin\ruby -I D:\Евгений -e "require 'logger'"

D:\Евгений>C:\Ruby30-x64\bin\ruby -I D:\Евгений -e "require 'logger'"

As Ruby 2.6 is in security maintenance mode, the change will not be backported.

Updated by inversion (Yura Babak) over 3 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-7:

This appears to be fixed starting in Ruby 2.7 (also works in 3.0):

Still, there is a problem.
require 'bundler/setup' fails if $LOAD_PATH or Gem.dir contains Cyrillic characters, with an error similar to:

incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
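
For context, this error class can be triggered without Bundler. The following is a minimal sketch of Ruby's string compatibility rule; the paths are illustrative (the ASCII-only one is hypothetical) and the code does not claim to show where Bundler itself fails.

```ruby
utf8_path   = "D:/users/\u043A\u0438\u0457/Ruby"   # UTF-8 with non-ASCII characters
binary_path = utf8_path.b                            # same bytes, tagged ASCII-8BIT
ascii_path  = "D:/users/test/Ruby".b                 # ASCII-8BIT, but ASCII-only bytes

# Encoding.compatible? returns the encoding of the combined string, or nil.
mixed_ok  = Encoding.compatible?(ascii_path, utf8_path)
mixed_bad = Encoding.compatible?(binary_path, utf8_path)
p mixed_ok
p mixed_bad

# Combining the incompatible pair raises the error class reported above.
raised =
  begin
    binary_path + utf8_path
    nil
  rescue Encoding::CompatibilityError => e
    e.class
  end
p raised
```

An ASCII-only string can be combined with a string in any ASCII-compatible encoding, so a mismatch in encoding tags only becomes an error once both strings contain non-ASCII bytes.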

From the trace I have prepared a minimal reproducible case:

  1. Put Ruby in a location where the path will contain Cyrillic chars, like "D:\users\киї\Ruby"
  2. Prepare 2 files (saved in UTF-8 encoding) somewhere in a location where the path will contain Cyrillic chars (can be near that Ruby):
    https://gist.github.com/Inversion-des/75949795cc5be707c19d31901e79d1cf
  3. Open cmd and run chcp 1251 in the current console session.
  4. Run "[this Ruby path]" f1.rb

You will see that the __dir__ output differs between the two files (f2 is required from f1). If you run f2.rb directly, its output is the same as for f1. So require_relative somehow changes the encoding here.

To emulate the problems with 'bundler/setup', the files contain the following lines:

# fails: incompatible character encodings: Windows-1251 and UTF-8 (Encoding::CompatibilityError)
p start_with:$LOAD_PATH[0].start_with?(__dir__)
# fails: incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
p start_with:$LOAD_PATH[0].start_with?(Gem.dir)

To see the real problem, comment out these lines and also prepare the following files (I'm not sure the content matters, but add at least one gem there):

  • Gemfile
  • Gemfile.lock

And to see both problems there should also be the .bundle\config file with a line like:
BUNDLE_PATH: "../platform/Ruby_gems"

bundler\settings.rb uses explicit_path if BUNDLE_PATH is defined, and Bundler.rubygems.gem_dir otherwise.

A workaround for both errors is in the related commented-out section of f1.rb:

Gem.dir.force_encoding 'UTF-8'
Gem.path.each {|path| path.force_encoding 'UTF-8' }
if $:[0].encoding.name == 'Windows-1251'
    $:.each {|path| path.encode! 'UTF-8' }
    $:.push '.'    # somehow it helps, looks like a modification of array is needed
end
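
A more defensive variant of this workaround might look like the sketch below. The helper utf8_path is hypothetical (it is not part of Ruby, RubyGems, or Bundler), and it assumes that binary-tagged paths hold UTF-8 bytes whenever those bytes are valid UTF-8. It covers only $LOAD_PATH; Gem.dir and Gem.path would still need the force_encoding calls above.

```ruby
# Hypothetical helper: return a UTF-8 version of a path, or the path unchanged
# when no safe conversion is known.
def utf8_path(path)
  case path.encoding
  when Encoding::UTF_8
    path
  when Encoding::ASCII_8BIT, Encoding::US_ASCII
    # Raw bytes: relabel only if they form valid UTF-8.
    candidate = path.dup.force_encoding(Encoding::UTF_8)
    candidate.valid_encoding? ? candidate : path
  else
    # A real code page such as Windows-1251: transcode, keep the original on failure.
    begin
      path.encode(Encoding::UTF_8)
    rescue EncodingError
      path
    end
  end
end

# Replace the entries rather than mutating them in place, so the array itself is modified.
$LOAD_PATH.map! { |path| utf8_path(path.to_s) }
```

Replacing the entries with map! follows the observation in the code comment above that a modification of the array seemed to be needed; whether that is strictly required has not been verified here.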

My environment:

  • Windows 10 Pro
  • Ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x64-mingw32]
  • Bundler version 2.2.22
  • RubyGems version 3.2.22

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Status changed from Closed to Open

inversion (Yura Babak) wrote in #note-8:

jeremyevans0 (Jeremy Evans) wrote in #note-7:

This appears to be fixed starting in Ruby 2.7 (also works in 3.0):

Still, there is a problem.
require 'bundler/setup' fails if $LOAD_PATH or Gem.dir contains Cyrillic characters, with an error similar to:

incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)

From the trace I have prepared a minimal reproducible case:

  1. Put Ruby in a location where the path will contain Cyrillic chars, like "D:\users\киї\Ruby"
  2. Prepare 2 files (saved in UTF-8 encoding) somewhere in a location where the path will contain Cyrillic chars (can be near that Ruby):
    https://gist.github.com/Inversion-des/75949795cc5be707c19d31901e79d1cf
  3. Open cmd and run chcp 1251 in the current console session.
  4. Run "[this Ruby path]" f1.rb

I was able to reproduce the issue, but only when I installed Ruby into a path not supported by the Windows-1251 encoding:

d:\Евгений>d:\zz-können2\Ruby31-x64\bin\bundle install --local
d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:94:in `expand_path': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:94:in `expand_path'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:94:in `bundle_path'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:682:in `configure_gem_home'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:663:in `configure_gem_home_and_path'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:80:in `configure'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler.rb:193:in `definition'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/cli/install.rb:57:in `run'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/cli.rb:259:in `block in install'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/settings.rb:133:in `temporary'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/cli.rb:258:in `install'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/cli.rb:30:in `dispatch'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/cli.rb:24:in `start'
        from d:/zz-können2/Ruby31-x64/lib/ruby/gems/3.1.0/gems/bundler-2.3.0.dev/libexec/bundle:49:in `block in <top (required)>'
        from d:/zz-können2/Ruby31-x64/lib/ruby/3.1.0/bundler/friendly_errors.rb:130:in `with_friendly_errors'
        from d:/zz-können2/Ruby31-x64/lib/ruby/gems/3.1.0/gems/bundler-2.3.0.dev/libexec/bundle:37:in `<top (required)>'
        from d:/zz-k?nnen2/Ruby31-x64/bin/bundle:31:in `load'
        from d:/zz-k?nnen2/Ruby31-x64/bin/bundle:31:in `<main>'

Part of the underlying issue seems to be that __FILE__ and __dir__ are not UTF-8 encoded for the main script, unlike required files. I'm not sure if changing that alone will fix the issue, though.

When I run the following script (f3.rb):

p ['__FILE__', __FILE__, __FILE__.encoding]
p ['__dir__', __dir__, __dir__.encoding]
p ['Gem.dir', Gem.dir, Gem.dir.encoding]
puts 'Gem.path'
Gem.path.each do |s|
  p [s, s.encoding]
end
puts '$:'
$:.each do |s|
  p [s, s.encoding]
end

I get the following when using Ruby installed in a non-ASCII path:

d:\Евгений>d:\zz-können2\Ruby31-x64\bin\ruby D:\Евгений\f3.rb
["__FILE__", "D:/\xC5\xE2\xE3\xE5\xED\xE8\xE9/f3.rb", #<Encoding:Windows-1251>]
["__dir__", "D:/\xC5\xE2\xE3\xE5\xED\xE8\xE9", #<Encoding:Windows-1251>]
["Gem.dir", "d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/gems/3.1.0", #<Encoding:ASCII-8BIT>]
Gem.path
["C:/Users/jeremye/.gem/ruby/3.1.0", #<Encoding:UTF-8>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/gems/3.1.0", #<Encoding:ASCII-8BIT>]
$:
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby/3.1.0/x64-ucrt", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0/x64-ucrt", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/3.1.0/x64-mingw-ucrt", #<Encoding:ASCII-8BIT>]

and when installed into an ASCII path:

d:\Евгений>C:\Ruby30-x64\bin\ruby d:\Евгений\f3.rb
["__FILE__", "d:/\xC5\xE2\xE3\xE5\xED\xE8\xE9/f3.rb", #<Encoding:Windows-1251>]
["__dir__", "d:/\xC5\xE2\xE3\xE5\xED\xE8\xE9", #<Encoding:Windows-1251>]
["Gem.dir", "C:/Ruby30-x64/lib/ruby/gems/3.0.0", #<Encoding:ASCII-8BIT>]
Gem.path
["C:/Users/jeremye/.gem/ruby/3.0.0", #<Encoding:UTF-8>]
["C:/Ruby30-x64/lib/ruby/gems/3.0.0", #<Encoding:ASCII-8BIT>]
$:
["C:/Ruby30-x64/lib/ruby/site_ruby/3.0.0", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/site_ruby/3.0.0/x64-msvcrt", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/site_ruby", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/vendor_ruby/3.0.0", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/vendor_ruby/3.0.0/x64-msvcrt", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/vendor_ruby", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/3.0.0", #<Encoding:Windows-1251>]
["C:/Ruby30-x64/lib/ruby/3.0.0/x64-mingw32", #<Encoding:Windows-1251>]

It looks like the difference in the non-ASCII path case is that ASCII-8BIT encoding is used even if the path itself is valid UTF-8. This is true even if you force a UTF-8 code page (though that does fix __FILE__ and __dir__):

d:\Евгений>chcp 65001
Active code page: 65001

d:\Евгений>d:\zz-können2\Ruby31-x64\bin\ruby D:\Евгений\f3.rb
["__FILE__", "D:/Евгений/f3.rb", #<Encoding:UTF-8>]
["__dir__", "D:/Евгений", #<Encoding:UTF-8>]
["Gem.dir", "d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/gems/3.1.0", #<Encoding:ASCII-8BIT>]
Gem.path
["C:/Users/jeremye/.gem/ruby/3.1.0", #<Encoding:UTF-8>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/gems/3.1.0", #<Encoding:ASCII-8BIT>]
$:
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby/3.1.0/x64-ucrt", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/site_ruby", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby/3.1.0/x64-ucrt", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/vendor_ruby", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/3.1.0", #<Encoding:ASCII-8BIT>]
["d:/zz-k\xC3\xB6nnen2/Ruby31-x64/lib/ruby/3.1.0/x64-mingw-ucrt", #<Encoding:ASCII-8BIT>]

Since there does appear to be an issue, I'll reopen this. Hopefully someone with more knowledge in this area can suggest a possible fix.

Actions #10

Updated by inversion (Yura Babak) over 3 years ago

  • ruby -v changed from ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32] to 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x64-mingw32]