Project

General

Profile

Actions

Feature #16978

open

Ruby should not use realpath for __FILE__

Added by vo.x (Vit Ondruch) over 3 years ago. Updated almost 2 years ago.

Status:
Open
Target version:
-
[ruby-core:98920]

Description

This is the simplest test case:

$ mkdir a

$ echo "puts __FILE__" > a/test.rb

$ ln -s a b

$ ruby -Ib -e "require 'test'"
/builddir/a/test.rb

This behavior is problematic, because Ruby should not know nothing about the a directory. It was not instructed to use it. I should always refer to the file using the original path and do not dig into the underlying details, otherwise depending on file system setup, one might be forced to used File.realpath everywhere trying to use __FILE__.


Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #10222: require_relative and require should be compatible with each other when symlinks are usedClosedActions

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Tracker changed from Bug to Feature
  • ruby -v deleted (ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux])
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)

I don't think this is a bug. __FILE__ is documented as follows: The path to the current file. Which path (real, absolute, relative, expanded) is not specified.

Not using the real path would lead to behavior that depends on the first path used when requiring the file.

a/test.rb (b symlinked to a):

def a
  __FILE__
end
ruby -Ia -Ib -rtest -e 'a'
# /path/to/a/test.rb
ruby -Ib -Ia -rtest -e 'a'
# Current: /path/to/a/test.rb
# Your proposed: /path/to/b/test.rb

What actually happens is not the file path being converted to a real path, but the include directory being converted to a real path before the file is required (in rb_construct_expanded_load_path). Changing this to not use a real path would probably break the code that checks that a feature hasn't been require twice. For example, this code would change behavior:

$: << 'a'
require 'test'
$: << 'b'
require 'test'
# Current: not loaded again
# Your proposed: loaded again

If you symlink the file itself and not the include directory, Ruby will attempt to require it as a separate feature.

Note that if you provide a path when requiring, Ruby already operates the way you want:

ruby -r./a/test -e 'a' # /path/to/a/test.rb
ruby -r./b/test -e 'a' # /path/to/b/test.rb

I can certainly see pros and cons from changing the behavior, but I would consider this a feature request and not a bug.

Updated by Eregon (Benoit Daloze) over 3 years ago

The "main file" (the file passed to ruby myfile.rb is also special in that __FILE__ and $0 can both be relative paths (basically the same path as passed on the command line).

Updated by vo.x (Vit Ondruch) over 3 years ago

this code would change behavior:

Absolutely, different $LOAD_PATH must result in different behavior.

$: << 'a'
require 'test'
$: << 'b'
require 'test'
# Current: not loaded again
# Your proposed: loaded again

This is actually where I am coming from. I am not 100 % sure what is the spec of require, but I would expect that require 'test' works just once and it does not matter what is the current $LOAD_PATH status. So my proposal is actually:

# Your proposed: not loaded again

Updated by vo.x (Vit Ondruch) about 3 years ago

Can this be resolved please? This is another scenario, which should work IMO, but it does not work:

$ mkdir a/
$ echo "require_relative 'b'" > a/test.rb
$ touch b.rb

$ ll
total 4
drwxrwxr-x. 1 vondruch vondruch 14 Jan 22 17:17 a
-rw-rw-r--. 1 vondruch vondruch  0 Jan 22 17:17 b.rb
lrwxrwxrwx. 1 vondruch vondruch  9 Jan 22 17:18 test.rb -> a/test.rb

$ ruby test.rb
Traceback (most recent call last):
	1: from test.rb:1:in `<main>'
test.rb:1:in `require_relative': cannot load such file -- /home/vondruch/tt/a/b (LoadError)

The point is that in the context of current directory, the b.rb is relative to test.rb and therefore this example should not work. It does not really matter that the test.rb is symlink. This behavior with requie_relative being more and more common basically prohibits usage of symlinks for Ruby code.

This is issue because gems typically does not ship their test suites. Therefore when testing packages in Fedora, we would like do something like:

$ gem unpack rspec-rails
Fetching rspec-rails-4.0.2.gem
Unpacked gem: '/home/vondruch/tt/rspec-rails-4.0.2'

$ cd rspec-rails-4.0.2/

# Link from git checkout or from separate archive
$ ln -s ~/projects/rspec/rspec-rails/spec/ .

This blows out as soon as require_relative is used in test suite, because it cannot refer back to require_relative "../lib/rspec_rails", which expands to /home/vondruch/projects/rspec/rspec-rails/spec/../lib/rspec_rails, where there is either loaded completely different file or nothing. In Fedora 1, the test suite is typically extracted from tarball, so this would crash.

Updated by akr (Akira Tanaka) about 3 years ago

I think we should use realpath.
The location of actual file is more robust.

Also, ELF dynamic linker has a feature, $ORIGIN, to refer a shared library using a relative path.
It also resolves symlinks.
https://refspecs.linuxbase.org/elf/gabi4+/ch5.dynamic.html

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

+1 for keeping the current behavior.

I remember in 1.8 we had so many problems with double-loading code because a file could be require'd with different paths. I have no wish to go back to that mess.

Updated by vo.x (Vit Ondruch) about 3 years ago

I have problem with double loading because of symlinks resolving into real path. If there was ensured that require "foo" load file just once, then double loading can't happen.

In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality. Unfortunately, since this commit 1, which introduces require_relative, we have issues with double loading. The simple reproducer is ruby --disable-gems -e 'require "openssl"; require "openssl/digest".

You can admit that Fedora is doing something unexpected, but I think that the package split was done prior the require_relative was even introduced.

Updated by austin (Austin Ziegler) about 3 years ago

Why not put some special handling in to require_relative such that it checks both the pre-realpath and realpath versions (perhaps behind a #define flow so that Fedora and other systems integrators can do this without impacting anyone else, assuming require_relative is in C) so that it can handle this case?

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

Interestingly, ruby does not use realpath for __FILE__, only for __dir__ and require_relative

$ cat test.rb
p __dir__ => __FILE__
require_relative "b"

$ cat a/b.rb
p __dir__ => __FILE__
require_relative "c"

$ ln -s a/b.rb

$ touch c.rb 

$ ruby test.rb
{"/home/dan42/16978"=>"test.rb"}
{"/home/dan42/16978/a"=>"/home/dan42/16978/b.rb"}
Traceback (most recent call last):
	3: from test.rb:2:in `<main>'
	2: from test.rb:2:in `require_relative'
	1: from /home/dan42/16978/b.rb:2:in `<top (required)>'
/home/dan42/16978/b.rb:2:in `require_relative': cannot load such file -- /home/dan42/16978/a/c (LoadError)

Updated by Eregon (Benoit Daloze) about 3 years ago

vo.x (Vit Ondruch) wrote in #note-7:

In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality.

What does it look like on the filesystem?

Even if the package is split, the standard location could be used, no need for symlinks, isn't it?

Updated by vo.x (Vit Ondruch) about 3 years ago

Eregon (Benoit Daloze) wrote in #note-10:

vo.x (Vit Ondruch) wrote in #note-7:

In Fedora, to avoid duplication, we have openssl gem extracted into independent package, which links back to the StdLib to keep the Ruby functionality.

What does it look like on the filesystem?

This is how it is constructed. Here is where the OpenSSL structure is created in traditional gem directories:

https://src.fedoraproject.org/rpms/ruby/blob/master/f/ruby.spec#_761

And here are the links back to the StdLib location:

https://src.fedoraproject.org/rpms/ruby/blob/master/f/ruby.spec#_771

Even if the package is split, the standard location could be used, no need for symlinks, isn't it?

Not sure what do you mean specifically, because of course I'd prefer if there are no symlinks. The idea is that rubygem-openssl, should be replaced by newer version of openssl released on rubygems.org, without upgrading Ruby itself, therefore the files have to be located in the gem directory. Of course there are different possibilities, how to workaround that (e.g. #14737). The only question how sustainable they are.

However, Fedora is orthogonal and just distraction to this issue, because I can imagine if I were ruby openssl developer, I'd like to replace content of Ruby build with my development version of OpenSSL from git snapshot via symlinks, e.g.:

$ cd ~
$ git clone https://github.com/ruby/openssl.git
# rm /usr/local/lib/ruby/3.0.0/openssl*
# cd /usr/local/lib/ruby/3.0.0
# ln -s ~/openssl/lib/openssl.rb .
# ln -s ~/openssl/lib/openssl .
$ /usr/local/bin/ruby --disable-gems -e 'require "openssl"; require "openssl/digest"'
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::MD4
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of MD4 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::MD5
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of MD5 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::RIPEMD160
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of RIPEMD160 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA1
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA1 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA224
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA224 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA256
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA256 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA384
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA384 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:45: warning: already initialized constant OpenSSL::Digest::SHA512
/builddir/openssl/lib/openssl/digest.rb:45: warning: previous definition of SHA512 was here
/usr/local/lib/ruby/3.0.0/openssl/digest.rb:52:in `<class:Digest>': superclass mismatch for class Digest (TypeError)
	from /usr/local/lib/ruby/3.0.0/openssl/digest.rb:16:in `<module:OpenSSL>'
	from /usr/local/lib/ruby/3.0.0/openssl/digest.rb:15:in `<top (required)>'
	from -e:1:in `require'
	from -e:1:in `<main>'

Everybody can try the example above. Is it artificial and should be prohibited? Should it fail?

Updated by Dan0042 (Daniel DeLorme) about 3 years ago

I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for __FILE__

So in the example above when you do require "openssl/digest", ruby will find /usr/local/lib/ruby/3.0.0/openssl/digest.rb and then it should check the realpath (~/openssl/lib/openssl/digest.rb) before loading the file.

Updated by vo.x (Vit Ondruch) about 3 years ago

Dan0042 (Daniel DeLorme) wrote in #note-12:

I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for __FILE__

This is though one thinking about this again, but you are probably right. Maybe the biggest concern is that the behavior is inconsistent. If the require and __FILE__ used realpath, I think the example in #16978-11 would work as you said.

Actions #14

Updated by mame (Yusuke Endoh) about 3 years ago

  • Related to Bug #10222: require_relative and require should be compatible with each other when symlinks are used added

Updated by ko1 (Koichi Sasada) about 3 years ago

  • Assignee set to nobu (Nobuyoshi Nakada)

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago

vo.x (Vit Ondruch) wrote in #note-13:

Dan0042 (Daniel DeLorme) wrote in #note-12:

I think this is a bug and should be fixed, but IMO the proper fix is to use realpath for __FILE__

This is though one thinking about this again, but you are probably right. Maybe the biggest concern is that the behavior is inconsistent. If the require and __FILE__ used realpath, I think the example in #16978-11 would work as you said.

It's not difficult to make __FILE__ use realpath:

@@ -10307,6 +10307,14 @@ numparam_nested_p(struct parser_params *p)
     return 0;
 }

+VALUE rb_realpath_internal(VALUE basedir, VALUE path, int strict);
+
+static VALUE
+rb_realpath__FILE__(VALUE file)
+{
+    return rb_realpath_internal(Qnil, file, 1);
+}
+
 static NODE*
 gettable(struct parser_params *p, ID id, const YYLTYPE *loc)
 {
@@ -10326,8 +10334,19 @@ gettable(struct parser_params *p, ID id, const YYLTYPE *loc)
            VALUE file = p->ruby_sourcefile_string;
            if (NIL_P(file))
                file = rb_str_new(0, 0);
-           else
-               file = rb_str_dup(file);
+           else if (compile_for_eval) {
+                file = rb_str_dup(file);
+            }
+           else {
+                int state;
+               VALUE realpath_file = rb_protect(rb_realpath__FILE__, file, &state);
+                if (state) {
+                    file = rb_str_dup(file);
+                }
+                else {
+                    file = realpath_file;
+                }
+            }
            node = NEW_STR(file, loc);
             RB_OBJ_WRITTEN(p->ast, Qnil, file);
        }

Unfortunately, if __FILE__ uses realpath, unless you also change the behavior of $0 (which I'm fairly sure would not be acceptable), you break the common idiom for executing code if the file is executed instead of required:

if __FILE__ == $0
  do_something
end

Updated by byroot (Jean Boussier) over 2 years ago

Unfortunately, if FILE uses realpath, unless you also change the behavior of $0

The "entrypoint" script could behave differently, but maybe that would be confusing.

Updated by Dan0042 (Daniel DeLorme) over 2 years ago

byroot (Jean Boussier) wrote in #note-17:

The "entrypoint" script could behave differently, but maybe that would be confusing.

I think that would be acceptable. After all the "entrypoint" script already behaves differently; it's the only one where __FILE__ is not always an absolute path.

Updated by byroot (Jean Boussier) over 2 years ago

This issue seems also somewhat related to another I opened a while ago: #17593

Updated by Dan0042 (Daniel DeLorme) over 2 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-16:

It's not difficult to make __FILE__ use realpath

I read this once more a bit more carefully and I think there's a misunderstanding... the idea here is that realpath should be used in require before checking if the file is already loaded (and then loading it). So p->ruby_sourcefile_string would already be the realpath.

Updated by jeremyevans0 (Jeremy Evans) over 2 years ago

Dan0042 (Daniel DeLorme) wrote in #note-20:

jeremyevans0 (Jeremy Evans) wrote in #note-16:

It's not difficult to make __FILE__ use realpath

I read this once more a bit more carefully and I think there's a misunderstanding... the idea here is that realpath should be used in require before checking if the file is already loaded (and then loading it). So p->ruby_sourcefile_string would already be the realpath.

I tried that approach first when working on #17885. It doesn't work because the loaded feature index is designed around load paths and not real paths. You would have to rewrite the loaded feature index and its interaction with autoload to switch to that approach.

Updated by vo.x (Vit Ondruch) over 2 years ago

I wonder what is the situation here given that the #17885 seems to be resolved?

Updated by retro (Josef Šimánek) almost 2 years ago

We have hit this problem as well in rubygems and it is really unexpected and confusing.

https://github.com/rubygems/rubygems/pull/5437

TL;DR; test passes calling rake test, but not using ruby -I... test/file.rb

Updated by retro (Josef Šimánek) almost 2 years ago

In the end RubyGems decided to leave __FILE__ due to this behaviour at all. If I understand Jeremy's original reasoning well, it was actually misused. Anyway it would be great to inline __FILE__ and __dir__ to provide consistent output. https://github.com/rubygems/rubygems/pull/5444

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0