Project

General

Profile

Actions

Feature #17593

open

load_iseq_eval should override the ISeq path

Added by byroot (Jean Boussier) 5 months ago. Updated 4 months ago.

Status:
Assigned
Priority:
Normal
Target version:
-
[ruby-core:102310]

Description

Full context in https://github.com/Shopify/bootsnap/pull/343

Consider the following script

system('mkdir', '-p', '/tmp/build', '/tmp/app')
File.write('/tmp/app/a.rb', 'p ["app/a", __FILE__, __dir__]')
File.write('/tmp/app/b.rb', 'p ["app/b", __FILE__, __dir__]')
File.write('/tmp/build/a.rb', 'p ["build/a", __FILE__, __dir__]; require_relative "b"')

$iseq = RubyVM::InstructionSequence.compile_file('/tmp/build/a.rb')

class RubyVM::InstructionSequence
  def self.load_iseq(feature)
    if feature == "/tmp/app/a.rb"
      $iseq
    end
  end
end

require '/tmp/app/a.rb'

Current behavior:

["build/a", "/tmp/build/a.rb", "/private/tmp/build"]
/tmp/build/a.rb:1:in `require_relative': cannot load such file -- /private/tmp/build/b (LoadError)
    from /tmp/build/a.rb:1:in `<main>'
    from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
    from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
    from /tmp/iseq_debug.rb:16:in `<main>'

Expected behavior

["build/a", "/private/tmp/app/a.rb", "/private/tmp/app"]
["app/b", "/private/tmp/app/b.rb", "/private/tmp/app"]

What's going on?

RubyVM::InstructionSequence instances have a pathobj property that is recorded when the source is parsed, and when the ISeq is later evaled, the VM use that path as if you were loading a .rb file located at that path.

So if that source use constructs such as require_relative, __FILE__, __dir__, etc, they will all happen relative to where the source was located upon compilation, not relative to the source was upon evaluation.

Why is it a problem?

Some deployment strategies first build the application in one location, and then later move it elsewhere.

That's for instance the case on the Heroku platform. e.g. the deploy looks like

git clone <repo> /tmp/build_xxxx
cd /tmp/build_xxxx
rake assets:precompile ...
mv /tmp/build_xxxx /app

Because of this, all the ISeq cached by bootsnap when the code was in /tmp/build_xxx have to be invalidated as soon as the source is moved to /app, rendering ISeq caching ineffective, and even detrimental as it causes extra writes to disk without bringing any benefits.

Solution

I believe there are two changes that would be needed.

First I think that load_iseq_eval should set the fname as the top stack location. Either by copying the ISeq instance and change its pathobj, or by having a way to pass an optional path to vm_set_top_stack that would take precedence.

I experimented with a quick hack that changes the ISeq pathobj in place, and it does solve most of the problem.

However even with that quick hack another problem remain, the __FILE__ still evaluate to the original source location. However __dir__ works as expected, because it is a method that returns the top_stack location. I think __FILE__ could be changed to be a method as well.

Updated by ko1 (Koichi Sasada) 4 months ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Open to Assigned

Updated by ko1 (Koichi Sasada) 4 months ago

InstructionSequence.compile_file(..., filename: name) can solves it? It means making a ISeq with given name. However it doesn't solve the problem if the location name is not decided.

Updated by byroot (Jean Boussier) 4 months ago

InstructionSequence.compile_file(..., filename: name) can solves it?

Unfortunately no, as it would require to know where the file will be moved.

It would have to be InstructionSequence.load_from_binary(..., filename: name)

Updated by ko1 (Koichi Sasada) 4 months ago

It would have to be InstructionSequence.load_from_binary(..., filename: name)

__FILE__ will be replaced with a string given at compile time, so to make it we need to change the compilation strategy.
It increased the runtime overhead (because of indirection), but maybe it is no problem because I want to believe inner loop should not have __FILE__ access.

Updated by byroot (Jean Boussier) 4 months ago

FILE will be replaced with a string given at compile time

Yes, that is why I mentioned __FILE__ would need to be changed to behave like __dir__.

I want to believe inner loop should not have FILE access.

Yes, I have the same feeling. It would make it about 4X times slower, but ultimately if it's accessed repeatedly it's easy enough to cache it in a local variable.

Updated by ko1 (Koichi Sasada) 4 months ago

Matz accepted this proposal, so I'll try to reconsider the implementation.

Updated by byroot (Jean Boussier) 4 months ago

That's great to hear, let me know if I can help in any way.

Actions

Also available in: Atom PDF