load_iseq_eval should override the ISeq path
Full context in https://github.com/Shopify/bootsnap/pull/343
Consider the following script
system('mkdir', '-p', '/tmp/build', '/tmp/app') File.write('/tmp/app/a.rb', 'p ["app/a", __FILE__, __dir__]') File.write('/tmp/app/b.rb', 'p ["app/b", __FILE__, __dir__]') File.write('/tmp/build/a.rb', 'p ["build/a", __FILE__, __dir__]; require_relative "b"') $iseq = RubyVM::InstructionSequence.compile_file('/tmp/build/a.rb') class RubyVM::InstructionSequence def self.load_iseq(feature) if feature == "/tmp/app/a.rb" $iseq end end end require '/tmp/app/a.rb'
["build/a", "/tmp/build/a.rb", "/private/tmp/build"] /tmp/build/a.rb:1:in `require_relative': cannot load such file -- /private/tmp/build/b (LoadError) from /tmp/build/a.rb:1:in `<main>' from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require' from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require' from /tmp/iseq_debug.rb:16:in `<main>'
["build/a", "/private/tmp/app/a.rb", "/private/tmp/app"] ["app/b", "/private/tmp/app/b.rb", "/private/tmp/app"]
What's going on?¶
RubyVM::InstructionSequence instances have a
pathobj property that is recorded when the source is parsed, and when the ISeq is later evaled, the VM use that
path as if you were loading a
.rb file located at that path.
So if that source use constructs such as
__dir__, etc, they will all happen relative to where the source was located upon compilation, not relative to the source was upon evaluation.
Why is it a problem?¶
Some deployment strategies first build the application in one location, and then later move it elsewhere.
That's for instance the case on the Heroku platform. e.g. the deploy looks like
git clone <repo> /tmp/build_xxxx cd /tmp/build_xxxx rake assets:precompile ... mv /tmp/build_xxxx /app
Because of this, all the ISeq cached by bootsnap when the code was in
/tmp/build_xxx have to be invalidated as soon as the source is moved to
/app, rendering ISeq caching ineffective, and even detrimental as it causes extra writes to disk without bringing any benefits.
I believe there are two changes that would be needed.
First I think that
load_iseq_eval should set the
fname as the top stack location. Either by copying the ISeq instance and change its
pathobj, or by having a way to pass an optional path to
vm_set_top_stack that would take precedence.
I experimented with a quick hack that changes the ISeq
pathobj in place, and it does solve most of the problem.
However even with that quick hack another problem remain, the
__FILE__ still evaluate to the original source location. However
__dir__ works as expected, because it is a method that returns the top_stack location. I think
__FILE__ could be changed to be a method as well.