Feature #17593
openload_iseq_eval should override the ISeq path
Description
Full context in https://github.com/Shopify/bootsnap/pull/343
Consider the following script
system('mkdir', '-p', '/tmp/build', '/tmp/app')
File.write('/tmp/app/a.rb', 'p ["app/a", __FILE__, __dir__]')
File.write('/tmp/app/b.rb', 'p ["app/b", __FILE__, __dir__]')
File.write('/tmp/build/a.rb', 'p ["build/a", __FILE__, __dir__]; require_relative "b"')
$iseq = RubyVM::InstructionSequence.compile_file('/tmp/build/a.rb')
class RubyVM::InstructionSequence
def self.load_iseq(feature)
if feature == "/tmp/app/a.rb"
$iseq
end
end
end
require '/tmp/app/a.rb'
Current behavior:
["build/a", "/tmp/build/a.rb", "/private/tmp/build"]
/tmp/build/a.rb:1:in `require_relative': cannot load such file -- /private/tmp/build/b (LoadError)
from /tmp/build/a.rb:1:in `<main>'
from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from <internal:/opt/rubies/3.0.0-pshopify2/lib/ruby/3.0.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
from /tmp/iseq_debug.rb:16:in `<main>'
Expected behavior
["build/a", "/private/tmp/app/a.rb", "/private/tmp/app"]
["app/b", "/private/tmp/app/b.rb", "/private/tmp/app"]
What's going on?¶
RubyVM::InstructionSequence
instances have a pathobj
property that is recorded when the source is parsed, and when the ISeq is later evaled, the VM use that path
as if you were loading a .rb
file located at that path.
So if that source use constructs such as require_relative
, __FILE__
, __dir__
, etc, they will all happen relative to where the source was located upon compilation, not relative to the source was upon evaluation.
Why is it a problem?¶
Some deployment strategies first build the application in one location, and then later move it elsewhere.
That's for instance the case on the Heroku platform. e.g. the deploy looks like
git clone <repo> /tmp/build_xxxx
cd /tmp/build_xxxx
rake assets:precompile ...
mv /tmp/build_xxxx /app
Because of this, all the ISeq cached by bootsnap when the code was in /tmp/build_xxx
have to be invalidated as soon as the source is moved to /app
, rendering ISeq caching ineffective, and even detrimental as it causes extra writes to disk without bringing any benefits.
Solution¶
I believe there are two changes that would be needed.
First I think that load_iseq_eval
should set the fname
as the top stack location. Either by copying the ISeq instance and change its pathobj
, or by having a way to pass an optional path to vm_set_top_stack
that would take precedence.
I experimented with a quick hack that changes the ISeq pathobj
in place, and it does solve most of the problem.
However even with that quick hack another problem remain, the __FILE__
still evaluate to the original source location. However __dir__
works as expected, because it is a method that returns the top_stack location. I think __FILE__
could be changed to be a method as well.
Updated by ko1 (Koichi Sasada) almost 4 years ago
- Status changed from Open to Assigned
- Assignee set to ko1 (Koichi Sasada)
Updated by ko1 (Koichi Sasada) almost 4 years ago
InstructionSequence.compile_file(..., filename: name)
can solves it? It means making a ISeq with given name. However it doesn't solve the problem if the location name is not decided.
Updated by byroot (Jean Boussier) almost 4 years ago
InstructionSequence.compile_file(..., filename: name)
can solves it?
Unfortunately no, as it would require to know where the file will be moved.
It would have to be InstructionSequence.load_from_binary(..., filename: name)
Updated by ko1 (Koichi Sasada) almost 4 years ago
It would have to be InstructionSequence.load_from_binary(..., filename: name)
__FILE__
will be replaced with a string given at compile time, so to make it we need to change the compilation strategy.
It increased the runtime overhead (because of indirection), but maybe it is no problem because I want to believe inner loop should not have __FILE__
access.
Updated by byroot (Jean Boussier) almost 4 years ago
FILE will be replaced with a string given at compile time
Yes, that is why I mentioned __FILE__
would need to be changed to behave like __dir__
.
I want to believe inner loop should not have FILE access.
Yes, I have the same feeling. It would make it about 4X times slower, but ultimately if it's accessed repeatedly it's easy enough to cache it in a local variable.
Updated by ko1 (Koichi Sasada) almost 4 years ago
Matz accepted this proposal, so I'll try to reconsider the implementation.
Updated by byroot (Jean Boussier) almost 4 years ago
That's great to hear, let me know if I can help in any way.