Bug #19254
Closed: Enabling YJIT configuration option breaks rspec-core test suite
Description
In preparation for Ruby 3.2, we have enabled YJIT in Fedora:
Since then, the rspec-core test suite has started to fail (see the attached log for all details):
... snip ...
1) RSpec::Core::Example#run memory leaks, see GH-321, GH-1921 releases references to the examples / their ivars
Failure/Error: expect(get_all.call).to eq opts.fetch(:post_gc)
expected: []
got: ["after_all", "before_all"]
(compared using ==)
# ./spec/rspec/core/example_spec.rb:469:in `expect_gc'
# ./spec/rspec/core/example_spec.rb:492:in `block (4 levels) in <top (required)>'
# ./spec/support/sandboxing.rb:16:in `block (3 levels) in <top (required)>'
# ./spec/support/sandboxing.rb:7:in `block (2 levels) in <top (required)>'
Finished in 8.98 seconds (files took 0.47612 seconds to load)
2209 examples, 1 failure, 4 pending
Please note that YJIT was not enabled at runtime; only YJIT support was compiled in. Disabling YJIT support makes the test case pass.
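To make the distinction between build-time support and runtime enablement concrete, here is a minimal Ruby sketch that reports both. The helper names yjit_built? and yjit_enabled_at_runtime? are hypothetical. The first relies on RbConfig::CONFIG['YJIT_SUPPORT'], which is also queried later in this thread. The second uses RubyVM::YJIT.enabled?, guarded with defined? on the assumption that RubyVM::YJIT may be absent when YJIT support was not compiled in.

```ruby
require "rbconfig"

# Build time: was this Ruby configured with YJIT support (--enable-yjit)?
def yjit_built?
  RbConfig::CONFIG["YJIT_SUPPORT"] == "yes"
end

# Runtime: is YJIT actually compiling code in this process (e.g. via --yjit)?
# Assumption: RubyVM::YJIT may be undefined on builds without YJIT support,
# so guard with defined? instead of calling it unconditionally.
def yjit_enabled_at_runtime?
  defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false
end

puts "YJIT built:   #{yjit_built?}"
puts "YJIT enabled: #{yjit_enabled_at_runtime?}"
```

Run without --yjit, a build like the one described in this report would be expected to print "built: true" and "enabled: false", which is exactly the configuration in which the test fails.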
Updated by mame (Yusuke Endoh) over 2 years ago
You mean this test?
Frankly speaking, this test appears to be completely wrong. MRI's GC is not exact (in other words, it cannot be relied on to collect every unreachable object).
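The pattern the test depends on can be illustrated with a small sketch. It uses the standard weakref library rather than the test's own code, and the helper reclaimed_after_full_gc? is hypothetical. The sketch drops every strong reference, forces a full GC, and checks whether the object is gone. Because MRI scans the machine stack conservatively, the answer may differ between runs and between builds, so asserting that it is always true is unsafe.

```ruby
require "weakref"

# Allocate an object, keep only a weak reference to it, force a full GC,
# and report whether the object was actually reclaimed. A stale copy of the
# object's address on the native stack or in a register can keep it alive.
def reclaimed_after_full_gc?
  ref = WeakRef.new(Object.new)
  GC.start(full_mark: true, immediate_sweep: true)
  !ref.weakref_alive?
end

results = Array.new(5) { reclaimed_after_full_gc? }
p results # may be any mix of true and false
```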
Updated by k0kubun (Takashi Kokubun) over 2 years ago
- Status changed from Open to Feedback
In addition to @mame (Yusuke Endoh)'s point, can you report how to reproduce the issue by building Ruby from source or from a tarball? For example:
$ git clone --depth=1 https://github.com/ruby/ruby
$ cd ruby
$ ./autogen.sh
$ ./configure --enable-yjit --prefix="/opt/rubies/ruby" && make -j8 && make install
$ git clone --depth=1 https://github.com/rspec/rspec-core
$ cd rspec-core
$ unset GEM_ROOT GEM_HOME GEM_PATH
$ export PATH="/opt/rubies/ruby/bin:${PATH}"
$ bundle install
$ bundle exec rspec spec/rspec/core/example_spec.rb
With these steps, I can't reproduce the problem:
$ cd rspec-core
$ git rev-parse HEAD
522b7727d02d9648c090b56fa68bbdc18a21c04d
$ ruby -v -e "p RbConfig::CONFIG['YJIT_SUPPORT']"
ruby 3.2.0dev (2022-12-23T17:24:55Z master ee60756495) [x86_64-linux]
"yes"
$ RUBYOPT=-v bundle exec rspec spec/rspec/core/example_spec.rb:472
ruby 3.2.0dev (2022-12-23T17:24:55Z master ee60756495) [x86_64-linux]
Run options:
include {:locations=>{"./spec/rspec/core/example_spec.rb"=>[472]}}
exclude {:ruby=>#<Proc: ./spec/spec_helper.rb:110>}
Randomized with seed 37258
RSpec::Core::Example
#run
memory leaks, see GH-321, GH-1921
releases references to the examples / their ivars
Finished in 0.0101 seconds (files took 0.09802 seconds to load)
1 example, 0 failures
Randomized with seed 37258
Updated by vo.x (Vit Ondruch) over 2 years ago
mame (Yusuke Endoh) wrote in #note-1:
You mean this test?
Yes, sorry, I forgot to attach the link.
k0kubun (Takashi Kokubun) wrote in #note-2:
In addition to @mame (Yusuke Endoh) 's point, can you report how to reproduce the issue by building Ruby from a source or a tarball?
The build was done via RPM, with Ruby built from the tarball. Here is the full Ruby build log:
Based on my work on #19248, I suspect that some of the compiler options might help reproduce this.
Updated by mtasaka (Mamoru TASAKA) over 2 years ago
It looks like adding %global _lto_cflags %{nil}
to ruby.spec, i.e. removing -flto=auto -ffat-lto-objects
from the compilation flags, makes the above rspec-core test pass (note that Fedora's ruby is built with gcc).
So maybe LTO is doing "something" with YJIT.
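One quick way to check what a given Ruby was compiled with is to inspect the flags recorded in RbConfig. This sketch assumes that the packager's LTO flags (such as -flto=auto and -ffat-lto-objects) end up in one of the CFLAGS, optflags, or LDFLAGS entries. Whether they do depends on how the build passes them, so an empty result does not prove that LTO was off. The helper name lto_flags_recorded is hypothetical.

```ruby
require "rbconfig"

# Config keys that commonly carry compiler/linker flags. Whether LTO flags
# are recorded here is an assumption about the packaging, not a guarantee.
FLAG_KEYS = %w[CFLAGS optflags LDFLAGS].freeze

# Returns a hash of config key => LTO-related flags found in that entry,
# matching e.g. "-flto=auto" and "-ffat-lto-objects".
def lto_flags_recorded
  FLAG_KEYS.each_with_object({}) do |key, found|
    flags = RbConfig::CONFIG.fetch(key, "").split
    lto = flags.grep(/\A-f(?:fat-)?lto/)
    found[key] = lto unless lto.empty?
  end
end

p lto_flags_recorded
```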
Updated by vo.x (Vit Ondruch) over 2 years ago
k0kubun (Takashi Kokubun) wrote in #note-2:
$ RUBYOPT=-v bundle exec rspec spec/rspec/core/example_spec.rb:472
I have not hit the issue when running only this minimal example.
Updated by alanwu (Alan Wu) over 2 years ago
I agree with mame that the test is highly questionable.
The GC does not guarantee collection for all semantically unreachable objects since it's not exact.
Because we scan the native stack for conservative marking, changes in code generation could
spill different objects to the native stack and keep them alive. This is probably what we're seeing
through the combination of building YJIT + LTO, but not enabling YJIT at runtime.
We could take a heap dump (ObjectSpace.dump_all) and verify that the objects are indeed kept alive through the machine context, but beyond that, I don't think there is much to do here.
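The heap-dump check suggested in this comment could be sketched as follows. The sketch relies on three assumptions:

* ObjectSpace.dump_all emits one JSON record per line.
* Those records include ROOT entries of the form {"type":"ROOT", "root":"machine_context", "references":[...]}.
* Addresses are formatted the same way in ObjectSpace.dump output and in those references.

The helper root_categories_referencing is hypothetical. It only reports direct references from a root, not longer reference chains.

```ruby
require "objspace"
require "json"

# Hypothetical helper: list the GC root categories (e.g. "vm",
# "machine_context") whose *direct* references include obj, using the
# ROOT records that ObjectSpace.dump_all writes as JSON lines.
def root_categories_referencing(obj)
  address = JSON.parse(ObjectSpace.dump(obj))["address"]
  categories = []
  ObjectSpace.dump_all(output: :string).each_line do |line|
    # Only parse root records; skip the much larger per-object records.
    next unless line.include?('"type":"ROOT"')
    record = JSON.parse(line)
    categories << record["root"] if Array(record["references"]).include?(address)
  end
  categories.uniq
end

suspect = Object.new
p root_categories_referencing(suspect)
```

In the rspec-core case, one would call it after GC.start on one of the objects the test expects to be collected. Seeing "machine_context" in the result would support the conservative stack-marking explanation.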
Updated by hsbt (Hiroshi SHIBATA) over 2 years ago
- Status changed from Feedback to Third Party's Issue