Segmentation Fault on Ruby 2.3.1
We have an extensive test matrix that covers many versions of Rails across many versions of Ruby. We see frequent segfaults when running our Rails 3.1 suite with Ruby 2.3.1p112. The same suite runs on Ruby 1.8 through Ruby 2.2 without issues. Our test suite appears to be exposing a subtle bug in Ruby, which I suspect is related to garbage collection. The segfault does not occur every time the test suite runs, but if I run the suite in a bash for loop for 100 iterations I usually see 5-10 segfaults. If I turn off garbage collection during the test cases, the segfault does not occur over 100 iterations of the test suite.
I tried but was unable to come up with a minimal reproduction case. Our code and test suite are publicly available, so I'll list the steps you can follow to reproduce the issue with some additional setup.
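For anyone who wants to try the same experiment, here is a minimal sketch of one way to turn off garbage collection around a piece of code. The helper name `without_gc` is hypothetical and is not taken from the newrelic_rpm test suite; the report does not say how GC was actually disabled there.

```ruby
# Hypothetical helper (not from the original report): run a block with
# the garbage collector disabled, then restore the previous state.
def without_gc
  # GC.disable returns true if GC was already disabled before the call.
  was_disabled = GC.disable
  yield
ensure
  # Only re-enable GC if this helper was the one that disabled it.
  GC.enable unless was_disabled
end
```

A test case could wrap its body in `without_gc { ... }`. If the crash disappears only when GC is off, that is consistent with a GC-related cause but does not prove it.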
- Clone the newrelic rpm GitHub repo from: https://github.com/newrelic/rpm
- git checkout 18.104.22.1688
- bundle install
- run the test suite a single time (you probably won't see a segfault): bundle exec rake test:multiverse[rails,env=6]
- run the test suite in a bash for loop for 100 iterations: for i in `seq 1 100`; do bundle exec rake test:multiverse[rails,env=6]; done
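The loop in the last step can also be written as a small Ruby script that counts the crashing runs. This is only a sketch: the method name `count_segfaults` is hypothetical, and it assumes a crash shows up in the combined output as Ruby's usual `[BUG] Segmentation fault` report.

```ruby
require "open3"

# Hypothetical helper: run +cmd+ (an array of command words) +iterations+
# times and count the runs whose combined stdout/stderr contains Ruby's
# crash banner.
def count_segfaults(cmd, iterations)
  iterations.times.count do
    output, _status = Open3.capture2e(*cmd)
    output.include?("[BUG] Segmentation fault")
  end
end

# Example call (not executed here):
#   count_segfaults(["bundle", "exec", "rake", "test:multiverse[rails,env=6]"], 100)
```

Passing the command as an array avoids shell quoting problems with the brackets in the rake task name.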
id_table.c: extend, don't shrink
- id_table.c (hash_table_extend): should not shrink the table than the previous capacity. [ruby-core:76534] [Bug #12614]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55896 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
merge revision(s) 55896: [Backport #12614]
* id_table.c (hash_table_extend): should not shrink the table than the previous capacity. [ruby-core:76534] [Bug #12614]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_3@56019 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Updated by wanabe (_ wanabe) over 3 years ago
- File segv_without_newrelic.rb added
- File small_test_newrelic_rpm.patch added
Reproduced with trunk: ruby 2.4.0dev (2016-08-14 trunk 55893) [x86_64-linux].
I guess it is related to the method table and its extending logic. hash_table_extend(tbl) can make LIST_P(tbl->capa) == TRUE when there are many collisions and deleted items. In that case, tbl->used is large but tbl->num is not.
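To illustrate how a table can shrink while it is being "extended", here is a toy model in Ruby. It is not the code from id_table.c: the rounding rule, the growth factor, and all method names are assumptions made for the example. It only shows the general effect of sizing the new table from the live count (num) while many slots are taken by deleted entries (used).

```ruby
# Toy model only; names and formulas are assumptions, not id_table.c.

# Round a requested size up to a power of two, with a minimum of 4.
def round_capa(n)
  capa = 4
  capa *= 2 while capa < n
  capa
end

# Size the new table from the number of live entries only.
def next_capa_from_num(num)
  round_capa(num + num / 2)
end

# Same rule, but never go below the capacity the table already has.
def next_capa_never_shrinking(num, capa)
  [next_capa_from_num(num), capa].max
end

capa = 64 # current capacity
num  = 5  # live entries; the other occupied slots hold deleted items

shrunk  = next_capa_from_num(num)              # smaller than capa
guarded = next_capa_never_shrinking(num, capa) # stays at capa
```

With these numbers the first rule yields a capacity well below 64, which in a table that switches representation at small capacities could cross that threshold, while the guarded rule keeps the previous capacity.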
I attach 2 files.
- A patch to newrelic_rpm to shrink test code
- Test code without newrelic
- I'm sorry that I don't know what magic numbers mean.