Bug #19716
SystemStackError occurs too easily on Alpine Linux (due to small stack size reported by pthread_attr_getstacksize on musl libc)
Description
This is the same problem previously reported against Ruby 2.5 in https://bugs.ruby-lang.org/issues/14387. I just ran into the same problem on Ruby 3.1.4, built on Alpine Linux 3.16.
@hsbt (Hiroshi SHIBATA) stated in the previous thread (https://bugs.ruby-lang.org/issues/14387#note-28):
If you have this issue with Ruby 3.2, please file it with another issue.
I hacked stack_check in gc.c to print the values of STACK_START and STACK_END on stack overflow; on the Alpine 3.16 host where this problem just occurred, the values printed were:
Start=0x7ffd0bf4f000, End=0x7ffd0bf32530
...which shows that Ruby thinks the stack size is only 131072 bytes (128 KiB): the two addresses are 117456 bytes apart, just under that limit. On the other hand, ulimit -s shows a stack size limit of 8192 KB.
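The arithmetic behind that comparison can be checked with a few lines of Ruby. This is only a sketch: the method name stack_bytes_in_use is made up for illustration, the two addresses are copied from the output above, and the code assumes a downward-growing stack (as on x86_64), so the bytes in use are the start address minus the end address.

```ruby
# Hypothetical helper for illustration; it is not part of Ruby or of gc.c.
# Assumes the stack grows downward, so START is the higher address.
def stack_bytes_in_use(stack_start, stack_end)
  stack_start - stack_end
end

# Addresses printed by the modified stack_check, copied from the report.
used = stack_bytes_in_use(0x7ffd0bf4f000, 0x7ffd0bf32530)

inferred_limit = 131_072       # 128 KiB, the size Ruby appears to assume
ulimit_limit   = 8192 * 1024   # `ulimit -s` reports 8192 KB

puts "bytes in use at overflow: #{used}"
puts format("share of 128 KiB limit: %.1f%%", 100.0 * used / inferred_limit)
puts format("share of ulimit -s:     %.1f%%", 100.0 * used / ulimit_limit)
```

The overflow fired when the stack was close to 128 KiB deep, which is a small fraction of what ulimit -s allows.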
This Ruby 3.1.4 was built from unmodified source code downloaded from https://cache.ruby-lang.org; the build was configured using CFLAGS='-march=native' ./configure --disable-install-doc.
The invocation of Ruby which blew the stack was bundle exec rake db:migrate, on a mid-sized Rails project.
Regarding @ncopa's patch from #14387, @wanabe (_ wanabe) listed some things which should be done before it is merged into mainline Ruby:
Okay, The patch needs one or more proofs of its behaviour, like that:
- Original issue [ruby-dev:50421] has gone away.
- Standard test codes run well.
  - test-all
  - ruby/spec
- getrlimit works on some situations like:
  - on single thread
  - with multiple threads
  - with RLIMIT_STACK environment variable
- getrlimit code of musl is implemented correctly as expected.
  (But It's doubtful whether it can be. I guess that a proof of code soundness is very difficult.)
- Some "real world" applications can work.
  I think it is better example that that application(s) can't work without the patch.
I am happy to help cover some of these points if the Ruby development team is still interested in merging @ncopa's patch.
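As one possible way to gather evidence for the single-thread and multi-thread items in that list, a small probe could compare how deep plain recursion gets on the main thread and on a second thread, before and after applying the patch. This is a rough sketch only: recurse and max_recursion_depth are names invented here, not part of Ruby's test suite or of @ncopa's patch. The depth reflects whichever limit is hit first (Ruby's VM stack or the machine stack), so the absolute numbers matter less than the change between the two builds.

```ruby
# Hypothetical probe for illustration only.

# Recurse until Ruby raises SystemStackError, counting frames as we go.
def recurse(counter)
  counter[0] += 1
  recurse(counter)
end

# Return how many nested calls succeeded before the overflow was detected.
def max_recursion_depth
  counter = [0]
  begin
    recurse(counter)
  rescue SystemStackError
    # Expected: this is the limit being measured.
  end
  counter[0]
end

main_depth   = max_recursion_depth
thread_depth = Thread.new { max_recursion_depth }.value

puts "main thread:   #{main_depth} frames"
puts "second thread: #{thread_depth} frames"
```

If the main-thread figure is much smaller than the second-thread figure on an unpatched musl build, and the gap closes once the patch is applied, that would support the claim that the main thread's stack size is being underestimated.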
Updated by alexdowad (Alex Dowad) 6 months ago
I just applied @ncopa's patch from: https://bugs.ruby-lang.org/attachments/download/7081/0001-thread_pthread.c-make-get_main_stack-portable-on-lin.patch
After make and make install, I am now able to run bundle exec rake db:migrate normally.
Commands used were:
wget -O 'thread-stack-fix.patch' 'https://bugs.ruby-lang.org/attachments/download/7081/0001-thread_pthread.c-make-get_main_stack-portable-on-lin.patch'
patch -p1 -i thread-stack-fix.patch
rm thread-stack-fix.patch
make
make install
Updated by alexdowad (Alex Dowad) 6 months ago
Output from make test after applying the patch:
Fiber count: 10000 (skipping)
PASS all 1669 tests
exec ./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems "./bootstraptest/runner.rb" --ruby="ruby --disable-gems" ./KNOWNBUGS.rb
2023-06-07 00:26:25 +0000
Driver is ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x86_64-linux-musl]
Target is ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x86_64-linux-musl]
KNOWNBUGS.rb PASS 0
No tests, no problem
test succeeded
Updated by retro (Josef Šimánek) 19 days ago
It would be great to finally get this fixed. No bugs are being reported because Alpine Linux is used mostly as a Docker container system, and the official Ruby Alpine image already includes the patch: https://github.com/docker-library/ruby/blob/31c1fdba369192fe2c3cf327d7d98819edc1400b/3.2/alpine3.18/Dockerfile#L102-L106. RubyGems.org has been running on this Alpine image, with this patch, for the last few years.
I tried this patch on Ruby 3.2.2 on GLIBC Linux (Fedora).
$ make test-all
...
Finished tests in 957.018822s, 24.4938 tests/s, 5851.8306 assertions/s.
23441 tests, 5600312 assertions, 0 failures, 0 errors, 89 skips
ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]