Bug #14387

Running Ruby 2.5 on Alpine Linux raises SystemStackError at a relatively shallow recursion depth

Added by koshigoe (Masataka SUZUKI) 9 months ago. Updated 4 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
[ruby-dev:50421]

Description

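I ran into this exception when running RuboCop with Ruby 2.5.0 on Alpine Linux in CircleCI (it did not occur with Ruby 2.4.3).

Why does the number of recursive calls allowed before the recursion is stopped differ so much between Ruby versions?
Could you tell me whether this is intended behavior, or whether it is a problem specific to Alpine Linux rather than the result of a change in Ruby?

Did this behavior start with Ruby 2.5.0 because the thread stack size on Alpine Linux is relatively small?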
https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

Reproduction

To reproduce the problem, run recursive code like the following.

# test.rb
n = 100000
res = {}
1.upto(n).to_a.inject(res) do |r, i|
  r[i] = {}
end

def f(x)
  x.each_value { |v| f(v) }
end

f(res)

When run with Ruby 2.4.3, the exception was raised at 10061 levels.

% docker container run \
  -v (pwd):/mnt/my --rm \
  ruby:2.4.3-alpine3.7 \
  ruby -v /mnt/my/test.rb
ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-linux-musl]
/mnt/my/test.rb:9:in `each_value': stack level too deep (SystemStackError)
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
         ... 10061 levels...
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:12:in `<main>'

When run with Ruby 2.5.0, on the other hand, the exception was raised at 134 levels.

% docker container run \
  -v (pwd):/mnt/my --rm \
  test/ruby:trunk-alpine3.7 \
  ruby -v /mnt/my/test.rb
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
/mnt/my/test.rb:9:in `each_value': stack level too deep (SystemStackError)
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
         ... 134 levels...
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:12:in `<main>'

Running it with Ruby trunk gave the same result as 2.5.0.

ruby 2.6.0dev (2018-01-24 trunk 62017) [x86_64-linux-musl]
/mnt/my/test.rb:9:in `each_value': stack level too deep (SystemStackError)
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:9:in `block in f'
         ... 134 levels...
        from /mnt/my/test.rb:9:in `block in f'
        from /mnt/my/test.rb:9:in `each_value'
        from /mnt/my/test.rb:9:in `f'
        from /mnt/my/test.rb:12:in `<main>'

Note: the Dockerfile used to build the trunk Docker image is here:
https://gist.github.com/koshigoe/509be02a3580cdfc7a2cc45a4e6e44c5
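
The backtrace level counts above can also be compared as a single number per run. The following is a minimal sketch and not part of the original report: `nested_hash` and `probe_depth` are hypothetical helper names, and the walk mirrors `f` from test.rb with a counter added. The value it returns is the number of nested calls reached, so it is not the same figure as the "levels" shown in the backtraces.

```ruby
# Build a chain of hashes nested `levels` deep (hypothetical helper).
def nested_hash(levels)
  root = {}
  node = root
  levels.times { node = (node[:child] = {}) }
  root
end

# Walk the nested hash recursively, the same way f does in test.rb,
# and return how many nested calls were made before SystemStackError
# was raised (or before the chain ran out).
def probe_depth(levels = 100_000)
  depth = 0
  walk = lambda do |h|
    depth += 1
    h.each_value { |v| walk.call(v) }
  end
  begin
    walk.call(nested_hash(levels))
  rescue SystemStackError
    # Expected: the recursion was cut off by the interpreter.
  end
  depth
end

puts probe_depth if __FILE__ == $0
```

Running this file in each image (for example in place of test.rb in the docker commands above) would print one number per Ruby version.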


Related issues

Related to Ruby trunk - Bug #13412: Infinite recursion with define_method may cause silent SEGV or cfp consistency error (Closed)
Related to Ruby trunk - Bug #9454: The define_method(:class) segfault (Closed, 2014-01-26)

History

#1 [ruby-dev:50433] Updated by wanabe (_ wanabe) 9 months ago

koshigoe (Masataka SUZUKI) wrote:

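Why does the number of recursive calls allowed before the recursion is stopped differ so much between Ruby versions?
Could you tell me whether this is intended behavior, or whether it is a problem specific to Alpine Linux rather than the result of a change in Ruby?

Did this behavior start with Ruby 2.5.0 because the thread stack size on Alpine Linux is relatively small?
https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

Using git bisect, I found that the current behavior appears to start with r59630.
r59630 is a commit related to [Bug #13412]; its diff shows that a check of the thread's stack was added, in the form of stack_check(th).

From this I would guess that, as you say, the problem is specific to Alpine Linux, or to musl.
I could not determine whether the behavior is intentional, whether the check should be relaxed, or whether it can or should be worked around with pthread_attr_setstacksize or some other method.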

#2 Updated by wanabe (_ wanabe) 9 months ago

  • Related to Bug #13412: Infinite recursion with define_method may cause silent SEGV or cfp consistency error added

#3 [ruby-dev:50435] Updated by scardon (Daniel Leong) 9 months ago

https://qiita.com/koshigoe/items/7acebbab7b44fa2b35bc

This is the post that spun off this issue ticket.
There is more information in there.

I hope we can get some form of resolution for Alpine users.

#4 [ruby-dev:50436] Updated by mame (Yusuke Endoh) 9 months ago

Does it work if you set the environment variable RUBY_THREAD_MACHINE_STACK_SIZE to a large value such as 1048576?
This adjusts the stack size passed to pthread_attr_setstacksize.

You can read the setting back through RubyVM::DEFAULT_PARAMS.

$ ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
1048576

$ RUBY_THREAD_MACHINE_STACK_SIZE=80000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
131072
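
The second output above shows that the requested 80000 is not used as-is. One guess that fits the numbers is that the requested size is rounded up to a multiple of 4096 bytes and then raised to a floor of 131072. The sketch below only encodes that guess: it is inferred from the outputs posted in this thread, not taken from the Ruby source, and `PAGE_ALIGN`, `MIN_MACHINE_STACK` and `effective_stack_size` are hypothetical names.

```ruby
PAGE_ALIGN = 4096            # assumed alignment unit, inferred from the outputs
MIN_MACHINE_STACK = 131_072  # assumed lower bound (128 KiB), inferred from the outputs

# Round the requested size up to the alignment unit, then apply the floor.
def effective_stack_size(requested)
  aligned = ((requested + PAGE_ALIGN - 1) / PAGE_ALIGN) * PAGE_ALIGN
  [aligned, MIN_MACHINE_STACK].max
end

p effective_stack_size(80_000)     # 131072, as in the output above
p effective_stack_size(1_048_576)  # 1048576, already aligned and above the floor
```

The same rule also reproduces the 503808 reported later in this thread for a requested 500000, which is 123 * 4096.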

#5 [ruby-dev:50444] Updated by craigjbass (Craig Bass) 9 months ago

wanabe (_ wanabe) wrote:

koshigoe (Masataka SUZUKI) wrote:

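Why does the number of recursive calls allowed before the recursion is stopped differ so much between Ruby versions?
Could you tell me whether this is intended behavior, or whether it is a problem specific to Alpine Linux rather than the result of a change in Ruby?

Did this behavior start with Ruby 2.5.0 because the thread stack size on Alpine Linux is relatively small?
https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size

Using git bisect, I found that the current behavior appears to start with r59630.
r59630 is a commit related to [Bug #13412]; its diff shows that a check of the thread's stack was added, in the form of stack_check(th).

From this I would guess that, as you say, the problem is specific to Alpine Linux, or to musl.
I could not determine whether the behavior is intentional, whether the check should be relaxed, or whether it can or should be worked around with pthread_attr_setstacksize or some other method.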

I've also come across this issue: https://github.com/rspec/rspec-support/pull/343

mame (Yusuke Endoh) wrote:

Does it work if you set the environment variable RUBY_THREAD_MACHINE_STACK_SIZE to a large value such as 1048576?
This adjusts the stack size passed to pthread_attr_setstacksize.

You can read the setting back through RubyVM::DEFAULT_PARAMS.

$ ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
1048576

$ RUBY_THREAD_MACHINE_STACK_SIZE=80000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
131072

#6 [ruby-dev:50451] Updated by scardon (Daniel Leong) 9 months ago

/app # ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
1048576
/app # RUBY_THREAD_MACHINE_STACK_SIZE=100000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
131072
/app # RUBY_THREAD_MACHINE_STACK_SIZE=500000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
503808
/app # RUBY_THREAD_MACHINE_STACK_SIZE=100000 rubycritic -f html --no-browser app lib
RubyCritic can provide more feedback if you use a Git, Mercurial or Perforce repository. Churn will not be calculated.
Traceback (most recent call last):
    165: from /usr/local/bundle/bin/rubycritic:23:in `<main>'
    164: from /usr/local/bundle/bin/rubycritic:23:in `load'
    163: from /usr/local/bundle/gems/rubycritic-3.3.0/bin/rubycritic:10:in `<top (required)>'
    162: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/cli/application.rb:20:in `execute'
    161: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:19:in `execute'
    160: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:24:in `critique'
    159: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `run'
    158: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `each'
     ... 153 levels...
      4: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `each'
      3: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `block in mass'
      2: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `mass'
      1: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `inject'
/usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `each': stack level too deep (SystemStackError)
/app # RUBY_THREAD_MACHINE_STACK_SIZE=500000 rubycritic -f html --no-browser app lib
RubyCritic can provide more feedback if you use a Git, Mercurial or Perforce repository. Churn will not be calculated.
Traceback (most recent call last):
    165: from /usr/local/bundle/bin/rubycritic:23:in `<main>'
    164: from /usr/local/bundle/bin/rubycritic:23:in `load'
    163: from /usr/local/bundle/gems/rubycritic-3.3.0/bin/rubycritic:10:in `<top (required)>'
    162: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/cli/application.rb:20:in `execute'
    161: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:19:in `execute'
    160: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:24:in `critique'
    159: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `run'
    158: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `each'
     ... 153 levels...
      4: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `each'
      3: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `block in mass'
      2: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `mass'
      1: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `inject'
/usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `each': stack level too deep (SystemStackError)
/app # rubycritic -f html --no-browser app lib
RubyCritic can provide more feedback if you use a Git, Mercurial or Perforce repository. Churn will not be calculated.
Traceback (most recent call last):
    164: from /usr/local/bundle/bin/rubycritic:23:in `<main>'
    163: from /usr/local/bundle/bin/rubycritic:23:in `load'
    162: from /usr/local/bundle/gems/rubycritic-3.3.0/bin/rubycritic:10:in `<top (required)>'
    161: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/cli/application.rb:20:in `execute'
    160: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:19:in `execute'
    159: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/commands/default.rb:24:in `critique'
    158: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `run'
    157: from /usr/local/bundle/gems/rubycritic-3.3.0/lib/rubycritic/analysers_runner.rb:27:in `each'
     ... 152 levels...
      4: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `inject'
      3: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `each'
      2: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `block in mass'
      1: from /usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `mass'
/usr/local/bundle/gems/sexp_processor-4.10.0/lib/sexp.rb:223:in `inject': stack level too deep (SystemStackError)

This is some output with the rubycritic gem trying to parse a deeply nested AST.
Below is some output with the brakeman gem trying to parse the same deeply nested AST.

/app # ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
1048576
/app # RUBY_THREAD_MACHINE_STACK_SIZE=100000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
131072
/app # RUBY_THREAD_MACHINE_STACK_SIZE=500000 ruby -ve 'p RubyVM::DEFAULT_PARAMS[:thread_machine_stack_size]'
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-musl]
503808
/app # brakeman -o brakeman-output.json --no-progress --separate-models --no-branching
Loading scanner...
Processing application in /app
Processing gems...
[Notice] Detected Rails 5 application
Processing configuration...
[Notice] Escaping HTML by default
Parsing files...
Processing initializers...
Processing libs...
Traceback (most recent call last):
    254: from /usr/local/bundle/bin/brakeman:23:in `<main>'
    253: from /usr/local/bundle/bin/brakeman:23:in `load'
    252: from /usr/local/bundle/gems/brakeman-4.1.1/bin/brakeman:8:in `<top (required)>'
    251: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:20:in `start'
    250: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:35:in `run'
    249: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:142:in `run_report'
    248: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:118:in `regular_report'
    247: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:133:in `run_brakeman'
     ... 242 levels...
      4: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      3: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      2: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      1: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
/usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash': stack level too deep (SystemStackError)
/app # RUBY_THREAD_MACHINE_STACK_SIZE=100000 brakeman -o brakeman-output.json --no-progress --separate-models --no-branching
Loading scanner...
Processing application in /app
Processing gems...
[Notice] Detected Rails 5 application
Processing configuration...
[Notice] Escaping HTML by default
Parsing files...
Processing initializers...
Processing libs...
Traceback (most recent call last):
    256: from /usr/local/bundle/bin/brakeman:23:in `<main>'
    255: from /usr/local/bundle/bin/brakeman:23:in `load'
    254: from /usr/local/bundle/gems/brakeman-4.1.1/bin/brakeman:8:in `<top (required)>'
    253: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:20:in `start'
    252: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:35:in `run'
    251: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:142:in `run_report'
    250: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:118:in `regular_report'
    249: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:133:in `run_brakeman'
     ... 244 levels...
      4: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      3: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      2: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      1: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
/usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash': stack level too deep (SystemStackError)
/app # RUBY_THREAD_MACHINE_STACK_SIZE=500000 brakeman -o brakeman-output.json --no-progress --separate-models --no-branching
Loading scanner...
Processing application in /app
Processing gems...
[Notice] Detected Rails 5 application
Processing configuration...
[Notice] Escaping HTML by default
Parsing files...
Processing initializers...
Processing libs...
Traceback (most recent call last):
    256: from /usr/local/bundle/bin/brakeman:23:in `<main>'
    255: from /usr/local/bundle/bin/brakeman:23:in `load'
    254: from /usr/local/bundle/gems/brakeman-4.1.1/bin/brakeman:8:in `<top (required)>'
    253: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:20:in `start'
    252: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:35:in `run'
    251: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:142:in `run_report'
    250: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:118:in `regular_report'
    249: from /usr/local/bundle/gems/brakeman-4.1.1/lib/brakeman/commandline.rb:133:in `run_brakeman'
     ... 244 levels...
      4: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      3: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      2: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
      1: from /usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash'
/usr/local/bundle/gems/brakeman-4.1.1/lib/ruby_parser/bm_sexp.rb:106:in `hash': stack level too deep (SystemStackError)

Neither brakeman nor rubycritic ever raised this error when we were using Ruby 2.4.

#7 [ruby-dev:50452] Updated by wanabe (_ wanabe) 9 months ago

mame (Yusuke Endoh) wrote:

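Does it work if you set the environment variable RUBY_THREAD_MACHINE_STACK_SIZE to a large value such as 1048576?
This adjusts the stack size passed to pthread_attr_setstacksize.

Although this is a thread setting, only the main thread exists here and native_thread_create() is never called, so unfortunately the behavior does not seem to change.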

/work # ./miniruby test.rb 2>&1|grep level
test.rb:8:in `each_value': stack level too deep (SystemStackError)
     ... 194 levels...
/work # RUBY_THREAD_MACHINE_STACK_SIZE=1048576 ./miniruby test.rb 2>&1|grep level
test.rb:8:in `each_value': stack level too deep (SystemStackError)
     ... 188 levels...

Wrapping the script in a thread gave the expected behavior.

/work # RUBY_THREAD_MACHINE_STACK_SIZE=1048576 ./miniruby -e 'Thread.new{ load "test.rb" }.join' 2>&1|grep level
test.rb:8:in `each_value': stack level too deep (SystemStackError)
     ... 1720 levels...
test.rb:8:in `each_value': stack level too deep (SystemStackError)
     ... 1720 levels...
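
The same wrapping can be done inside a script instead of on the command line. This is a minimal sketch, assuming the deep recursion can run off the main thread; `run_in_thread` is a hypothetical helper name. As the results above suggest, RUBY_THREAD_MACHINE_STACK_SIZE only has an effect on threads created this way, not on the main thread.

```ruby
# Run the block in a new thread and return its result.
# Thread#value joins the thread and re-raises any exception
# that terminated it, so callers can rescue errors as usual.
def run_in_thread(&block)
  Thread.new(&block).value
end

result = run_in_thread { (1..10).sum }
puts result
```

A script would then be started as, for example, `RUBY_THREAD_MACHINE_STACK_SIZE=1048576 ruby script.rb`, with the recursive work placed inside the block.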

Running the following test program shows that, in the Alpine musl environment, the value of rlim_cur and the value returned by pthread_attr_getstacksize differ greatly.
I suspect this is related.

#include <stdio.h>
#include <sys/resource.h>
#include <pthread.h>

int main (void) {
  struct rlimit limit;
  pthread_attr_t attr;
  size_t stacksize;

  getrlimit(RLIMIT_STACK, &limit);
  pthread_attr_init(&attr);
  pthread_attr_getstacksize(&attr, &stacksize);
  pthread_attr_destroy(&attr);
  printf("rlim_cur: %ld\nrlim_max: %ld\nstacksize: %zu\n", (long)limit.rlim_cur, (long)limit.rlim_max, stacksize);
  return 0;
}
/work # gcc -pthread test.c && ./a.out
rlim_cur: 8388608
rlim_max: -1
stacksize: 81920

When the same program was run on Ubuntu 18.04, the two values matched.

$ gcc -pthread test.c && ./a.out
rlim_cur: 8388608
rlim_max: -1
stacksize: 8388608

#8 [ruby-dev:50463] Updated by jhealy (James Healy) 9 months ago

There's a very similar sounding issue being discussed on the python bug tracker: https://bugs.python.org/issue32307

#9 [ruby-dev:50494] Updated by ncopa (Natanael Copa) 8 months ago

jhealy (James Healy) wrote:

There's a very similar sounding issue being discussed on the python bug tracker: https://bugs.python.org/issue32307

I first thought that this was the same or a similar issue as the Python one, but it is not. (I think Ruby may have the same problem as Python, but that is a separate issue.)

What happens here is that Ruby tries to calculate the stack size using pthread_getattr_np(pthread_self()). This does not work when the current thread (pthread_self()) is the main thread. Calculating the stack size for the main thread is non-trivial, if possible at all. The musl developers have chosen the safe approach here and report only the amount of stack that the kernel can guarantee.

This testcase illustrates the problem:

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>

int main()
{
        pthread_attr_t attr;
        size_t size;

        /* pthread functions return 0 on success and an error number on failure */
        if (pthread_getattr_np(pthread_self(), &attr) != 0)
                return 1;

        if (pthread_attr_getstacksize(&attr, &size) != 0)
                return 1;
        pthread_attr_destroy(&attr);
        printf("stacksize pthread_self(): %zu\n", size);

        return 0;
}

When running it on alpine it returns:

stacksize pthread_self(): 126976

Which is way lower than the 8MB set by ulimit.

#10 [ruby-dev:50497] Updated by ncopa (Natanael Copa) 7 months ago

Possible fix or workaround:

diff --git a/thread_pthread.c b/thread_pthread.c
index 951885ffa0..e2d662143b 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -721,7 +721,7 @@ ruby_init_stack(volatile VALUE *addr
         native_main_thread.register_stack_start = (VALUE*)bsp;
     }
 #endif
-#if MAINSTACKADDR_AVAILABLE
+#if MAINSTACKADDR_AVAILABLE && !(defined(__linux__) && !defined(__GLIBC__))
     if (native_main_thread.stack_maxsize) return;
     {
        void* stackaddr;
@@ -1680,7 +1680,7 @@ ruby_stack_overflowed_p(const rb_thread_t *th, const void *addr)

 #ifdef STACKADDR_AVAILABLE
     if (get_stack(&base, &size) == 0) {
-# ifdef __APPLE__
+# if defined(__APPLE__) || (defined(__linux__) && !defined(__GLIBC__))
        if (pthread_equal(th->thread_id, native_main_thread.id)) {
            struct rlimit rlim;
            if (getrlimit(RLIMIT_STACK, &rlim) == 0 && rlim.rlim_cur > size) {

I wonder why the second hunk, which falls back to getrlimit(RLIMIT_STACK), is needed when the current thread is the main thread on __APPLE__. It looks like the exact issue we have with musl, so I would not be surprised if the same issue exists on Apple platforms too.

#11 Updated by usa (Usaku NAKAMURA) 7 months ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED

#12 [ruby-dev:50499] Updated by ncopa (Natanael Copa) 7 months ago

This is somewhat related to this comment: https://bugs.ruby-lang.org/issues/9454#note-6

There is a problem with pthread_getattr_np() when it is called on the main thread, since the result depends on the implementation. The _np suffix means non-portable, so we cannot assume any specific behavior from an implementation.

The proper fix would be to test for a known platform (e.g. __GLIBC__) before relying on pthread_getattr_np, and to fall back to portable logic on unknown platforms.

The most reliable way to calculate the main thread's stack size on Linux is to parse /proc/self/maps, and this is what the glibc pthread_getattr_np implementation does.
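
To make the /proc/self/maps approach concrete, here is a minimal sketch in Ruby rather than C. It only locates the `[stack]` mapping and reports its currently mapped range; how glibc combines that with the resource limit is not covered here. `main_stack_mapping` is a hypothetical helper name, and the sample line in the usage is illustrative, not captured from a real process.

```ruby
# Find the "[stack]" line in the text of /proc/self/maps and return
# the address range of that mapping, or nil if there is no such line.
def main_stack_mapping(maps_text)
  maps_text.each_line do |line|
    next unless line.rstrip.end_with?("[stack]")
    range = line.split(" ", 2).first          # e.g. "7ffc12345000-7ffc12366000"
    low, high = range.split("-").map { |hex| hex.to_i(16) }
    return { low: low, high: high, mapped_bytes: high - low }
  end
  nil
end

# Illustrative input only (addresses are made up).
sample = "7ffc12345000-7ffc12366000 rw-p 00000000 00:00 0    [stack]\n"
p main_stack_mapping(sample)

# On Linux the real file can be read directly.
if File.readable?("/proc/self/maps")
  p main_stack_mapping(File.read("/proc/self/maps"))
end
```

The mapped range grows as the stack is used, so on its own it says how much stack is mapped now, not how far the stack is allowed to grow.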

#14 Updated by wanabe (_ wanabe) 7 months ago

  • Related to Bug #9454: The define_method(:class) segfault added

#15 [ruby-dev:50517] Updated by jottr (jottr -) 7 months ago

I'd like to inquire on the progress of this issue. What is missing to get this resolved?

It is affecting downstream users who deploy their ruby applications by means of alpine linux containers. The alpine linux ruby containers are currently unusable because of this.

#16 [ruby-dev:50519] Updated by naruse (Yui NARUSE) 7 months ago

jottr (jottr -) wrote:

I'd like to inquire on the progress of this issue. What is missing to get this resolved?

It is affecting downstream users who deploy their ruby applications by means of alpine linux containers. The alpine linux ruby containers are currently unusable because of this.

get_stack and get_main_stack need platform-dependent implementations.
In this context "platform" means libc (or libthread or something similar on some OSes).
Therefore ncopa's patch looks good to me in general.

The blockers seem to be:

  • ncopa's patch breaks non-Linux environments, because it removes the definition of get_main_stack without defining it for anything other than non-glibc Linux
  • minor one: it should add a comment explaining that !defined(__GLIBC__) is meant to cover musl libc, and that uClibc defines __GLIBC__.

#17 [ruby-dev:50549] Updated by jnardone (joe nardone) 5 months ago

it's frustrating that this is still open after four months. alpine-ruby-2.5 is borderline unusable with this still in Ruby. do we know if ncopa (Natanael Copa) is still working on this issue or not?

#18 [ruby-dev:50550] Updated by wanabe (_ wanabe) 5 months ago

It seems reasonable not to rely on pthread_getattr_np() in a defined(__linux__) && !defined(__GLIBC__) environment, because the function has the suffix "_np".
I guess it would be ideal if Ruby relied on pthread_getattr_np() only in tested environments such as defined(__linux__) && defined(__GLIBC__), but it is too painful to follow and test other environments.
ncopa's 2nd patch has a side effect as commented.
So I think 1st patch is more pragmatic.

Naruse-san (or other?):
How about https://bugs.ruby-lang.org/issues/14387#note-10 patch?

#19 [ruby-dev:50556] Updated by naruse (Yui NARUSE) 5 months ago

jnardone (joe nardone) wrote:

it's frustrating that this is still open after four months. alpine-ruby-2.5 is borderline unusable with this still in Ruby. do we know if ncopa (Natanael Copa) is still working on this issue or not?

I wrote 0001-thread_pthread.c-make-get_main_stack-portable-on-lin.patch is not acceptable in .

wanabe (_ wanabe) wrote:

It seems reasonable not to rely on pthread_getattr_np() in a defined(__linux__) && !defined(__GLIBC__) environment, because the function has the suffix "_np".
I guess it would be ideal if Ruby relied on pthread_getattr_np() only in tested environments such as defined(__linux__) && defined(__GLIBC__), but it is too painful to follow and test other environments.

Use pthread_getattr_np() only on musl libc environment.
This stack related code is highly platform dependent.
You can use non portable functions for non portable codes.

You can see many ifdefs in get_stack.
Don't try to write portable code without ifdef in this area; "portable" means write ifdefs for all tested environments.

ncopa's 2nd patch has a side effect as commented.
So I think 1st patch is more pragmatic.

Naruse-san (or other?):
How about https://bugs.ruby-lang.org/issues/14387#note-10 patch?

It's acceptable because it doesn't break other tested environments, but does getrlimit really work in a musl libc environment?

#20 [ruby-dev:50564] Updated by wanabe (_ wanabe) 4 months ago

naruse (Yui NARUSE) wrote:

It's acceptable because it doesn't break other tested environments, but does getrlimit really work in a musl libc environment?

Thank you for your comment.

Okay. The patch needs one or more proofs of its behaviour, such as:

  • Original issue has gone away.
  • Standard test codes run well.
    • test-all
    • ruby/spec
  • getrlimit works on some situations like:
    • on single thread
    • with multiple threads
    • with RLIMIT_STACK environment variable
  • getrlimit code of musl is implemented correctly as expected.
    • (But It's doubtful whether it can be. I guess that a proof of code soundness is very difficult.)
  • Some "real world" applications can work.
    • A better example would be an application that cannot work without the patch.

I can't prove these myself because I am not a musl / Alpine person.
Can anyone help?

#21 [ruby-dev:50565] Updated by ncopa (Natanael Copa) 4 months ago

naruse (Yui NARUSE) wrote:

jottr (jottr -) wrote:

I'd like to inquire on the progress of this issue. What is missing to get this resolved?

It is affecting downstream users who deploy their ruby applications by means of alpine linux containers. The alpine linux ruby containers are currently unusable because of this.

get_stack and get_main_stack need platform-dependent implementations.
In this context "platform" means libc (or libthread or something similar on some OSes).
Therefore ncopa's patch looks good to me in general.

The blockers seem to be:

  • ncopa's patch breaks non-Linux environments, because it removes the definition of get_main_stack without defining it for anything other than non-glibc Linux

No. get_main_stack gets defined in the #else block for everything that is not Linux or not glibc:

#if defined(__linux__) && !defined(__GLIBC__) && defined(HAVE_GETRLIMIT)
 ...
#else
# define get_main_stack(addr, size) get_stack(addr, size)
#endif

I can turn it around the other way for better readability if you want:

#if !defined(__linux__) || defined(__GLIBC__) || !defined(HAVE_GETRLIMIT)
# define get_main_stack(addr, size) get_stack(addr, size)
#else
# ...
#endif

  • minor one: it should add a comment explaining that !defined(__GLIBC__) is meant to cover musl libc, and that uClibc defines __GLIBC__.

Will add comment about musl libc. I think uClibc behavior is irrelevant here.

#22 [ruby-dev:50566] Updated by ncopa (Natanael Copa) 4 months ago

wanabe (_ wanabe) wrote:

It's acceptable because it doesn't break other tested environments, but does getrlimit really work in a musl libc environment?

getrlimit works on musl libc, as defined by POSIX. This is a syscall so there is no reason why it should not work.
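
One quick way to check this on a given image is to read the limit from Ruby itself, both on the main thread and on a second thread. This is a minimal sketch using `Process.getrlimit` from Ruby's core library; `stack_rlimit` is a hypothetical helper name, and it only shows what getrlimit reports, not how the interpreter uses the value.

```ruby
# Return the soft and hard RLIMIT_STACK values, with the
# "no limit" constant shown as the string "unlimited".
def stack_rlimit
  soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
  describe = ->(v) { v == Process::RLIM_INFINITY ? "unlimited" : v }
  { soft: describe.call(soft), hard: describe.call(hard) }
end

p stack_rlimit                        # main thread
p Thread.new { stack_rlimit }.value   # another thread in the same process
```

Because the limit is a per-process setting, both lines are expected to print the same values.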

#23 [ruby-dev:50567] Updated by ncopa (Natanael Copa) 4 months ago

wanabe (_ wanabe) wrote:

It seems reasonable not to rely on pthread_getattr_np() in a defined(__linux__) && !defined(__GLIBC__) environment, because the function has the suffix "_np".
I guess it would be ideal if Ruby relied on pthread_getattr_np() only in tested environments such as defined(__linux__) && defined(__GLIBC__), but it is too painful to follow and test other environments.

I agree that defined(__linux__) && defined(__GLIBC__) is better. Note that there are some non-Linux glibc variants too, but I would expect the pthread_getattr_np() call to work as documented in the GNU libc manual, regardless of whether the kernel is Linux, Hurd or anything else, so defined(__GLIBC__) should be enough.

ncopa's 2nd patch has a side effect as commented.
So I think 1st patch is more pragmatic.

Naruse-san (or other?):
How about https://bugs.ruby-lang.org/issues/14387#note-10 patch?

That patch is a workaround only and not a fix. It will cause the code to skip the reserve_stack call so behavior will differ from Linux systems with glibc. The second patch fixes the issue properly, even if it can be disputed if the reserve_stack is a good idea in the first place.
