Bug #6825

forking and pthread_cond_timedwait: Invalid argument (EINVAL) on OS X / 1.9.3-p194

Added by Mark A over 1 year ago. Updated over 1 year ago.

[ruby-core:<unknown>]
Status:Assigned
Priority:Normal
Assignee:Benoit Daloze
Category:-
Target version:-
ruby -v: Backport:

Description

https://gist.github.com/47e48301aea114e7b1d3 here is the gist with required setup to reproduce bug. Also crash log and stdout.

It seems that forking is essential for this setup to crash. Also, if you use database connection in some way prior to forking, it might not crash (however, with more complex code it still does).

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]
OS X 10.6.8

hostinfo output:

Mach kernel version:
     Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386
Kernel configured for up to 4 processors.
2 processors are physically available.
4 processors are logically available.
Processor type: i486 (Intel 80486)
Processors active: 0 1 2 3
Primary memory available: 8.00 gigabytes
Default processor set: 88 tasks, 627 threads, 4 processors
Load average: 0.55, Mach factor: 3.43

compiled with gcc version:

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

Associated revisions

Revision 37474
Added by Motohiro KOSAKI over 1 year ago

  • threadpthread.c (nativethreadinit, nativethreaddestroy): removed HAVEPTHREADCONDATTRINIT check because this silly #ifdef makes use-uninitialized-var issue and (2) nativecondinitialize() already have a right platform and caller don't need any additional care. [Bug #6825]

History

#1 Updated by Mark A over 1 year ago

Also notably ruby 1.8.7 does not crash.

#2 Updated by Mark A over 1 year ago

Ubuntu 11.04 doesn't crash with 1.9 ruby either.

#3 Updated by Mark A over 1 year ago

Ruby 1.9.2 doesn't crash either.

#4 Updated by Mark A over 1 year ago

ruby 2.0.0dev (2012-08-02 trunk 36596) [x86_64-darwin10.8.0] also does not crash.

#5 Updated by Mark A over 1 year ago

OS X 10.8 seems to be unaffected.

#6 Updated by Mark A over 1 year ago

Confirmed with another 10.6 / 1.9.3-p194.

#7 Updated by Mark A over 1 year ago

After some more fiddling it looks like it's mysql2 problem, not ruby's. Will duplicate issue there.

#8 Updated by Eric Hodel over 1 year ago

  • Status changed from Open to Feedback

#9 Updated by Mark A over 1 year ago

Update: I greatly simplified my test-case.

https://gist.github.com/47e48301aea114e7b1d3

If I remove sleeping threads on line 31 or require 'active_record' on line 1, bug stops reproducing.

Returning back to ruby-lang, as there is no mysql2 there anymore. Crash is still with the same error.

Ideas, suggestions?

#10 Updated by Mark A over 1 year ago

Oh, it also reproduced on ruby 2.0.0.dev from current git, so I guess it is still not fixed.

#11 Updated by Mark A over 1 year ago

Another update: I opened up a file active_record.rb inside installed activerecord gem and completely commented it out (so that even ActiveRecord is not defined after require 'active_record'). Still crashes. I guess that takes care of gems and everything, so the problem should be between ruby, rubygems and standard library.

Line require 'active_record' is still required for whole setup to crash for some reason.

Hope that gives you some idea.

#12 Updated by Eric Hodel over 1 year ago

=begin
I can't reproduce on OS X 10.8 ruby 2.0.0dev (2012-08-03 trunk 36602) [x86_64-darwin12.0.0]

I modified your script to remove require 'active_record' and altered the main thread to sleep forever. This ensures that mysql and other C extensions are not loaded. It ran for over two minutes without problems.

Can you reproduce this with require 'mysql' and not active_record?

Can you show the console output with your modified active_record.rb (the loaded features section is of particular interest).

Here is what I used:

require 'net/http'

Thread.abortonexception = true
class Worker
def initialize
@tasks = []
work
end

def work
  Thread.new do
    loop do
      task = nil
      task = @tasks.shift if @tasks.length > 0
      task.call if task
      sleep(0.25)
    end
  end
end

def schedule(&block)
  @tasks << block
end

end

pid = fork do
class TestLoop
def initialize
@worker = Worker.new
(1..10).map { Thread.new { loop { sleep(0.5) } } }
end

  def run
    loop do
      @worker.schedule { puts Net::HTTP.get("github.com", "/").length }
      sleep(0.25)
    end
  end
end

TestLoop.new.run

end

sleep

(I don't have mysql installed to check.)

=end

#13 Updated by Mark A over 1 year ago

I can't reproduce on OS X 10.8 ruby 2.0.0dev (2012-08-03 trunk 36602) [x86_64-darwin12.0.0]

That seems right, 10.8 seems to be unaffected.

Can you show the console output with your modified active_record.rb (the loaded features section is of particular interest).

1.9.3p194 :002 > require 'active_record'
 => true 
1.9.3p194 :003 > $LOADED_FEATURES.each(&method(:puts))
enumerator.so
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/enc/encdb.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/enc/trans/transdb.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/rbconfig.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/deprecate.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/e2mmap.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/init.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/workspace.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/inspector.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/context.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/extend-command.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/output-method.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/notifier.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/slex.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/ruby-token.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/ruby-lex.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/src_encoding.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/magic-file.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-darwin10.8.0/readline.bundle
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/input-method.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/locale.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/version.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/path_support.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/irb/completion.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/prettyprint.rb
/Users/mark/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/pp.rb
/Users/mark/.rvm/scripts/irbrc.rb
/Users/mark/.rvm/gems/ruby-1.9.3-p194@ruby-pthread-bug/gems/activerecord-3.2.7/lib/active_record.rb

If rubygems don't get hit (as in add gem directory to $LOAD_PATH manually), bug doesn't seem to trigger. I am not sure, whether rubygems themselves are the reason or something non-trivial needs to happen for it to show itself.

#14 Updated by Mark A over 1 year ago

Switching require 'active_record' with require 'mysql2' still crashes the interpreter as long as required file is taken using gems mechanism.

#15 Updated by Eric Hodel over 1 year ago

  • Status changed from Feedback to Open

#16 Updated by Motohiro KOSAKI over 1 year ago

At least, require 'mysql2' version nor drbrain version don't crash on my Mountain Lion environment.

#17 Updated by Mark A over 1 year ago

Yeah, that seems to be restricted to snow leopard.

#18 Updated by Yusuke Endoh over 1 year ago

  • Status changed from Open to Assigned
  • Assignee set to Motohiro KOSAKI

Kosaki-san, do you have any idea to addres this issue?

Yusuke Endoh mame@tsg.ne.jp

#19 Updated by Motohiro KOSAKI over 1 year ago

  • Assignee changed from Motohiro KOSAKI to Kenta Murata

Kosaki-san, do you have any idea to addres this issue?

I have no idea. unfortunately snow leopard is too old and i have no chance to get it.
@mrkn, do you have any chance to see this issue?

#20 Updated by Kenta Murata over 1 year ago

I don't have snow-leopard environment, so I cannot investigate this issue.

#21 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Assigned to Closed

I don't have snow-leopard environment, so I cannot investigate this issue.

OK. Thank you.
This looks like old OS X bug but we have no way to dig. give up. I'd like to close this feature as won't fix.
Anyway snow leopard is no longer supported.

To Mark, please reopen when you find exact reason and fixing way. we are very sorry for inconvenience.

#22 Updated by Benoit Daloze over 1 year ago

mrkn (Kenta Murata) wrote:

I don't have snow-leopard environment, so I cannot investigate this issue.

I do still have a snow-leopard environment and I can reproduce with
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin10.8.0]
and trunk (r37359). I'll try to investigate if I find time.

#23 Updated by Benoit Daloze over 1 year ago

  • Status changed from Closed to Assigned
  • Assignee changed from Kenta Murata to Benoit Daloze

This is also true with latest trunk (r37462).

I'm assigning to myself.

bug6825.rb:31: [BUG] pthreadcondtimedwait: Invalid argument (EINVAL)
ruby 2.0.0dev (2012-11-04 trunk 37462) [x86_64-darwin10.8.0]

-- Control frame information -----------------------------------------------
c:0005 p:---- s:0013 e:000012 CFUNC :sleep
c:0004 p:0007 s:0009 e:000008 BLOCK bug6825.rb:31 [FINISH]
c:0003 p:---- s:0007 e:000006 CFUNC :loop
c:0002 p:0005 s:0004 e:000003 BLOCK bug6825.rb:31 [FINISH]
c:0001 p:---- s:0002 e:000001 TOP [FINISH]

bug6825.rb:31:in block (2 levels) in initialize'
bug6825.rb:31:in
loop'
bug6825.rb:31:in block (3 levels) in initialize'
bug6825.rb:31:in
sleep'

#24 Updated by Benoit Daloze over 1 year ago

I poked around an produced a core dump (the bug would not reproduce under gdb with a breakpoint set).

Arguments to pthreadcondtimedwait() seem valid, in particular the timespec is about 500ms in the future.
Other calls to pthreadcondtimedwait() always return ETIMEDOUT or 0.

I saw rbthreadt::nativethreaddata.sleepcond was weirdly initialized.
It is not initialized in native
threadinit() if HAVEPTHREADCONDATTRINIT is undefined.
And it is used in any case in ubfpthreadcondsignal().
Maybe checks for HAVE
PTHREADCONDATTRINIT should not be done in nativethreadinit() and nativethreaddestroy() since these functions already do the right checks?
This should be unrelated though, since OS X has pthreadcondattrinit().

It might be related to GVL release by multiple threads but I have no clue.
It does not seem related directly to the parallel DNS resolution, since some traces have only threads in nativecondtimedwait().
And from the "only reproducible on snow-leopard" argument, it seems snow-leopard pthread's bug.

@kosaki @mrkn Would it be useful if I could provide you the core dump and other info?

#25 Updated by Motohiro KOSAKI over 1 year ago

@kosaki @mrkn Would it be useful if I could provide you the core dump and other info?

Thanks!
r35672 seems broke this area and I'll fix it soon. However there is no r35672 in 1.9.3 branch and 1.9.3 seems correct. hmm...
Could you please try 1.9.3 branch too?

#26 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37474.
Mark, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • threadpthread.c (nativethreadinit, nativethreaddestroy): removed HAVEPTHREADCONDATTRINIT check because this silly #ifdef makes use-uninitialized-var issue and (2) nativecondinitialize() already have a right platform and caller don't need any additional care. [Bug #6825]

#27 Updated by Motohiro KOSAKI over 1 year ago

  • Status changed from Closed to Assigned

Reopened. because r37474 only fixed trunk.

#28 Updated by Usaku NAKAMURA over 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport93
  • Assignee changed from Benoit Daloze to Usaku NAKAMURA

Kosaki-san, you can move a ticket to Backport because you are a committer.
So, pelase do so instead of only changing the status to Open.

#29 Updated by Benoit Daloze over 1 year ago

kosaki (Motohiro KOSAKI) wrote:

@kosaki @mrkn Would it be useful if I could provide you the core dump and other info?

Thanks!
r35672 seems broke this area and I'll fix it soon. However there is no r35672 in 1.9.3 branch and 1.9.3 seems correct. hmm...
Could you please try 1.9.3 branch too?

Unfortunately, r37474 does not seem to solve the problem (but it was definitely a potential problem).
This is expected because snow leopard has pthreadcondattrinit().
So I don't know the reason for the bug.

#30 Updated by Motohiro KOSAKI over 1 year ago

  • Tracker changed from Backport to Bug
  • Project changed from Backport93 to ruby-trunk
  • Assignee changed from Usaku NAKAMURA to Benoit Daloze

Hi Eregon,

Oops, I'm sorry. Perhaps I'm still overlooking anything else. Can you please share me your config.h and core file and build revision number?
I'm willing to look core file myself.

Also available in: Atom PDF