Bug #1993

IO.select fails when called in multiple threads on 1.8.7p174

Added by dazuma (Daniel Azuma) almost 10 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
ruby -v: ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9.8.0]
[ruby-core:25114]

Description

=begin
IO.select (Kernel#select) fails when run on different sets of IO objects in different threads. This affects release versions 1.8.7p160, 1.8.7p173, and 1.8.7p174. It does NOT seem to affect the recent versions of 1.9.1 that I have tested, and it does NOT affect release version 1.8.7p72. I have not tested any 1.8.6 versions. The repro steps have been tested mostly on Mac OS X 10.5.8 on an Intel-based MacBook Pro, but I have seen similar behavior on a recent Fedora Linux i686.

To reproduce, run the following script. (Replace the two filenames with distinct known readable files on your system.)

# Begin code

FILENAME1 = "Rakefile"
FILENAME2 = "README"
TWO_THREADS = true  # set to false to run only the first thread

f1 = File.open(FILENAME2)
f2 = File.open(FILENAME1)

# Thread 1: poll f1 with a zero timeout and report how many IOs are readable.
t1 = Thread.new do
  c1 = 0
  loop do
    c1 += 1
    s1 = IO.select([f1], nil, nil, 0)
    n1 = s1 ? s1.first.size : 0
    puts "t1: num=#{n1} iter=#{c1}"
  end
end

# Thread 2: the same loop on f2, started only when TWO_THREADS is true.
t2 = Thread.new do
  c2 = 0
  loop do
    c2 += 1
    s2 = IO.select([f2], nil, nil, 0)
    n2 = s2 ? s2.first.size : 0
    puts "t2: num=#{n2} iter=#{c2}"
  end
end if TWO_THREADS
t1.join

# End code

The code simply calls IO.select repeatedly on IO objects known to have readable bytes, in either one thread or two. When run in one thread (TWO_THREADS = false), it behaves as expected, printing "num=1" to indicate that select has detected the readable stream. However, when run in two threads (TWO_THREADS = true), both threads print "num=0", indicating that neither thread detects readable data on its stream.
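For reference, the per-iteration check in the script can be isolated as a small single-threaded sketch. The helper name readable_count and the temporary file are illustrative assumptions, not part of the original script; the sketch only shows the expected healthy result, where a zero-timeout select on a file with data reports one readable IO.

```ruby
require "tempfile"

# Hypothetical helper (not from the report): returns how many of the given
# IO objects a zero-timeout IO.select reports as readable.
def readable_count(ios)
  ready = IO.select(ios, nil, nil, 0) # nil when nothing is ready
  ready ? ready.first.size : 0
end

# Use a temporary file so the sketch does not depend on Rakefile or README.
tmp = Tempfile.new("select-check")
tmp.write("some readable bytes")
tmp.flush

File.open(tmp.path) do |f|
  puts "num=#{readable_count([f])}"
end

tmp.close! # close and delete the temporary file
```

In the healthy single-threaded case this prints "num=1", matching the expected behavior described above.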

The relevant code appears to be the function rb_thread_schedule() in eval.c, and I believe this issue is related to revision 21165. I haven't been able to untangle everything in this code yet, but here's what I've been able to determine:

  • The code that collects file descriptors for the system select() call (lines 11063-11073 of the 1.8.7 branch as of revision 24104) DOES NOT RUN for a given thread unless the thread has a THREAD_STOPPED status at that time (because of line 11051). Therefore, any threads with a THREAD_RUNNABLE status at that time are effectively shut out of receiving select() results unless their fd lists overlap those of other threads.

  • Given the sample code above, the next qualifying thread (that is, the thread that will be assigned to the "next" variable later on) tends to be in the THREAD_RUNNABLE state at this time. Since such threads are shut out of the select() call, they can never be assigned to "th_found" (see lines 11208-11212). As a result, "th_found" is assigned to a later thread in the list, rather than, as appears to be the intent, the first qualifying thread in the list (note the break on line 11214).

  • Unfortunately, this conflicts with lines 11230ff. Those lines, which choose the "next" thread, always prefer the first thread given equal priority (line 11231). Since "th_found" tends not to be the first qualifying thread, we have a situation where the conditions on lines 11231 and 11232 are never both true; as a result, th->select_value is never set, and the select calls never succeed.

  • The code appeared to work pre-revision-21165 (e.g. 1.8.7p72) because that version of the code set select_value on every qualifying thread, whereas the current code sets it on only one thread.

Here's where I'm unsure about how to proceed with a patch. I would like to move lines 11058 through 11073 to immediately above line 11051. This would add each thread's file descriptors to the select call, regardless of whether the thread has status THREAD_STOPPED or THREAD_RUNNABLE. This change appears to fix the test case above. And I believe it is the correct behavior; however, I'm new to this part of the code and do not have enough understanding of the intent of thread->status to assert that this is correct. I was hoping someone with more knowledge of this area could use this analysis as a starting point.
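The analysis and the proposed change can be illustrated with a toy model. This is a sketch of the reasoning only, not a transcription of rb_thread_schedule(): ToyThread, schedule_pass, the stopped_only flag, and the rule that "next" is simply the first thread in the list are all simplifying assumptions made for illustration.

```ruby
# Toy model only: names mirror the description above, not the code in eval.c.
ToyThread = Struct.new(:name, :status, :fd, :select_value)

# Simulates one scheduling pass over threads of equal priority.
# ready_fds are the descriptors the OS would report as readable.
# With stopped_only = true, only :stopped threads contribute descriptors
# (the behavior described in the analysis). With stopped_only = false,
# every waiting thread contributes (the proposed change).
def schedule_pass(threads, ready_fds, stopped_only)
  watched = threads.select { |t| !stopped_only || t.status == :stopped }.map(&:fd)
  found_fds = ready_fds & watched

  # th_found: first thread whose descriptor came back from select().
  th_found = threads.find { |t| found_fds.include?(t.fd) }

  # Assumption: with equal priorities, the first thread is chosen as "next".
  nxt = threads.first

  # select_value is set only when both choices agree on the same thread.
  nxt.select_value = 1 if nxt.equal?(th_found)
  [th_found && th_found.name, nxt.name, nxt.select_value]
end

make_threads = lambda do
  [ToyThread.new("t1", :runnable, 3), ToyThread.new("t2", :stopped, 4)]
end

p schedule_pass(make_threads.call, [3, 4], true)  # => ["t2", "t1", nil]
p schedule_pass(make_threads.call, [3, 4], false) # => ["t1", "t1", 1]
```

In the first call the runnable thread's descriptor is never watched, so "th_found" and "next" disagree and select_value stays unset. In the second call both land on the same thread. Whether the real scheduler can safely treat THREAD_RUNNABLE threads this way is exactly the open question above.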
=end


Related issues

Related to Backport186 - Backport #2039: Backport 24413, 24416, 24442 to fix IO#select threading issue (Closed, 09/04/2009)
Is duplicate of Ruby 1.8 - Bug #1484: Ruby 1.8.6_p368 and Ruby 1.8.7_p160 have threading regressions (Open, 05/18/2009)

History

#1

Updated by dazuma (Daniel Azuma) almost 10 years ago

=begin
One other note: this CAN be difficult to reproduce, because you have to catch both threads with an IO.select scheduled at the same time. I find the repro code above pretty consistent on my setup, which is Ruby 1.8.7p174, Mac OS 10.5.8, on a MacBook Pro with a 2.5 GHz Core 2 Duo. But as with most threading-related issues, YMMV.
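Because the failure depends on timing, a bounded variant that tallies failed selects may be easier to compare across machines than an endless loop. The following is a sketch under assumptions: the helper name count_select_misses, the iteration count, and the temporary files are illustrative and were not part of the original repro.

```ruby
require "tempfile"

# Hypothetical helper (not from the report): starts one thread per path,
# performs `iterations` zero-timeout IO.select calls in each, and returns
# the number of calls per thread that reported nothing readable.
def count_select_misses(paths, iterations)
  threads = paths.map do |path|
    Thread.new do
      File.open(path) do |f|
        misses = 0
        iterations.times do
          misses += 1 unless IO.select([f], nil, nil, 0)
        end
        misses # becomes the thread's value
      end
    end
  end
  threads.map { |t| t.value } # waits for each thread to finish
end

# Two distinct readable files, created here so the sketch is self-contained.
tmps = %w[select-a select-b].map do |name|
  tmp = Tempfile.new(name)
  tmp.write("some readable bytes")
  tmp.flush
  tmp
end

misses = count_select_misses(tmps.map { |t| t.path }, 1000)
puts "misses per thread: #{misses.inspect}"

tmps.each { |t| t.close! }
```

A build without the problem should report zero misses for both threads. On an affected build, the analysis above suggests nonzero counts, though how many would depend on scheduling.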

Some related links that have been pointed out to me since last night:

#2

Updated by akr (Akira Tanaka) almost 10 years ago

  • Assignee set to shyouhei (Shyouhei Urabe)

=begin
backport r24413, r24416, r24442.
=end

#3

Updated by dazuma (Daniel Azuma) almost 10 years ago

=begin
I ran my tests against r24647 of the ruby_1_8 branch, and it looks like the problem is solved there. Thanks! Looking forward to seeing a 1.8.7 patch.
=end

#4

Updated by shyouhei (Shyouhei Urabe) almost 10 years ago

  • Status changed from Open to Closed

=begin
Applied in changeset r24783.
=end
