Bug #4558

TestSocket#test_closed_read fails after r31230

Added by Tomoyuki Chikanaga about 3 years ago. Updated almost 3 years ago.

[ruby-core:35631]
Status:Closed
Priority:Normal
Assignee:Motohiro KOSAKI
Category:test
Target version:1.9.3
ruby -v:- Backport:

Description

=begin
After r31230, make test-all reports a failure in test_socket.rb @Mac OS X 10.6.6

% make test-all TESTS=../ruby/test/socket/testsocket.rb
./miniruby -I../ruby/lib -I. -I.ext/common ../ruby/tool/runruby.rb --extout=.ext -- "../ruby/test/runner.rb" --ruby="./miniruby -I../ruby/lib -I. -I.ext/common ../ruby/tool/runruby.rb --extout=.ext --" ../ruby/test/socket/test
socket.rb
Run options: "--ruby=./miniruby -I../ruby/lib -I. -I.ext/common ../ruby/tool/runruby.rb --extout=.ext --"

# Running tests:

....F.................

Finished tests in 1.611761s, 13.6497 tests/s, 40.9490 assertions/s.

1) Failure:
testclosedread(TestSocket) [/ruby/test/socket/testsocket.rb:428]:

[IOError] exception expected, not
Class: Errno::EBADF
Message: <"Bad file descriptor">
---Backtrace---
/ruby/test/socket/test
socket.rb:422:in readline'
/ruby/test/socket/test_socket.rb:422:in
block in testclosedread'


22 tests, 66 assertions, 1 failures, 0 errors, 0 skips
make: *** [yes-test-all] Error 1

=end


Related issues

Related to ruby-trunk - Bug #4527: [PATCH] IO#close releases GVL if possible Rejected 03/26/2011
Related to ruby-trunk - Feature #4570: [PATCH v2] io.c (rb_io_close): release GVL if possible Closed 04/12/2011
Related to ruby-trunk - Bug #4390: TCPSocket#readline doesn't raise if the socket is #close'... Closed 02/12/2011

History

#1 Updated by Motohiro KOSAKI about 3 years ago

  • ruby -v changed from ruby 1.9.3dev (2011-04-05 trunk 31241) [x86_64-darwin10.6.0] to -

=begin


Bug #4558: TestSocket#testclosedread fails after r31230
http://redmine.ruby-lang.org/issues/4558

I think current rbioclose() is broken. We have to call rbthreadfd_close()
before releasing GVL.

Eric, Am I missing something?
=end

#2 Updated by Eric Wong about 3 years ago

=begin
KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:


Bug #4558: TestSocket#testclosedread fails after r31230
http://redmine.ruby-lang.org/issues/4558

I think current rbioclose() is broken. We have to call rbthreadfd_close()
before releasing GVL.

Eric, Am I missing something?

You are correct.

I can't reproduce the test failure on x86_64-linux but the
following patch should fix a race condition:

diff --git a/io.c b/io.c
index 7ce7148..b79cc5e 100644
--- a/io.c
+++ b/io.c
@@ -3685,8 +3685,8 @@ rbioclose(VALUE io)
if (fptr->fd < 0) return Qnil;

  fd = fptr->fd;
  • rbiofptrcleanup(fptr, FALSE); rbthreadfdclose(fd);
  • rbiofptr_cleanup(fptr, FALSE);

    if (fptr->pid) {
    rb_syswait(fptr->pid);

    Eric Wong
    =end

#3 Updated by Motohiro KOSAKI about 3 years ago

=begin
2011/4/8 Eric Wong normalperson@yhbt.net:

KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:


Bug #4558: TestSocket#testclosedread fails after r31230
http://redmine.ruby-lang.org/issues/4558

I think current rbioclose() is broken. We have to call rbthreadfd_close()
before releasing GVL.

Eric, Am I missing something?

You are correct.

I can't reproduce the test failure on x86_64-linux but the
following patch should fix a race condition:

diff --git a/io.c b/io.c
index 7ce7148..b79cc5e 100644
--- a/io.c
+++ b/io.c
@@ -3685,8 +3685,8 @@ rbioclose(VALUE io)
    if (fptr->fd < 0) return Qnil;

    fd
=end

#4 Updated by Tomoyuki Chikanaga about 3 years ago

  • Target version set to 1.9.3
  • ruby -v changed from - to ruby 1.9.3dev (2011-04-05 trunk 31241) [x86_64-darwin10.6.0]

=begin
Hi,

I applied Eric's patch, but TestSocket#testclosedread still report same failure.

I can reproduce EBADF with following script.

r, w = IO.pipe
readthread = Thread.new{ r.read(1) }
sleep(0.1) until read
thread.stop?
r.close
read_thread.join

=end

#5 Updated by Eric Wong about 3 years ago

=begin
Tomoyuki Chikanaga redmine@ruby-lang.org wrote:

ruby -v changed from - to ruby 1.9.3dev (2011-04-05 trunk 31241) [x86_64-darwin10.6.0]

Hi,

I applied Eric's patch, but TestSocket#testclosedread still report same failure.

I can reproduce EBADF with following script.

Thanks for testing, think I have a better fix below (supercedes my
original fix)

Also pushed to the "io-close-fixes" branch of git://bogomips.org/ruby.git

From d5f9659ea9c2e8e0ed67544ed35afef4ca2bb3c5 Mon Sep 17 00:00:00 2001
From: Eric Wong normalperson@yhbt.net
Date: Thu, 7 Apr 2011 19:25:20 +0000
Subject: [PATCH] io.c (rbioclose): ensure IOError for cross-thread closes

We need to inform threads to stop operations on the FD before
closing it and also invalidate the fd member of the rbiot
struct for other threads to properly raise IOError.

FDs may be created and destroyed without the GVL, so
rbthreadfd_close() may be improperly hitting the wrong
threads/FDs if we close() before notifying and in the worst case
or threads will end up reading/writing to an unexpected FD.

ref:
ref: http://redmine.ruby-lang.org/issues/4558


io.c | 25 ++++++++++++++++++-------
test/ruby/test_io.rb | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/io.c b/io.c
index 7ce7148..5d37b7f 100644
--- a/io.c
+++ b/io.c
@@ -3504,6 +3504,7 @@ maygvl_close(int fd, int keepgvl)
if (keepgvl)
return close(fd);

  • rbthreadfd_close(fd);
    /*

    • close() may block for certain file types (NFS, SO_LINGER sockets,
    • inotify), so let other threads run. @@ -3525,6 +3526,8 @@ maygvl_fclose(FILE *file, int keepgvl) if (keepgvl) return fclose(file);
  • rbthreadfdclose(fileno(file));
    +
    return (int)rb
    threadblockingregion(nogvlfclose, file, RUBYUBF_IO, 0);
    }

    @@ -3555,24 +3558,35 @@ fptrfinalize(rbiot *fptr, int noraise)
    }
    }
    if (IS
    PREP_STDIO(fptr) || fptr->fd <= 2) {

  •    int fd = fptr->fd;
    

    +

  •    fptr->stdio_file = 0;
    
  •    fptr->fd = -1;
    
  •    rb_thread_fd_close(fd);
      goto skip_fd_close;
    

    }
    if (fptr->stdio_file) {

  • FILE *stdiofile = fptr->stdiofile;
    +

  • fptr->stdio_file = 0;

  • fptr->fd = -1;
    +
    /* fptr->stdio_file is deallocated anyway
    * even if fclose failed. */

  • if ((maygvlfclose(fptr->stdiofile, noraise) < 0) && NIL_P(err))

  • if ((maygvlfclose(stdiofile, noraise) < 0) && NIL_P(err))
    err = noraise ? Qtrue : INT2NUM(errno);
    }
    else if (0 <= fptr->fd) {

  • int fd = fptr->fd;

  • fptr->fd = -1;
    +
    /* fptr->fd may be closed even if close fails.
    * POSIX doesn't specify it.
    * We assumes it is closed. */

  • if ((maygvlclose(fptr->fd, noraise) < 0) && NILP(err))

  • if ((maygvlclose(fd, noraise) < 0) && NILP(err))
    err = noraise ? Qtrue : INT2NUM(errno);
    }
    skipfdclose:

  • fptr->fd = -1;

  • fptr->stdiofile = 0;
    fptr->mode &= ~(FMODE
    READABLE|FMODE_WRITABLE);

    if (!NILP(err) && !noraise) {
    @@ -3668,7 +3682,6 @@ VALUE
    rb
    ioclose(VALUE io)
    {
    rb
    io_t *fptr;

  • int fd;
    VALUE writeio;
    rb
    iot *writefptr;

    @@ -3684,9 +3697,7 @@ rbioclose(VALUE io)
    if (!fptr) return Qnil;
    if (fptr->fd < 0) return Qnil;

  • fd = fptr->fd;
    rbiofptr_cleanup(fptr, FALSE);

  • rbthreadfd_close(fd);

    if (fptr->pid) {
    rbsyswait(fptr->pid);
    diff --git a/test/ruby/test
    io.rb b/test/ruby/testio.rb
    index 6b8e6b5..3d086b3 100644
    --- a/test/ruby/test
    io.rb
    +++ b/test/ruby/test_io.rb
    @@ -1809,4 +1809,43 @@ End
    Process.waitpid2(pid)
    end
    end
    +

  • def testcrossthreadclosefd

  • with_pipe do |r,w|

  •  read_thread = Thread.new do
    
  •    begin
    
  •      r.read(1)
    
  •    rescue => e
    
  •      e
    
  •    end
    
  •  end
    

    +

  •  sleep(0.1) until read_thread.stop?
    
  •  r.close
    
  •  read_thread.join
    
  •  assert_kind_of(IOError, read_thread.value)
    
  • end

  • end
    +

  • def testcrossthreadclosestdio

  • with_pipe do |r,w|

  •  pid = fork do
    
  •    $stdin.reopen(r)
    
  •    r.close
    
  •    read_thread = Thread.new do
    
  •      begin
    
  •        $stdin.read(1)
    
  •      rescue => e
    
  •        e
    
  •      end
    
  •    end
    
  •    sleep(0.1) until read_thread.stop?
    
  •    $stdin.close
    
  •    read_thread.join
    
  •    exit(IOError === read_thread.value)
    
  •  end
    
  •  assert Process.waitpid2(pid)[1].success?
    
  • end

  • rescue NotImplementedError

  • end

    end

    Eric Wong
    =end

#6 Updated by Eric Wong about 3 years ago

=begin
KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:

After while thinking, I conclude I was wrong. If rbiofptr_cleanup()
raise a exception, We don't have to kill other threads. So, now I'm
incline to revert r31230. Hmm...

I think we can still fix the issues revolving around r31230. We've
already found at least two (this and #4555) bugs. As for #4555
(connect() failing on EINTR) it was a bug anyways, but just very hard to
expose before.

--
Eric Wong
=end

#7 Updated by Eric Wong about 3 years ago

=begin
Eric Wong normalperson@yhbt.net wrote:

Thanks for testing, think I have a better fix below (supercedes my
original fix)

Also pushed to the "io-close-fixes" branch of git://bogomips.org/ruby.git

Oops, I broke test/ruby/test_thread.rb in the atexit handlers :x
I squashed the following change and forcibly repushed:

diff --git a/io.c b/io.c
index 5d37b7f..65e7693 100644
--- a/io.c
+++ b/io.c
@@ -3562,7 +3562,8 @@ fptrfinalize(rbio_t *fptr, int noraise)

      fptr->stdio_file = 0;
      fptr->fd = -1;
  • rbthreadfd_close(fd);
  • if (!noraise)
  • rbthreadfdclose(fd); goto skipfdclose; } if (fptr->stdiofile) { -- Eric Wong =end

#8 Updated by Tomoyuki Chikanaga about 3 years ago

=begin
Hi Eric,

These patches seem good and after applying them, make test-all passes all tests except a test for parallel_test in my environment. thanks!
=end

#9 Updated by Eric Wong about 3 years ago

=begin
Tomoyuki Chikanaga redmine@ruby-lang.org wrote:

These patches seem good and after applying them, make test-all passes
all tests except a test for parallel_test in my environment. thanks!

Thanks for reporting and helping us track it down.
Can somebody please commit my patches?

--
Eric Wong
=end

#10 Updated by Motohiro KOSAKI about 3 years ago

  • Status changed from Open to Closed

=begin
r31230 was revered by r31261.

=end

#11 Updated by Motohiro KOSAKI about 3 years ago

  • ruby -v changed from ruby 1.9.3dev (2011-04-05 trunk 31241) [x86_64-darwin10.6.0] to -

=begin

Subject: [PATCH] io.c (rbioclose): ensure IOError for cross-thread closes

We need to inform threads to stop operations on the FD before
closing it and also invalidate the fd member of the rbiot
struct for other threads to properly raise IOError.

FDs may be created and destroyed without the GVL, so
rbthreadfd_close() may be improperly hitting the wrong
threads/FDs if we close() before notifying and in the worst case
or threads will end up reading/writing to an unexpected FD.

ref:

ref: http://redmine.ruby-lang.org/issues/4558

 io.c                 |   25 ++++++++++++++++++-------
 test/ruby/test_io.rb |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/io.c b/io.c
index 7ce7148..5d37b7f 100644
--- a/io.c
+++ b/io.c
@@ -3504,6 +3504,7 @@ maygvl_close(int fd, int keepgvl)
    if (keepgvl)
       return close(fd);

  •    rbthreadfdclose(fd);
        /*
         * close() may block for certain file types (NFS, SO
    LINGER sockets,
         * inotify), so let other threads run.
    @@ -3525,6 +3526,8 @@ maygvl_fclose(FILE *file, int keepgvl)
        if (keepgvl)
           return fclose(file);

  •    rbthreadfdclose(fileno(file));
    +
        return (int)rb
    threadblockingregion(nogvlfclose, file, RUBYUBF_IO, 0);
     }

@@ -3555,24 +3558,35 @@ fptrfinalize(rbiot *fptr, int noraise)
       }
    }
    if (IS
PREP_STDIO(fptr) || fptr->fd <
=end

#12 Updated by Usaku NAKAMURA about 3 years ago

  • Category changed from core to test
  • Status changed from Closed to Assigned
  • Assignee set to Motohiro KOSAKI

=begin
I have no opinion about this topic, but the test code which was checked in at r31260 by kosaki-san is platform dependent.
It blocks on Windows, and stops all tests.

I request to revert it.
Or, please explain grounds from which this test should be accepted as behavior of ruby.

=end

#13 Updated by Eric Wong about 3 years ago

=begin
I consider either Errno::EBADF or IOError to be acceptable.

The main thing I care about is I/O for pipes/sockets being interruptable
(I only work on *nix).

By the way, test/socket/test_socket.rb has had a similar test for months
(since r30852). It does have a Timeout wrapping it, so maybe that is
needed (but you'd still get an error). Maybe just disabling this test
for Windows platforms would be acceptable?

Eventually I would like to get rid of places where we call select()
before doing (any) I/O across the board (ref: ) if
possible.

=end

#14 Updated by Motohiro KOSAKI about 3 years ago

=begin
2011/4/12 redmine@ruby-lang.org:

Issue #4558 has been updated by Usaku NAKAMURA.

Category changed from core to test
Status changed from Closed to Assigned
Assignee set to Motohiro KOSAKI

I have no opinion about this topic, but the test code which was checked in at r31260 by kosaki-san is platform dependent.
It blocks on Windows, and stops all tests.

I request to revert it.
Or, please explain grounds from which this test should be accepted as behavior of ruby.

I succuseeded to reporoduce this issue. On win32, IO.close() cause hang-up.
So, I think we have to discuss two thing.
1) Why close() makes hang-up? Is it acceptable behavior?
2) At , We decided IO.close() raise exception to
othread threads
and then they should wake up as ruby-1.8.
Should we think win32 is exception for this rule?

In the other words, We have to decided rbthreadioblockingregion()
spec. Otherwise
we can't make any testcase. ;-)
=end

#15 Updated by Motohiro KOSAKI about 3 years ago

=begin

Issue #4558 has been updated by Eric Wong.

I consider either Errno::EBADF or IOError to be acceptable.

Hmm...
I can't agree this. If EBADF can be observed, we can observe completely
unrelated file when a fd number was recycled just after close.

The main thing I care about is I/O for pipes/sockets being interruptable
(I only work on *nix).

By the way, test/socket/test_socket.rb has had a similar test for months
(since r30852).   It does have a Timeout wrapping it, so maybe that is
needed (but you'd still get an error).  Maybe just disabling this test
for Windows platforms would be acceptable?

Hmm...
If windows can't implement close() case, I doubt r30852 is correct fix.
Is this really worth that *nix and windows have different spec?

Eventually I would like to get rid of places where we call select()
before doing (any) I/O across the board (ref: ) if
possible.

this makes sense to me.
=end

#16 Updated by Usaku NAKAMURA about 3 years ago

=begin
Hello,

In message " Re: [Ruby 1.9 - Bug #4558][Assigned] TestSocket#testclosedread fails after r31230"
on Apr.12,2011 21:31:46, kosaki.motohiro@gmail.com wrote:

Or, please explain grounds from which this test should be accepted as behavior of ruby.

I succuseeded to reporoduce this issue. On win32, IO.close() cause hang-up.
So, I think we have to discuss two thing.
1) Why close() makes hang-up? Is it acceptable behavior?

MSVCRT's fds have their own locks.
MSVCRT locks fds when accessing them -- reading, writing,
closing, etc.
The author of MSVCRT obviously intended the behavior, I think.

2) At , We decided IO.close() raise exception to
othread threads
and then they should wake up as ruby-1.8.
Should we think win32 is exception for this rule?

I see. Hmm...

Is the behavior that close() doesn't block and the I/O operations
of other threads are interrupted defind by posix or some specifications?
We found this problem in Windows this time, but might there be
other platforms which have same problem?

Regards,
--
U.Nakamura usa@garbagecollect.jp
=end

#17 Updated by Motohiro KOSAKI about 3 years ago

=begin
Hi

2011/4/13 U.Nakamura usa@garbagecollect.jp:

Hello,

In message " Re: [Ruby 1.9 - Bug #4558][Assigned] TestSocket#testclosedread fails after r31230"
   on Apr.12,2011 21:31:46, kosaki.motohiro@gmail.com wrote:

Or, please explain grounds from which this test should be accepted as behavior of ruby.

I succuseeded to reporoduce this issue. On win32, IO.close() cause hang-up.
So, I think we have to discuss two thing.
 1) Why close() makes hang-up? Is it acceptable behavior?

MSVCRT's fds have their own locks.
MSVCRT locks fds when accessing them -- reading, writing,
closing, etc.
The author of MSVCRT obviously intended the behavior, I think.

ok, I see.

 2) At , We decided IO.close() raise exception to
othread threads
     and then they should wake up as ruby-1.8.
     Should we think win32 is exception for this rule?

I see.  Hmm...

Is the behavior that close() doesn't block and the I/O operations
of other threads are interrupted defind by posix or some specifications?

No. It's purely implementation defined.

We found this problem in Windows this time, but might there be
other platforms which have same problem?

It's possible.
So, now I'm incline to revert r30852.

nobu, What do you think?
=end

#18 Updated by Eric Wong about 3 years ago

=begin
KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:

Issue #4558 has been updated by Eric Wong.

I consider either Errno::EBADF or IOError to be acceptable.

Hmm...
I can't agree this. If EBADF can be observed, we can observe completely
unrelated file when a fd number was recycled just after close.

Actually, I expect EBADF make any read/write-retry loop stop
immediately, but yes, exposing IOError to user is better.

Hmm...
If windows can't implement close() case, I doubt r30852 is correct fix.
Is this really worth that *nix and windows have different spec?

If r30852 is reverted, Linux (and maybe other nix) will still break
threads out of blocking read/write with EBADF and
rbiowait_
able(fptr->fd) will raise IOError as long as we
assign fptr->fd = -1 before the actual close() call in IO#close.

Maybe Windows (and possibly other OS) will be forced to call do_select()
to implement behavior consistent with Linux.

On a related note, r30852 also has the problem with making IO#close an
O(n) operation since it needs to scan through all threads (and I'd like
to run thousands of threads :>).

--
Eric Wong
=end

#19 Updated by Eric Wong about 3 years ago

=begin
KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:

KOSAKI Motohiro kosaki.motohiro@gmail.com wrote:

Issue #4558 has been updated by Eric Wong.
following scenario should be happen too.

CPU1 CPU2 CPU3

open() -> 5
close(5)
open() -> 5
read(5) -> success, but read different data.

OK, I'm starting to think there's no safe way to handle these
situations, especially with MVM on the horizon. Just telling users to
stop doing close() in a different thread is probably the best way to
go...

Hmm...
If windows can't implement close() case, I doubt r30852 is correct fix.
Is this really worth that *nix and windows have different spec?

If r30852 is reverted, Linux (and maybe other nix) will still break
threads out of blocking read/write with EBADF and
rbiowait_
able(fptr->fd) will raise IOError as long as we
assign fptr->fd = -1 before the actual close() call in IO#close.

Maybe Windows (and possibly other OS) will be forced to call do_select()
to implement behavior consistent with Linux.

???
I'm sorry, I haven't catch your point.
Which issue is solved by calling do_select()?

It can reduce the likelyhood of read() being uninterruptable since
do_select() will sleep in 100ms intervals to check for interrupts on
win. There's still a small chance of blocking read() since select()
can have false positives and other threads could've drained the data...

On a related note, r30852 also has the problem with making IO#close an
O(n) operation since it needs to scan through all threads (and I'd like
to run thousands of threads :>).

I have no opinion. I like faster software, but I haven't seen close
makes performance bottleneck.

Contrived test case, but it gets worse as nr_thread increases:

# ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]
0.010000 0.000000 0.010000 ( 0.007288)

# ruby 1.9.3dev (2011-04-14 trunk 31267) [x8664-linux]
0.030000 0.000000 0.030000 ( 0.033881)
-----------------------------8< -------------------------
require "benchmark"
nr
thread = 1000
nr
close = 1_000

threads = nrthread.times.map { Thread.new { sleep } }
puts(Benchmark.measure do
nr
close.times do
File.open(FILE).close
end
end)
threads.each { |thr| thr.run }.each { |thr| thr.join }
--
Eric Wong
=end

#20 Updated by Motohiro KOSAKI almost 3 years ago

  • Status changed from Assigned to Closed

1) Windows test case failure have already been fixed.
2) "release GVL on close" is now discussing #4570.

So, we can close this ticket.

Also available in: Atom PDF