Backport #3840

Ruby 1.9.2p0 crashes using Rails 3.0 on repeated browser refresh

Added by Sean O'Neil over 3 years ago. Updated over 2 years ago.

[ruby-core:32428]
Status:Assigned
Priority:Normal
Assignee:Yuki Sonoda

Description

=begin
I'm not certain whether this is a Ruby issue or a Rails issue (or a bit of both). The problem is intermittent, and it appears to be a threading issue surrounding the reading of the Rails log file. It is fairly easy for me to reproduce the problem:

1) Install a clean 1.9.2p0 and run "gem install rails" to get Rails 3.0.
2) Use the Rails 3.0 generator script to create an empty project.
3) Launch the web app using default options.
4) Point a browser to the home page for the new Rails app, and keep pressing F5 (or Refresh) at varying speeds until Ruby crashes. It seems to happen most often if I hold F5 for a few seconds and then release for a few seconds, varying how long I hold and release it.

It crashes eventually no matter how I run it, but it seems to crash more often in production mode (which logs a bunch of errors using the default generated app). The crash appears to be dependent on timing, but I can usually make it crash in less than 60 seconds.

I'm currently using Windows 7 (x64) and compiling Ruby with Visual Studio 2008. However, the crash also occurs with the pre-packaged RubyInstaller for Windows. Given the debug information I've provided below, it does not appear to be platform-specific, but a different OS can have different IO and thread timing issues, so it may be harder to reproduce on other platforms.

I rebuilt Ruby with debugging options and got this call stack when it crashed:
rbeconvconvert() in transcode.c line 1449 (the ec parameter was set to NULL)
fillcbuf() in io.c line 1691
read
all() in io.c line 1755
io_read() in io.c line 2163

I added some additional checks to the code to find out how fptr->readconv was getting cleared inside fillcbuf(), and I traced it to iofillbuf() and then rbreadinternal(). Now, rbreadinternal() does not have a pointer to fptr or any of its members. It could be a buffer overrun, but the buffer is much larger than the file, and the changes seem much too clean for that. In addition to readconv being cleared, other struct members are being reset to an earlier state (not necessarily back to 0), which seems to imply that another thread is accessing the same IO object while the current thread is inside rbreadinternal(). I can't "catch it in the act" because setting breakpoints affects the timing, and it typically requires hundreds of passes through this code to break. However, I can copy the struct fptr is pointing to and have it break as soon as those members change. The info being read right before the IO object is reset is:

rbuf 0x05700408 "
Started GET "/javascripts/dragdrop.js?1283353508" for 127.0.0.1 at 2010-09-08 16:22:14 -0400
ActionController::RoutingError (No route matches "/javascripts/dragdrop.js"):

Rendered d:/Projects/NetAuditor/v3.00/Code/Build/lib/ruby/gems/1.9.1/gems/actionpack-3.0.0/lib/actiondispatch/middleware/templates/rescues/routingerror.erb within rescues/layout (2.0ms)

"

I can eliminate this crash by changing the Ruby source to check for NULL before referencing the pointer. However, it may only hide the issue, which may come back to bite me later. So, does this sound like a problem that needs to be fixed in the Ruby code, or a thread-safety issue in Rails?

Thank you,
Sean O'Neil
=end

History

#1 Updated by Nobuyoshi Nakada over 3 years ago

=begin
Hi,

At Thu, 16 Sep 2010 21:40:03 +0900,
Sean O'Neil wrote in :

I can eliminate this crash by changing the Ruby source to
check for NULL before referencing the pointer. However, it
may only hide the issue, which may come back to bite me
later. So, does this sound like a problem that needs to be
fixed in the Ruby code, or a thread-safety issue in Rails?

Your investigation seems correct, and I agree that it should be
noticed earlier. I think Rails has a thread-safety issue but
the crash is definitely a bug of Ruby itself. Does this patch
fix it?

diff --git a/io.c b/io.c
index 4b8d34a..67d97f2 100644
--- a/io.c
+++ b/io.c
@@ -1722,6 +1722,7 @@ fillcbuf(rbiot *fptr, int ecflags)
if (fptr->rbuflen == 0) {
READ
CHECK(fptr);
if (iofillbuf(fptr) == -1) {
+ rb
iocheckclosed(fptr);
ds = dp = (unsigned char *)fptr->cbuf + fptr->cbufoff + fptr->cbuflen;
de = (unsigned char *)fptr->cbuf + fptr->cbufcapa;
res = rb
econv_convert(fptr->readconv, NULL, NULL, &dp, de, 0);

--
Nobu Nakada

=end

#2 Updated by Ingmar Stieger about 3 years ago

=begin
I have exactly the same problem as Sean, and the call to rbiocheck_closed(fptr) does not help. In my case, the crash happens on 2-3 page reloads, so almost instantly.

What does help is this crude fix:

--- a/transcode.c 2010-06-17 09:55:46 +0000
+++ b/transcode.c 2011-01-25 17:08:01 +0000
@@ -1446,6 +1446,12 @@
unsigned char emptybuf;
unsigned char *empty
ptr = &empty_buf;

  • if (!ec)
  • {
  • rbraise(rbeUndefinedConversionError, "ec is null (ing)");
  • return(econvundefinedconversion);
  • }
    +
    ec->started = 1;

    if (!input_ptr) {

    This results in the following console message:

    [2011-01-25 18:13:17] ERROR Encoding::UndefinedConversionError: ec is null (ing)
    c:/Ruby192/lib/ruby/gems/1.9.1/gems/railties-3.0.1/lib/rails/rack/logtailer.rb:23:in read'
    c:/Ruby192/lib/ruby/gems/1.9.1/gems/railties-3.0.1/lib/rails/rack/log_tailer.rb:23:in
    tail!'
    c:/Ruby192/lib/ruby/gems/1.9.1/gems/railties-3.0.1/lib/rails/rack/log
    tailer.rb:15:in call'
    c:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.2.1/lib/rack/content_length.rb:13:in
    call'
    c:/Ruby192/lib/ruby/gems/1.9.1/gems/rack-1.2.1/lib/rack/handler/webrick.rb:52:in service'
    c:/Ruby192/lib/ruby/1.9.1/webrick/httpserver.rb:111:in
    service'
    c:/Ruby192/lib/ruby/1.9.1/webrick/httpserver.rb:70:in run'
    c:/Ruby192/lib/ruby/1.9.1/webrick/server.rb:183:in
    block in start_thread'

    At least it does not crash any more, though I am not sure what the side effects of this might be.

    Any ideas for a real solution ?

    Ingmar

=end

#3 Updated by Hiroshi Shirosaki almost 3 years ago

=begin
I wrote a test code.

Ruby raises "ec is null" error.

ruby 1.9.2p180 (2011-02-18 revision 30909) [i386-mingw32]

#!/usr/bin/env ruby

path = "test.txt"

file = File.open(path, "w")
file.close

threads = []

file = File.open(path, "a")
cursor = File.size(path)
read_file = File.open(path, "r")

1000.times do |i|
t = Thread.new do
file.print(i)
file.flush
end
threads << t

t = Thread.new do
content = readfile.read
cursor += content.size
read
file.seek cursor
print content
end
threads << t
end

threads.each { |t| t.join }

file.close
read_file.close

=end

#4 Updated by Stephen Bannasch almost 3 years ago

=begin
On Mac OS X 10.6.7 using ruby 1.9.2p180 your test appears to run correctly.
=end

#5 Updated by Ian Viney almost 3 years ago

=begin
I have been having the same problem with Rails (see (())). I tried an even simpler version of Hiroshi Shirosaki's test, and this fails repeatedly within 20 iterations. Usually, Ruby crashes with an exit code of -1073741819.

I am using Windows XP with Ruby 1.9.2-p180. It appears that this is a Windows-specific problem.

path = "test.txt"

file = File.open(path, "w")
read_file = File.open(path, "r")

1000.times do |i|
puts i
Thread.new { file.print(i) }
Thread.new { read_file.read }
end

file.close
read_file.close

=end

#6 Updated by Luis Lavena almost 3 years ago

=begin
Interesting.

Waiting for the threads to complete their operations avoids the segfault:

path = "test.txt"

file = File.open(path, "a")
read_file = File.open(path, "r")

1000.times do |i|
puts i
a = Thread.new { file.print(i) }
b = Thread.new { read_file.read }
[a, b].each { |t| t.join }
end

file.close
read_file.close

=end

#7 Updated by Hiroshi Shirosaki almost 3 years ago

=begin
Simpler version tests look good.
I tried, but Ruby didn't crash.

I am using Windows XP with Ruby 1.9.2-p180.

Modifying threads join, Ruby crashes with my environment.

path = "test.txt"

file = File.open(path, "w")
read_file = File.open(path, "r")

threads = []
1000.times do |i|
threads << Thread.new { file.print(i) }
threads << Thread.new { read_file.read }
end
threads.each { |t| t.join }

file.close
read_file.close

=end

#8 Updated by Luis Lavena almost 3 years ago

=begin
Hiroshi Shirosaki wrote:

Simpler version tests look good.
I tried, but Ruby didn't crash.

I am using Windows XP with Ruby 1.9.2-p180.

Modifying threads join, Ruby crashes with my environment.

It crash for me.

Now, something to point out is that multiple threads performing read operations over the same file descriptor without a mutex or a synchronization mechanism is set to fail.

Same for writing operations, these need to be queued/synchronized properly.

Threads are not warranted to be executed in sequence, so that could be one of the issues we are seeing here.

=end

#9 Updated by Hiroshi Shirosaki almost 3 years ago

This fix works well for me.
It seems that other thread calls clearreadconv in readall while io_fillbuf select waiting.

--- ruby-1.9.2-p180org/io.c 2011-01-16 21:35:27 +0900
+++ ruby-1.9.2-p180/io.c 2011-06-17 19:26:15 +0900
@@ -1689,6 +1689,9 @@
if (fptr->rbuf
len == 0) {
READCHECK(fptr);
if (io
fillbuf(fptr) == -1) {
+ if(!fptr->readconv) {
+ return MORECHARFINISHED;
+ }
ds = dp = (unsigned char *)fptr->cbuf + fptr->cbufoff + fptr->cbuflen;
de = (unsigned char *)fptr->cbuf + fptr->cbufcapa;
res = rb
econv_convert(fptr->readconv, NULL, NULL, &dp, de, 0);

#10 Updated by Luis Lavena almost 3 years ago

  • Target version set to 1.9.2

Hiroshi Shirosaki wrote:

This fix works well for me.
It seems that other thread calls clearreadconv in readall while io_fillbuf select waiting.

I can't no longer reproduce this on trunk (r32151):

1.rb:10:in new': failed to allocate memory (NoMemoryError)
from 1.rb:10:in
block in '
from 1.rb:7:in times'
from 1.rb:7:in
'

However, your patch seems to have solved the segfault.

#11 Updated by Nobuyoshi Nakada almost 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r32169.
Sean, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • io.c (fillcbuf): finish reading at EOF, and the readconv has been cleared by another thread while iofillbuf() is waiting at select(). a patch in by Hiroshi Shirosaki . fixed #3840

#12 Updated by Jeff Morgan over 2 years ago

Is there any chance this fix can be backported to 1.9.2? Without this fix ruby crashes very frequently when running a simple rails application on windows. Thanks!

#13 Updated by Usaku NAKAMURA over 2 years ago

  • Status changed from Closed to Assigned
  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport92
  • Assignee set to Yuki Sonoda
  • Target version deleted (1.9.2)

Also available in: Atom PDF