Bug #6358: arm-linux : sleep() time dependent threading bug - Ruby - Ruby Issue Tracking System

Actions

Copy link

Bug #6358

closed

arm-linux : sleep() time dependent threading bug

Added by stevegoobermanhill (stephen gooberman-hill) about 13 years ago. Updated over 12 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

ruby 1.9.2p136 (2010-12-25 revision 30365) [arm-linux]

Backport:

[ruby-core:44621]

Description

Hi,
I have a multi-threading bug on a low-speed (180MHz) arm-linux platform. The bug can be consistantly produced or removed by varying the length of a sleep() interval at the end of a piece of code invoked in a Thread (the GServer#serve(io) method)

I am unable to completely isolate the bug, and the trace I get off it is intermittent, but when it occurs it terminates the program instantly. The debugging info that I get when the program fails is dependent on the sleep timer.

I have once seen a ruby fault, with a control frame backtrace (3 levels) showing a cfp consistancy fault (sorry - not sure which one, but send I think - my fault for losing the trace). However, I can consistently get a trace claiming a ThreadError

The lines referenced are marked in the code.

OK - the following piece of code (log statements removed) is taken from a class derived from GServer, and implements the #serve method. The line marked with the arrow (==>) at the bottom is critical in controlling the appearance of the bug.

The bug can be made to appear by running a sequence of unit tests to stress this piece of code (there are 5 test methods in the sequence)

If the sleep length is 0.01, the program (a test suite) will crash on the first test, with no backtrace or debugging info
If the sleep length is 0.1, or if the sleep length is 0.01 AND ruby is invoked with --debug then the program will crash at a variable point
in the test sequence, giving the following trace
Exception ThreadError' at /mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:88 - must be called with a block must be called with a block Exception ThreadError' at /mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:118 - must be called with a block
/mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:88:in initialize' /mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:88:in eof'
/mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:88:in `serve'
This sleep length has also (once) given some control frame information.
If the sleep length is 0.5, the test sequence will always run correctly.

I suspect that the issue is somewhere deep in the Thread handling code - possibly the ruby interpreter is not giving the underlying linux system enough time to clear up and reallocate threads; this results in the crash.

Hope this is of some help. I do have a workaround at this point (lengthen the timer to 0.5s), but this may be an issue on slower platforms.

Kind regards

Steve

====================== simplified code (/mnt/nand-user/trident_v1.1_alpha/lib/rmodbus/tcp_multi_server.rb:88) ===================
====================== lines referred to in error are commented ===================
====================== sleep statement which can trigger or remove the bug marked with ==> ===================

class TCPMultiServer < GServer
def serve(io)
begin
io.sync=true
io.fcntl(Fcntl::F_SETFL, io.fcntl(Fcntl::F_GETFL) | Fcntl::O_NONBLOCK)

    while not stopped? 
      req=nil
      unless io.eof                                                           #line 88 (referenced in ThreadError)
        req = io.read(7) 
                  
        destination=req.getbyte(6)
        multicast= ( destination == 0 ) #true if multicast message
        
        active_devices= multicast ? @devices.values : [@devices[destination]]
        active_devices.compact!
        
        tr = req[0,2]
        len = req[4,2].unpack('n')[0]
        req = io.read(len - 1) unless io.eof
        
        active_devices.each do |device|    
          pdu = device.serve(req)

          resp = tr + "\0\0" + (pdu.size + 1).to_word + device.uid.chr + pdu
          io.write resp
        end
      end

==> sleep(0.01)
end
rescue Exception => ex
STDERR.puts ex
raise ex #line 118 (referenced in ThreadError)
end
end
end

Related issues 1 (0 open — 1 closed)

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0

Project

General

Profile

Ruby

Tags

Custom queries

Bug #6358

arm-linux : sleep() time dependent threading bug

Updated by mame (Yusuke Endoh) about 13 years ago

Updated by stevegoobermanhill (stephen gooberman-hill) about 13 years ago

Updated by naruse (Yui NARUSE) about 13 years ago

Updated by stevegoobermanhill (stephen gooberman-hill) about 13 years ago

Updated by mame (Yusuke Endoh) over 12 years ago