Bug #9480

Ruby on windows can get into an infinite loop when exiting

Added by Josh C over 1 year ago. Updated over 1 year ago.

[ruby-core:60449]
Status:Open
Priority:Normal
Assignee:Usaku NAKAMURA
ruby -v:ruby 1.9.3p484 (2013-11-22) [i386-mingw32] Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN

Description

The Ruby VM can get into an infinite loop when exiting. I have attached two ruby programs (main.rb and watcher.rb). Main.rb spawns watcher.rb, and the watcher waits for main to exit. However, sometimes main.rb will get into an infinite loop trying to shutdown the VM.

To reproduce:

I have tried this on ruby 1.9.3p484 from rubyinstaller.org, which is the latest 19 version to date.

C:>ruby --version
ruby 1.9.3p484 (2013-11-22) [i386-mingw32]
C:>gem list
bigdecimal (1.1.0)
io-console (0.3)
json (1.5.5)
minitest (2.5.1)
rake (0.9.2.2)
rdoc (3.9.5)

Execute main.rb in a loop, e.g. from powershell:

C:>for ($i = 0; $i -lt 1000; $i++) {
ruby main.rb
Start-Sleep 1
}

Sometimes I see the parent ruby process get into an infinite loop when trying to shutdown the ruby VM. If I remove the use of IO.pipe in main.rb, then the problem doesn't seem to happen. I created a full crash dump, and see the following, note thread b8c is running away:

0:000> !runaway
User Mode Time
Thread Time
0:b8c 0 days 0:00:11.528
2:564 0 days 0:00:00.000
1:a30 0 days 0:00:00.000

Looking at that thread's call stack:

0:000> kb
ChildEBP RetAddr Args to Child

0028fd98 76f00fa2 000000b0 00000000 0028fe38 ntdll!ZwReleaseMutant+0x15
0028fda8 62e6285e 000000b0 62e5e128 00355128 KERNELBASE!ReleaseMutex+0x10
WARNING: Stack unwind information not available. Following frames may be wrong.
0028fe38 62d425d9 0035524c 00000000 0028fec8 msvcrt_ruby191!rb_thread_terminate_all+0x15a
0028fec8 62d42710 00000000 00351390 003525b4 msvcrt_ruby191!ruby_cleanup+0x2c1
0028fef8 0040136f 02557998 00352c38 0028ff28 msvcrt_ruby191!ruby_run_node+0x50
0028ff28 004010b9 00000002 00352c38 003516b0 ruby+0x136f
0028ff68 00401284 00000001 00000000 00000000 ruby+0x10b9
0028ff88 753f33aa 7efde000 0028ffd4 773f9f72 ruby+0x1284
0028ff94 773f9f72 7efde000 77dccdc5 00000000 kernel32!BaseThreadInitThunk+0xe
0028ffd4 773f9f45 0040126c 7efde000 ffffffff ntdll!_RtlUserThreadStart+0x70
0028ffec 00000000 0040126c 7efde000 00000000 ntdll!
RtlUserThreadStart+0x1b

where handle b0 is free:

0:000> !handle b0 f
Handle 000000b0
Type Mutant
Attributes 0
GrantedAccess 0x1f0001:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState
HandleCount 2
PointerCount 4
Name <none>
Object specific information
Mutex is Free
Mutant Owner 0.0

Unfortunately, it's hard to tell from that stack trace if the handle was just freed, or was already free before ReleaseMutex was called. However, a stack trace from a different run shows something more interesting:

0:000> ~kb
ChildEBP RetAddr Args to Child

0028fc38 76d315e9 00000001 0028fc88 00000001 ntdll!NtWaitForMultipleObjects+0x15
0028fcd4 75b71a2c 0028fc88 0028fcfc 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
0028fd1c 75b74220 00000001 7efde000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0
0028fd38 62e5f913 00000001 0028fe1c 00000000 kernel32!WaitForMultipleObjects+0x18
WARNING: Stack unwind information not available. Following frames may be wrong.
0028fda8 62e6288b 00000000 62e5e128 0062c668 msvcrt_ruby191!rb_mutex_unlock+0x152b
0028fe38 62d425d9 0062c78c 00000000 0028fec8 msvcrt_ruby191!rb_thread_terminate_all+0x187
0028fec8 62d42710 00000006 006223c8 0062235c msvcrt_ruby191!ruby_cleanup+0x2c1
0028fef8 0040136f 015ad9b8 00622b70 0028ff28 msvcrt_ruby191!ruby_run_node+0x50
0028ff28 004010b9 00000005 00622b70 006214e8 ruby+0x136f
0028ff68 00401284 00000001 00000000 00000000 ruby+0x10b9
0028ff88 75b733aa 7efde000 0028ffd4 774b9f72 ruby+0x1284
0028ff94 774b9f72 7efde000 74b8f904 00000000 kernel32!BaseThreadInitThunk+0xe
0028ffd4 774b9f45 0040126c 7efde000 ffffffff ntdll!_RtlUserThreadStart+0x70
0028ffec 00000000 0040126c 7efde000 00000000 ntdll!
RtlUserThreadStart+0x1b

where the call to WaitForMultipleObjects has the following parameters:

DWORD nCount = 00000001
HANDLE *lpHandles = 0028fe1c
BOOL bWaitAll = 00000000
DWORD dwMilliseconds = ffffffff

where lpHandles points to 1c8 and the timeout is INFINITE. Looking at the handle, it is again free:

0:000> !handle 1c8 f
Handle 000001c8
Type Mutant
Attributes 0
GrantedAccess 0x1f0001:
Delete,ReadControl,WriteDac,WriteOwner,Synch
QueryState
HandleCount 2
PointerCount 4
Name <none>
Object specific information
Mutex is Free
Mutant Owner 0.0

But what is interesting is that the timeout is INFINITE, which makes me think ruby is trying to unlock a mutex that is already unlocked. In looking at the call stack, I see rb_thread_terminate_all which calls rb_threadptr_unlock_all_locking_mutexes:

void
rb_threadptr_unlock_all_locking_mutexes(rb_thread_t *th)
{
const char *err;
rb_mutex_t *mutex;
rb_mutex_t *mutexes = th->keeping_mutexes;

while (mutexes) {
mutex = mutexes;
/* rb_warn("mutex #<%p> remains to be locked by terminated thread",
    mutexes); */
mutexes = mutex->next_mutex;
err = rb_mutex_unlock_th(mutex, th);
if (err) rb_bug("invalid keeping_mutexes: %s", err);
}

}

Which makes me wonder if the thread's list of keeping_mutexes has a cycle?

main.rb Magnifier (435 Bytes) Josh C, 02/04/2014 11:55 PM

watcher.rb Magnifier (791 Bytes) Josh C, 02/04/2014 11:56 PM

History

#1 Updated by Josh C over 1 year ago

Also available in: Atom PDF