Bug #14108
closedSeg Fault with MinGW on svn 60769
Description
The Noon JST ruby-loco MinGW build just failed. Last build on svn 60767 was successful.
Error shown in log:
linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
==> ERROR: A failure occurred in build().
    Aborting...
Command exited with code 1
From a local build with no make -j parameter:
linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
[1m[31m==> ERROR:[0;10m[1m A failure occurred in build().[0;10m
[1m    Aborting...[0;10m
Thanks, Greg
        
           Updated by usa (Usaku NAKAMURA) almost 8 years ago
          Updated by usa (Usaku NAKAMURA) almost 8 years ago
          
          
        
        
      
      - Status changed from Open to Closed
Applied in changeset trunk|r60770.
nogvl readdir make SEGV on Windows
- dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]
        
           Updated by usa (Usaku NAKAMURA) almost 8 years ago
          Updated by usa (Usaku NAKAMURA) almost 8 years ago
          
          
        
        
      
      I've just committed a workaround for now, because this hides other failures and errors on CI.
        
           Updated by normalperson (Eric Wong) almost 8 years ago
          Updated by normalperson (Eric Wong) almost 8 years ago
          
          
        
        
      
      usa@garbagecollect.jp wrote:
I've just committed a workaround for now, because this hides other failures and errors on CI.
Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.
        
           Updated by usa (Usaku NAKAMURA) almost 8 years ago
          Updated by usa (Usaku NAKAMURA) almost 8 years ago
          
          
        
        
      
      Hi, Eric
In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:
Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.
I've not checked this problem deeply yet.
You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.
Regards,¶
U.Nakamaura usa@garbagecollect.jp
        
           Updated by nobu (Nobuyoshi Nakada) almost 8 years ago
          Updated by nobu (Nobuyoshi Nakada) almost 8 years ago
          
          
        
        
      
      On 2017/11/15 13:33, U.NAKAMURA wrote:
In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.I've not checked this problem deeply yet.
In mswin/mingw version, command line globbing runs before VM initialization.
GET_EC()returns NULL andGET_THREAD()segfaults incall_without_gvl().
--
Nobu Nakada
        
           Updated by normalperson (Eric Wong) almost 8 years ago
          Updated by normalperson (Eric Wong) almost 8 years ago
          
          
        
        
      
      "U.NAKAMURA" usa@garbagecollect.jp wrote:
Hi, Eric
In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.I've not checked this problem deeply yet.
You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.
Right, enc is no problem.  The problem is using DIR * from stdio
is not reentrant or thread-safe.  readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.
Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".
I will revert r60772, r60770, and r60769 for now...
        
           Updated by usa (Usaku NAKAMURA) almost 8 years ago
          Updated by usa (Usaku NAKAMURA) almost 8 years ago
          
          
        
        
      
      Hi, Eric,
In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, normalperson@yhbt.net wrote:
You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.
As Nobu said at [ruby-core:83776] (thank you Nobu!), my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.
Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.
Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.
Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".I will revert r60772, r60770, and r60769 for now...
So, IMO you don't need to revert them.
(but this mail is too late...)
Regards,¶
U.Nakamaura usa@garbagecollect.jp
        
           Updated by normalperson (Eric Wong) almost 8 years ago
          Updated by normalperson (Eric Wong) almost 8 years ago
          
          
        
        
      
      "U.NAKAMURA" usa@garbagecollect.jp wrote:
Hi, Eric,
In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, normalperson@yhbt.net wrote:You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.As Nobu said at [ruby-core:83776] (thank you Nobu!), my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.
Right, that was a real bug, too.
Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.
OK, but do we know all platforms?  Reading glibc, it seems
readdir(3) is thread-safe (but not reentrant) at dirstream level
(see below)
Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".I will revert r60772, r60770, and r60769 for now...
So, IMO you don't need to revert them.
(but this mail is too late...)
Linux readdir(3) man-page says:
In  the  current POSIX.1 specification (POSIX.1-2008), readdir() is not
required  to  be  thread-safe.   However,  in  modern   implementations
(including  the  glibc  implementation),  concurrent calls to readdir()
that specify different directory streams  are  thread-safe.   In  cases
where  multiple threads must read from the same directory stream, using
readdir() with external synchronization is still preferable to the  use
of  the deprecated readdir_r(3) function.  It is expected that a future
version of POSIX.1 will require that readdir() be thread-safe when con‐
currently employed on different directory streams.
So based on that paragraph alone, my revert seems correct.
Ruby Dir#read method can be called in multiple threads on the
same Dir object.  That is dangerous.
However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].
So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?
[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]
        
           Updated by normalperson (Eric Wong) almost 8 years ago
          Updated by normalperson (Eric Wong) almost 8 years ago
          
          
        
        
      
      Eric Wong normalperson@yhbt.net wrote:
However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?
No on whitelist.   Actual dirent returned by readdir can still
be clobbered despite that lock.  That glibc lock is only to protect
internal glibc structures, not dirent data the user sees.
GVL to protect readdir is probably still the best way going
forward.  We may introduce new lock(s) (while releasing GVL),
but it's too complex and late for 2.5.
[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]
        
           Updated by normalperson (Eric Wong) almost 8 years ago
          Updated by normalperson (Eric Wong) almost 8 years ago
          
          
        
        
      
      Btw, reiterating what I wrote privately, the correct way
to do this would be:
Dir#read:
release GVL
acquire dir_data.lock
readdir() call /* dirent pointer returned /
copy dirent result to tmp
release dir_data.lock / done using dirent */
acquire GVL
make String from tmp
Too much engineering effort and the performance hit would suck
(even more than it does now).
We may consider using non-portable getdents/getdents64 syscalls on
some platforms (Linux, FreeBSD) directly for Ruby 2.6/3.x.  That
would allow us to avoid introducing dir_data.lock and extra copy,
but it's too late for 2.5.
readdir(3) is high-level stdio crap; we've already purged most
of the stdio overhead in io.c for 1.9+; so maybe we can continue
that trend in dir.c.  Of course, read(2)/write(2) are portable
while getdents(2) isn't :<
However, opendir can safely release GVL we should be able to
get that into 2.5...