Project

General

Profile

Bug #14108

Seg Fault with MinGW on svn 60769

Added by MSP-Greg (Greg L) over 1 year ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.0dev (2017-11-15 trunk 60767) [x64-mingw32]
[ruby-core:83772]

Description

The Noon JST ruby-loco MinGW build just failed. Last build on svn 60767 was successful.

Error shown in log:

linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
==> ERROR: A failure occurred in build().
    Aborting...
Command exited with code 1

From a local build with no make -j parameter:

linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
==> ERROR: A failure occurred in build().
    Aborting...

Thanks, Greg

Associated revisions

Revision 5570dba9
Added by usa (Usaku NAKAMURA) over 1 year ago

nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60770 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 60770
Added by usa (Usaku NAKAMURA) over 1 year ago

nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

Revision 60770
Added by usa (Usaku NAKAMURA) over 1 year ago

nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

Revision 60770
Added by usa (Usaku NAKAMURA) over 1 year ago

nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

Revision e9614f9a
Added by usa (Usaku NAKAMURA) over 1 year ago

Cannot call rb_thread_call_without_gvl before running VM

  • dir.c (readdir_without_gvl): check the VM is already initialized before calling rb_thread_call_without_gvl(). [Bug #14108]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@60772 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 60772
Added by usa (Usaku NAKAMURA) over 1 year ago

Cannot call rb_thread_call_without_gvl before running VM

  • dir.c (readdir_without_gvl): check the VM is already initialized before calling rb_thread_call_without_gvl(). [Bug #14108]

Revision 60772
Added by usa (Usaku NAKAMURA) over 1 year ago

Cannot call rb_thread_call_without_gvl before running VM

  • dir.c (readdir_without_gvl): check the VM is already initialized before calling rb_thread_call_without_gvl(). [Bug #14108]

Revision 60772
Added by usa (Usaku NAKAMURA) over 1 year ago

Cannot call rb_thread_call_without_gvl before running VM

  • dir.c (readdir_without_gvl): check the VM is already initialized before calling rb_thread_call_without_gvl(). [Bug #14108]

History

#1

Updated by usa (Usaku NAKAMURA) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset trunk|r60770.


nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

Updated by usa (Usaku NAKAMURA) over 1 year ago

I've just committed a workaround for now, because this hides other failures and errors on CI.

Updated by normalperson (Eric Wong) over 1 year ago

usa@garbagecollect.jp wrote:

I've just committed a workaround for now, because this hides other failures and errors on CI.

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

Updated by usa (Usaku NAKAMURA) over 1 year ago

Hi, Eric

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

Regards,
--
U.Nakamaura usa@garbagecollect.jp

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

On 2017/11/15 13:33, U.NAKAMURA wrote:

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.
In mswin/mingw version, command line globbing runs before VM initialization.
GET_EC() returns NULL and GET_THREAD() segfaults in call_without_gvl().

--
Nobu Nakada

Updated by normalperson (Eric Wong) over 1 year ago

"U.NAKAMURA" usa@garbagecollect.jp wrote:

Hi, Eric

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, normalperson@yhbt.net wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

Updated by usa (Usaku NAKAMURA) over 1 year ago

Hi, Eric,

In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, normalperson@yhbt.net wrote:

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

As Nobu said at ruby-core:83776, my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

So, IMO you don't need to revert them.
(but this mail is too late...)

Regards,
--
U.Nakamaura usa@garbagecollect.jp

Updated by normalperson (Eric Wong) over 1 year ago

"U.NAKAMURA" usa@garbagecollect.jp wrote:

Hi, Eric,

In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, normalperson@yhbt.net wrote:

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

As Nobu said at ruby-core:83776, my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.

Right, that was a real bug, too.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.

OK, but do we know all platforms? Reading glibc, it seems
readdir(3) is thread-safe (but not reentrant) at dirstream level
(see below)

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

So, IMO you don't need to revert them.
(but this mail is too late...)

Linux readdir(3) man-page says:

In the current POSIX.1 specification (POSIX.1-2008), readdir() is not
required to be thread-safe. However, in modern implementations
(including the glibc implementation), concurrent calls to readdir()
that specify different directory streams are thread-safe. In cases
where multiple threads must read from the same directory stream, using
readdir() with external synchronization is still preferable to the use
of the deprecated readdir_r(3) function. It is expected that a future
version of POSIX.1 will require that readdir() be thread-safe when con‐
currently employed on different directory streams.

So based on that paragraph alone, my revert seems correct.
Ruby Dir#read method can be called in multiple threads on the
same Dir object. That is dangerous.

However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].

So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?

[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]

Updated by normalperson (Eric Wong) over 1 year ago

Eric Wong normalperson@yhbt.net wrote:

However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].

So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?

No on whitelist. Actual dirent returned by readdir can still
be clobbered despite that lock. That glibc lock is only to protect
internal glibc structures, not dirent data the user sees.
GVL to protect readdir is probably still the best way going
forward. We may introduce new lock(s) (while releasing GVL),
but it's too complex and late for 2.5.

[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]

Updated by normalperson (Eric Wong) over 1 year ago

Btw, reiterating what I wrote privately, the correct way
to do this would be:

Dir#read:
release GVL
acquire dir_data.lock
readdir() call /* dirent pointer returned /
copy dirent result to tmp
release dir_data.lock /
done using dirent */
acquire GVL
make String from tmp

Too much engineering effort and the performance hit would suck
(even more than it does now).

We may consider using non-portable getdents/getdents64 syscalls on
some platforms (Linux, FreeBSD) directly for Ruby 2.6/3.x. That
would allow us to avoid introducing dir_data.lock and extra copy,
but it's too late for 2.5.

readdir(3) is high-level stdio crap; we've already purged most
of the stdio overhead in io.c for 1.9+; so maybe we can continue
that trend in dir.c. Of course, read(2)/write(2) are portable
while getdents(2) isn't :<

However, opendir can safely release GVL we should be able to
get that into 2.5...

Also available in: Atom PDF