Project

General

Profile

Actions

Bug #14108

closed

Seg Fault with MinGW on svn 60769

Added by MSP-Greg (Greg L) over 6 years ago. Updated over 6 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.0dev (2017-11-15 trunk 60767) [x64-mingw32]
[ruby-core:83772]

Description

The Noon JST ruby-loco MinGW build just failed. Last build on svn 60767 was successful.

Error shown in log:

linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
==> ERROR: A failure occurred in build().
    Aborting...
Command exited with code 1

From a local build with no make -j parameter:

linking miniruby.exe
generating encdb.h
make: *** [uncommon.mk:704: .rbconfig.time] Segmentation fault
make: *** Waiting for unfinished jobs....
encdb.h updated
[1m[31m==> ERROR:[0;10m[1m A failure occurred in build().[0;10m
[1m    Aborting...[0;10m

Thanks, Greg

Actions #1

Updated by usa (Usaku NAKAMURA) over 6 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r60770.


nogvl readdir make SEGV on Windows

  • dir.c (readdir_without_gvl): workaround for Windows. [Bug #14108]

Updated by usa (Usaku NAKAMURA) over 6 years ago

I've just committed a workaround for now, because this hides other failures and errors on CI.

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

I've just committed a workaround for now, because this hides other failures and errors on CI.

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

Updated by usa (Usaku NAKAMURA) over 6 years ago

Hi, Eric

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

Regards,

U.Nakamaura

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

On 2017/11/15 13:33, U.NAKAMURA wrote:

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.
In mswin/mingw version, command line globbing runs before VM initialization.
GET_EC() returns NULL and GET_THREAD() segfaults in call_without_gvl().

--
Nobu Nakada

Updated by normalperson (Eric Wong) over 6 years ago

"U.NAKAMURA" wrote:

Hi, Eric

In message "[ruby-core:83774] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 04:14:37 +0000, wrote:

Oops, I will need to modify (or revert) r60769 anyways.
It may be possible for concurrent threads to share Dir objects
and we were relying on GVL for protect pure-Ruby code from
stepping over each other.

I've not checked this problem deeply yet.

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

Updated by usa (Usaku NAKAMURA) over 6 years ago

Hi, Eric,

In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, wrote:

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

As Nobu said at [ruby-core:83776] (thank you Nobu!), my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

So, IMO you don't need to revert them.
(but this mail is too late...)

Regards,

U.Nakamaura

Updated by normalperson (Eric Wong) over 6 years ago

"U.NAKAMURA" wrote:

Hi, Eric,

In message "[ruby-core:83779] Re: [Ruby trunk Bug#14108] Seg Fault with MinGW on svn 60769"
on Wed, 15 Nov 2017 07:12:42 +0000, wrote:

You can see our implementation of readdir on Windows in
win32/win32.c (rb_w32_readdir).
I doubt the second argument (rb_encoding *enc) is the cause.

As Nobu said at [ruby-core:83776] (thank you Nobu!), my guess was wrong
and the cause was touching GVL before running VM.
Therefore r60770 is the right way to resolve the problem.

Right, that was a real bug, too.

Right, enc is no problem. The problem is using DIR * from stdio
is not reentrant or thread-safe. readdir_r is deprecated, even,
and so it is up to the application to provide thread-safety when
reading directories.

Our readdir implementation for Win32 is expected to be thread-safe.
And, it seems that there is no platform which uses non thread-safe
readdir.

OK, but do we know all platforms? Reading glibc, it seems
readdir(3) is thread-safe (but not reentrant) at dirstream level
(see below)

Usually on modern platforms (but not guaranteed), readdir on
different DIR * pointers is safe from multiple threads, so maybe
we can add a mutex to "struct dir_data".

I will revert r60772, r60770, and r60769 for now...

So, IMO you don't need to revert them.
(but this mail is too late...)

Linux readdir(3) man-page says:

In the current POSIX.1 specification (POSIX.1-2008), readdir() is not
required to be thread-safe. However, in modern implementations
(including the glibc implementation), concurrent calls to readdir()
that specify different directory streams are thread-safe. In cases
where multiple threads must read from the same directory stream, using
readdir() with external synchronization is still preferable to the use
of the deprecated readdir_r(3) function. It is expected that a future
version of POSIX.1 will require that readdir() be thread-safe when con‐
currently employed on different directory streams.

So based on that paragraph alone, my revert seems correct.
Ruby Dir#read method can be called in multiple threads on the
same Dir object. That is dangerous.

However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].

So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?

[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]

Updated by normalperson (Eric Wong) over 6 years ago

Eric Wong wrote:

However; when reading glibc source[1], I noticed every DIR stream has
a lock which guards readdir calls since 1996[2].

So maybe whitelist glibc (and maybe win32) implementations
for GVL release on readdir?

No on whitelist. Actual dirent returned by readdir can still
be clobbered despite that lock. That glibc lock is only to protect
internal glibc structures, not dirent data the user sees.
GVL to protect readdir is probably still the best way going
forward. We may introduce new lock(s) (while releasing GVL),
but it's too complex and late for 2.5.

[1] git://sourceware.org/git/glibc.git - sysdeps/posix/readdir.c
[2] commit c150923988933b5db75a974d4cc08cd7f7aaf3dc in glibc[1]

Updated by normalperson (Eric Wong) over 6 years ago

Btw, reiterating what I wrote privately, the correct way
to do this would be:

Dir#read:
release GVL
acquire dir_data.lock
readdir() call /* dirent pointer returned /
copy dirent result to tmp
release dir_data.lock /
done using dirent */
acquire GVL
make String from tmp

Too much engineering effort and the performance hit would suck
(even more than it does now).

We may consider using non-portable getdents/getdents64 syscalls on
some platforms (Linux, FreeBSD) directly for Ruby 2.6/3.x. That
would allow us to avoid introducing dir_data.lock and extra copy,
but it's too late for 2.5.

readdir(3) is high-level stdio crap; we've already purged most
of the stdio overhead in io.c for 1.9+; so maybe we can continue
that trend in dir.c. Of course, read(2)/write(2) are portable
while getdents(2) isn't :<

However, opendir can safely release GVL we should be able to
get that into 2.5...

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0