Bug #20863
open`zlib.c` calls `rb_str_set_len` and `rb_str_modify_expand`(and others) without holding the GVL.
Description
Background¶
I was working on https://bugs.ruby-lang.org/issues/20876 and was investigating some problems with zlib.c
and GVL, and noticed that zstream_run_func
is executed without the GVL, but can invoke various rb_
string functions. Those functions in turn can invoke rb_raise
and generally look problematic. However, maybe by luck, such code path does not appear to be invoked in typical usage.
However, even so, it is possible to cause zstream_run_func
to segfault by a carefully crafted program which causes the internal buffer to be resized while the GVL is released: https://github.com/ruby/zlib/pull/88#issuecomment-2455772054
Proposal¶
I would like to modify zlib.c
to only release the GVL around the CPU intensive compression/decompression operation: https://github.com/ruby/zlib/pull/88
In addition, I identified several more improvements to prevent segfaults and other related failures:
- Use
rb_str_locktemp
to prevent thez->buf
changing size while in use by therb_nogvl
code. - Expand the mutex to protect
#deflate
and#inflate
completely, not just the internal operation.
In order to catch these issues earlier and find other bugs like this, I recommend we introduce additional checks: https://bugs.ruby-lang.org/issues/20877
Updated by byroot (Jean Boussier) 11 days ago
@ko1 (Koichi Sasada) Do we have a proper description of what is safe and what it unsafe to do with the GVL released?
Because obviously it's OK to use ruby_xmalloc / ruby_xfree
with the GVL released, so methods which allocate aren't necessarily problematic?\
In this case I'm unclear on why rb_str_set_len
/ rb_str_modify_expand
shouldn't be called with the GVL released, assuming the objects on which they operate aren't visible to any other thread.
I think it would be helpful to have more clear guidelines on these things (unless of course I missed some existing documentation).
Updated by ko1 (Koichi Sasada) 11 days ago
Quoted from rb_thread_call_without_gvl
doc:
* NOTE: You can not execute most of Ruby C API and touch Ruby
* objects in `func()' and `ubf()', including raising an
* exception, because current thread doesn't acquire GVL
* (it causes synchronization problems). If you need to
* call ruby functions either use rb_thread_call_with_gvl()
* or read source code of C APIs and confirm safety by
* yourself.
*
* NOTE: In short, this API is difficult to use safely. I recommend you
* use other ways if you have. We lack experiences to use this API.
* Please report your problem related on it.
*
* NOTE: Releasing GVL and re-acquiring GVL may be expensive operations
* for a short running `func()'. Be sure to benchmark and use this
* mechanism when `func()' consumes enough time.
*
* Safe C API:
* * rb_thread_interrupted() - check interrupt flag
* * ruby_xmalloc(), ruby_xrealloc(), ruby_xfree() -
* they will work without GVL, and may acquire GVL when GC is needed.
Again:
* NOTE: In short, this API is difficult to use safely. I recommend you
* use other ways if you have. We lack experiences to use this API.
* Please report your problem related on it.
Updated by byroot (Jean Boussier) 11 days ago
@ko1 (Koichi Sasada) Not sure how I didn't think to check that, thank you. So indeed allocations are fine. From what I understand, the issue is mostly exceptions and of course using an object concurrently.
Updated by ioquatix (Samuel Williams) 11 days ago
I think the issue is, those methods from a public interface POV, are not allowed to be called without the GVL.
Even if today the implementation follows a "safe" code path, in the future, it may not.
Adding these annotations will help to clarify that "this method is not safe to call without the GVL" - a form of internal and run-time documentation.
Updated by Eregon (Benoit Daloze) 11 days ago ยท Edited
ioquatix (Samuel Williams) wrote in #note-7:
Even if today the implementation follows a "safe" code path, in the future, it may not.
This is a good point.
I think we should consider all C API functions unsafe to be called without the GVL, except the functions listed in Safe C API
.
So I think we should update the docs to remove or read source code of C APIs and confirm safety by yourself.
as it's not a good idea as it may change and it's very hard to assess if safe.
Updated by byroot (Jean Boussier) 11 days ago
There would be quite a lot of value in having some nogvl save APIs though. e.g. if database clients could allocate Hash/Array/String to build the response while the GVL is still released, it could really help with throughput of threaded servers like Puma.
Updated by ioquatix (Samuel Williams) 11 days ago
There would be quite a lot of value in having some nogvl save APIs though. e.g. if database clients could allocate Hash/Array/String to build the response while the GVL is still released, it could really help with throughput of threaded servers like Puma.
I think it's a great idea (seriously great), but out of scope for this issue. Do you want to create a new issue to start that discussion?