Bug #21342
closedSegfault: invalid keeping_mutexes when using Mutex in Thread then Fiber after GC
Description
Ruby crashes with a [BUG] invalid keeping_mutexes error
when attempting to GC locked mutex that was used in a Thread within a Fiber context after garbage collection. The error indicates an attempt to unlock a mutex that is not locked, suggesting a state management issue with mutexes across Thread and Fiber boundaries.
Ruby Version¶
ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]
Reproduce Process¶
# segv.rb
5.times do
m = Mutex.new
Thread.new do
m.synchronize do
end
end.join
Fiber.new do
GC.start
m.lock
end.resume
end
- Save the above code to a file (e.g.,
segv.rb
) - Run with
ruby segv.rb
- The crash occurs intermittently - sometimes it crashes immediately, sometimes it hangs, once in a while it works
Actual Result¶
The program crashes with the following error:
segv.rb: [BUG] invalid keeping_mutexes: Attempt to unlock a mutex which is not locked
ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]
whole segfault in the attached txt file.
Full crash backtrace shows the error originates from:
-
rb_threadptr_unlock_all_locking_mutexes
in thread.c:450 -
rb_thread_terminate_all
in thread.c:467
The crash suggests an issue in mutex state management during thread termination.
Expected Result¶
The script should complete successfully without crashing. The mutex should be properly managed across Thread and Fiber contexts, and garbage collection should not interfere with mutex state.
Files
Updated by masterleep2 (Bill Lipa) 5 months ago
Additionally, on macOS, the script crashes but then gets in a seemingly endless loop after printing the 'C level backtrace information' line, and can't be killed with ^C.
Updated by byroot (Jean Boussier) 5 months ago
Looks like we're missing a null check:
* frame #0: 0x0000000100349cb4 miniruby`thread_mutex_remove(thread=0x0000000000000000, mutex=0x0000600001147f00) at thread_sync.c:225:12
frame #1: 0x000000010033c1f0 miniruby`rb_mutex_unlock_th(mutex=0x0000600001147f00, th=0x0000000000000000, fiber=0x0000000122f0a9c0) at thread_sync.c:467:5
frame #2: 0x0000000100349a9c miniruby`mutex_free(ptr=0x0000600001147f00) at thread_sync.c:132:27
frame #3: 0x0000000100122bdc miniruby`rb_data_free(objspace=0x000000012380a200, obj=4349174720) at gc.c:1192:17
frame #4: 0x0000000100122638 miniruby`rb_gc_obj_free(objspace=0x000000012380a200, obj=4349174720) at gc.c:1371:14
I suspec tthe thread was freed before the mutex.
Updated by byroot (Jean Boussier) 5 months ago
Looks like it's not that simple. This smells of memory corruption because we end up in this loop:
-> 230 while (*keeping_mutexes && *keeping_mutexes != mutex) {
231 // Move to the next mutex in the list:
232 keeping_mutexes = &(*keeping_mutexes)->next_mutex;
233 }
And at some point ->next_mutex
is a clearly wrong pointer (various low values such as 0xff
, 0x13
, etc). So I assume something else end up overwriting that memory.
All I can say is it still reproduce on master
.
Updated by luke-gru (Luke Gruber) about 2 months ago
- Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
I created a PR here: https://github.com/ruby/ruby/pull/14262
Updated by luke-gru (Luke Gruber) about 2 months ago
- Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED
Updated by ioquatix (Samuel Williams) about 1 month ago
Just for the sake of cross-referencing related issues: https://bugs.ruby-lang.org/issues/18818
Updated by byroot (Jean Boussier) about 1 month ago
- Related to Bug #18818: Thread waitq does not retain referenced objects, can lead to use after free. added
Updated by Anonymous 10 days ago
- Status changed from Open to Closed
Applied in changeset git|62430c19c9f1ab49429cebe65f30588472648c95.
Properly unlock locked mutexes on thread cleanup.
Mutexes were being improperly unlocked on thread cleanup. This bug was
introduced in 050a8954395.
We must keep a reference from the mutex to the thread, because if the fiber
is collected before the mutex is, then we cannot unlink it from the thread in
mutex_free
. If it's not unlinked from the the thread when it's freed, it
causes bugs in rb_thread_unlock_all_locking_mutexes
.
We now mark the fiber when a mutex is locked, and the thread is marked
as well. However, a fiber can still be freed in the same GC cycle as the
mutex, so the reference to the thread is still needed.
The reason we need to mark the fiber is that mutex_owned_p()
has an ABA
issue where if the fiber is collected while it's locked, a new fiber could be
allocated at the same memory address and we could get false positives.
Fixes [Bug #21342]
Co-authored-by: John Hawthorn john@hawthorn.email
Updated by k0kubun (Takashi Kokubun) 6 days ago
- Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE
ruby_3_4 4306c9048fb674d24b92dc46b6746a4749564147 merged revision(s) 62430c19c9f1ab49429cebe65f30588472648c95.
Updated by k0kubun (Takashi Kokubun) 6 days ago
- Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED
Reverted the ruby_3_4 backport since it failed the CI.
Updated by k0kubun (Takashi Kokubun) 6 days ago
I guess it was due to a wrong conflict resolution. I'll attempt to backport it again.
Updated by k0kubun (Takashi Kokubun) 6 days ago
- Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE
ruby_3_4 264b2d78c366491834c9ce4499a4ad7b7564b89d.
Updated by nagachika (Tomoyuki Chikanaga) about 6 hours ago
- Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE to 3.2: UNKNOWN, 3.3: DONE, 3.4: DONE
ruby_3_3 commit:05f93fe6dc6f99fd2f728dd3c85dca944f1f4ba1 merged revision(s) 62430c19c9f1ab49429cebe65f30588472648c95.