Project

General

Profile

Actions

Bug #21342

closed

Segfault: invalid keeping_mutexes when using Mutex in Thread then Fiber after GC

Added by maciej.mensfeld (Maciej Mensfeld) 5 months ago. Updated about 6 hours ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]
[ruby-core:122121]

Description

Ruby crashes with a [BUG] invalid keeping_mutexes error when attempting to GC locked mutex that was used in a Thread within a Fiber context after garbage collection. The error indicates an attempt to unlock a mutex that is not locked, suggesting a state management issue with mutexes across Thread and Fiber boundaries.

Ruby Version

ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]

Reproduce Process

# segv.rb

5.times do
  m = Mutex.new
  Thread.new do
    m.synchronize do
    end
  end.join
  Fiber.new do
    GC.start
    m.lock
  end.resume
end
  1. Save the above code to a file (e.g., segv.rb)
  2. Run with ruby segv.rb
  3. The crash occurs intermittently - sometimes it crashes immediately, sometimes it hangs, once in a while it works

Actual Result

The program crashes with the following error:

segv.rb: [BUG] invalid keeping_mutexes: Attempt to unlock a mutex which is not locked
ruby 3.4.3 (2025-04-14 revision d0b7e5b6a0) +PRISM [x86_64-linux]

whole segfault in the attached txt file.

Full crash backtrace shows the error originates from:

  • rb_threadptr_unlock_all_locking_mutexes in thread.c:450
  • rb_thread_terminate_all in thread.c:467

The crash suggests an issue in mutex state management during thread termination.

Expected Result

The script should complete successfully without crashing. The mutex should be properly managed across Thread and Fiber contexts, and garbage collection should not interfere with mutex state.


Files

crash.txt (23.4 KB) crash.txt maciej.mensfeld (Maciej Mensfeld), 05/15/2025 04:43 PM

Related issues 1 (1 open0 closed)

Related to Ruby - Bug #18818: Thread waitq does not retain referenced objects, can lead to use after free.Assignedioquatix (Samuel Williams)Actions

Updated by masterleep2 (Bill Lipa) 5 months ago

Additionally, on macOS, the script crashes but then gets in a seemingly endless loop after printing the 'C level backtrace information' line, and can't be killed with ^C.

Updated by byroot (Jean Boussier) 5 months ago

Looks like we're missing a null check:

  * frame #0: 0x0000000100349cb4 miniruby`thread_mutex_remove(thread=0x0000000000000000, mutex=0x0000600001147f00) at thread_sync.c:225:12
    frame #1: 0x000000010033c1f0 miniruby`rb_mutex_unlock_th(mutex=0x0000600001147f00, th=0x0000000000000000, fiber=0x0000000122f0a9c0) at thread_sync.c:467:5
    frame #2: 0x0000000100349a9c miniruby`mutex_free(ptr=0x0000600001147f00) at thread_sync.c:132:27
    frame #3: 0x0000000100122bdc miniruby`rb_data_free(objspace=0x000000012380a200, obj=4349174720) at gc.c:1192:17
    frame #4: 0x0000000100122638 miniruby`rb_gc_obj_free(objspace=0x000000012380a200, obj=4349174720) at gc.c:1371:14

I suspec tthe thread was freed before the mutex.

Updated by byroot (Jean Boussier) 5 months ago

Looks like it's not that simple. This smells of memory corruption because we end up in this loop:

-> 230 	    while (*keeping_mutexes && *keeping_mutexes != mutex) {
   231 	        // Move to the next mutex in the list:
   232 	        keeping_mutexes = &(*keeping_mutexes)->next_mutex;
   233 	    }

And at some point ->next_mutex is a clearly wrong pointer (various low values such as 0xff, 0x13, etc). So I assume something else end up overwriting that memory.

All I can say is it still reproduce on master.

Updated by luke-gru (Luke Gruber) about 2 months ago

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN to 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED
Actions #5

Updated by luke-gru (Luke Gruber) about 2 months ago

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED

Updated by ioquatix (Samuel Williams) about 1 month ago

Just for the sake of cross-referencing related issues: https://bugs.ruby-lang.org/issues/18818

Actions #7

Updated by byroot (Jean Boussier) about 1 month ago

  • Related to Bug #18818: Thread waitq does not retain referenced objects, can lead to use after free. added
Actions #8

Updated by Anonymous 10 days ago

  • Status changed from Open to Closed

Applied in changeset git|62430c19c9f1ab49429cebe65f30588472648c95.


Properly unlock locked mutexes on thread cleanup.

Mutexes were being improperly unlocked on thread cleanup. This bug was
introduced in 050a8954395.

We must keep a reference from the mutex to the thread, because if the fiber
is collected before the mutex is, then we cannot unlink it from the thread in
mutex_free. If it's not unlinked from the the thread when it's freed, it
causes bugs in rb_thread_unlock_all_locking_mutexes.

We now mark the fiber when a mutex is locked, and the thread is marked
as well. However, a fiber can still be freed in the same GC cycle as the
mutex, so the reference to the thread is still needed.

The reason we need to mark the fiber is that mutex_owned_p() has an ABA
issue where if the fiber is collected while it's locked, a new fiber could be
allocated at the same memory address and we could get false positives.

Fixes [Bug #21342]

Co-authored-by: John Hawthorn

Updated by k0kubun (Takashi Kokubun) 6 days ago

  • Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE

Updated by k0kubun (Takashi Kokubun) 6 days ago

  • Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED

Reverted the ruby_3_4 backport since it failed the CI.

Updated by k0kubun (Takashi Kokubun) 6 days ago

I guess it was due to a wrong conflict resolution. I'll attempt to backport it again.

Updated by k0kubun (Takashi Kokubun) 6 days ago

  • Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: REQUIRED to 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE

Updated by nagachika (Tomoyuki Chikanaga) about 6 hours ago

  • Backport changed from 3.2: UNKNOWN, 3.3: REQUIRED, 3.4: DONE to 3.2: UNKNOWN, 3.3: DONE, 3.4: DONE

ruby_3_3 commit:05f93fe6dc6f99fd2f728dd3c85dca944f1f4ba1 merged revision(s) 62430c19c9f1ab49429cebe65f30588472648c95.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0