Project

General

Profile

Actions

Bug #19580

closed

Ensure ruby_xfree won't segfault if called after ruby_vm_destruct

Added by mdalessio (Mike Dalessio) over 1 year ago. Updated over 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:113129]

Description

Github PR: https://github.com/ruby/ruby/pull/7663

The POSIX Threads API provides the ability to define a destructor to clean up thread-local storage at thread exit by using pthread_key_create to bind a storage key and an associated destructor function, and pthread_setspecific to associate a value with that key. If a C extension is using ruby_xmalloc and ruby_xfree to manage memory, then the destructor function will likely call ruby_xfree.

There is a small window of time as a process is exiting -- after ruby_vm_destruct but before the process exits -- in which a thread may exit and the destructor function called, leading to a segfault.

Real-world use case

The real-world scenario motivating this change is libxml2's pthread code which uses pthread_key_create to set up a destructor that is called at thread exit to clean up thread-local storage associated with the thread by pthread_setspecific.

Nokogiri has, since 2009, configured libxml2 to use ruby_xmalloc and ruby_xfree so that Ruby's memory management is aware of the allocated memory. As a result, ruby_xfree is being called by the pthread destructor, which may lead to a segfault that looks like:

[BUG] Segmentation fault at 0x0000000000000420
ruby 3.3.0dev (2023-03-19T06:13:25Z master ca9355e173) [x86_64-linux]

-- Machine register context ------------------------------------------------
 RIP: 0x00007f9b6311124e RBP: 0x00007f9b58001530 RSP: 0x00007f9b5d8eed60
 RAX: 0x0000000000000000 RBX: 0x00007f9b58001888 RCX: 0x00000000000003ff
 RDX: 0x00007f9b5dad62b0 RDI: 0x00007f9b58007090 RSI: 0x0000000000000000
  R8: 0x00007f9b5d8eed64  R9: 0x00000000000000ca R10: 0x0000000000000000
 R11: 0x0000000000000246 R12: 0x00007f9b58007090 R13: 0x00007f9b62e1bae8
 R14: 0x0000000000000004 R15: 0x00007f9b5d8efb58 EFL: 0x0000000000010206

-- C level backtrace information -------------------------------------------
.../3.3.0-dev/lib/libruby.so.3.3(rb_print_backtrace+0xd) [0x7f9b632e6f1f] /home/flavorjones/code/oss/ruby/vm_dump.c:785
.../3.3.0-dev/lib/libruby.so.3.3(rb_vm_bugreport) /home/flavorjones/code/oss/ruby/vm_dump.c:1101
.../3.3.0-dev/lib/libruby.so.3.3(rb_bug_for_fatal_signal+0xf4) [0x7f9b630eead4] /home/flavorjones/code/oss/ruby/error.c:813
.../3.3.0-dev/lib/libruby.so.3.3(sigsegv+0x4d) [0x7f9b63233bbd] /home/flavorjones/code/oss/ruby/signal.c:964
/lib/x86_64-linux-gnu/libc.so.6(0x7f9b62c42520) [0x7f9b62c42520]
.../3.3.0-dev/lib/libruby.so.3.3(ruby_sized_xfree+0x3) [0x7f9b6311124e] /home/flavorjones/code/oss/ruby/gc.c:12666
.../3.3.0-dev/lib/libruby.so.3.3(ruby_sized_xfree) /home/flavorjones/code/oss/ruby/gc.c:12663
.../lib/nokogiri/nokogiri.so(xmlResetError+0x0) [0x7f9b5da6ab4a] .../tmp/x86_64-linux/nokogiri/3.0.5/tmp/x86_64-pc-linux-gnu/ports/libxml2/2.10.3/libxml2-2.10.3/error.c:880
.../lib/nokogiri/nokogiri.so(xmlResetError) .../tmp/x86_64-linux/nokogiri/3.0.5/tmp/x86_64-pc-linux-gnu/ports/libxml2/2.10.3/libxml2-2.10.3/error.c:873
.../lib/nokogiri/nokogiri.so(xmlResetError) (null):0
.../lib/nokogiri/nokogiri.so(xmlFreeGlobalState+0x14) [0x7f9b5dad62c4] .../tmp/x86_64-linux/nokogiri/3.0.5/tmp/x86_64-pc-linux-gnu/ports/libxml2/2.10.3/libxml2-2.10.3/threads.c:548
/lib/x86_64-linux-gnu/libc.so.6(__nptl_deallocate_tsd+0x77) [0x7f9b62c91711] ./nptl/nptl_deallocate_tsd.c:73
/lib/x86_64-linux-gnu/libc.so.6(__nptl_deallocate_tsd) ./nptl/nptl_deallocate_tsd.c:22
/lib/x86_64-linux-gnu/libc.so.6(start_thread+0x17a) [0x7f9b62c949ca] ./nptl/pthread_create.c:453
[0x7f9b62d26a00]

I've reproduced this in Ruby 3.0, 3.1, 3.2, and master by using the following ruby code:

# must have an error in it to cause pthread_setspecific to be called
html = "<div foo='asdf>asdf</div>"

Thread.new { Nokogiri::HTML4::Document.parse(html) }
sleep 3 # THREAD_CACHE_TIME

exit 0

The timing is admittedly hard to reproduce, but Gitlab has seen this in their CI pipelines a few dozen times and it's made easier if the process lifetime is extended by an atexit-registered function.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like1Like0Like0