Bug #21220

Memory corruption in update_line_coverage() [write at index -1]

Added by mbcodeandsound (Mike Bourgeous) 8 days ago. Updated 1 day ago.

Status: Closed
Assignee: -
Target version: -
ruby -v: ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
[ruby-core:121556]

Description

Hello!

I have encountered repeatable memory corruption in Ruby 3.4.2 on Ubuntu 24.04.2 LTS, which I believe is happening in update_line_coverage(). I could not reproduce this on Ruby 3.3.x or earlier. My findings follow. I also have detailed step-by-step notes at https://github.com/mike-bourgeous/mb-sound/issues/36

Summary

update_line_coverage() calls rb_sourceline(), subtracts one from its return value, and uses this as an index into an Array. Sometimes rb_sourceline() returns 0, and when this happens, update_line_coverage() will write to index -1 of the array. This corrupts the heap before the Array, resulting in a program crash later during GC.
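
To make the indexing concrete, here is a minimal sketch using Ruby's standard Coverage library. It writes a small throwaway file (the name coverage_demo.rb is made up for this example), loads it with line coverage enabled, and prints the per-line counters. Entry i of the resulting array describes source line i + 1, which is why a source line of 0 would map to index -1.

```ruby
require "coverage"
require "tmpdir"

# Throwaway source file used only for this demonstration.
source = <<~RUBY
  x = 1
  if x > 1
    puts "never runs"
  end
RUBY

lines = Dir.mktmpdir do |dir|
  path = File.join(dir, "coverage_demo.rb") # hypothetical file name
  File.write(path, source)

  Coverage.start(lines: true) # only files loaded after this call are measured
  load path
  result = Coverage.result    # stops measurement and returns the counters

  key = result.keys.find { |k| k.end_with?("coverage_demo.rb") }
  result.fetch(key).fetch(:lines)
end

# Array index i holds the hit count for source line i + 1.
lines.each_with_index do |count, i|
  puts "line #{i + 1} (index #{i}): #{count.inspect}"
end
```

Lines that contain no executable code are reported as nil rather than 0, so the array has one slot per source line, not one per statement.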

As I am new to the Ruby codebase, I do not know whether it is normal for rb_sourceline() to return 0 (in which case update_line_coverage() should handle it), or whether something is wrong in the code that ultimately feeds rb_sourceline().
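
For illustration only, the first of those two possibilities could look like the bounds check below. This is a hypothetical Ruby model, not the interpreter's C code: bump_line_count and its arguments are names invented for this sketch, with counts standing in for the coverage Array and source_line for the value returned by rb_sourceline(). Note that a plain Ruby Array would treat index -1 as "last element", whereas the raw write described above lands in memory before the array, so the model rejects the index explicitly instead of relying on Array semantics.

```ruby
# Hypothetical model of a per-line hit counter; NOT Ruby's C implementation.
# counts      - one slot per source line (nil marks a non-executable line)
# source_line - 1-based line number, or 0 when the line is unknown
def bump_line_count(counts, source_line)
  index = source_line - 1

  # Reject line 0 (index -1) and any line past the end of the array.
  return false unless index.between?(0, counts.length - 1)
  # Leave non-executable lines untouched.
  return false unless counts[index].is_a?(Integer)

  counts[index] += 1
  true
end

counts = [0, nil, 0]
bump_line_count(counts, 1) # valid line: increments counts[0]
bump_line_count(counts, 0) # line 0: rejected, nothing is written
bump_line_count(counts, 4) # past the end: rejected
p counts
```

Whether such a check belongs in update_line_coverage(), or whether line 0 should never be produced in the first place, is exactly the open question above.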

Symptom

On Linux, affected processes print one of the following errors and exit:

munmap_chunk(): invalid pointer
Aborted (core dumped)

or, if preloading libc_malloc_debug.so:

malloc_check_get_size: memory corruption
Aborted (core dumped)

Reproduction

I have a reduced GitHub project that can reproduce the bug consistently both on my machine and in CI. When I try to reduce the size of this repo further, the bug stops happening.

The issue only reproduces locally if the coverage/ directory has a large .resultset.json.

# Repeatedly running the process increases the likelihood of crashing
# as the SimpleCov result file grows.
for f in $(seq 1 100); do
  echo "$f"
  ruby -r./spec/simplecov_helper.rb bin/midi_roll.rb -c 40 -r 2 spec/test_data/all_notes.mid > /dev/null || break
done

Research and reasoning

I initially found the crash during a live stream when I was upgrading a project from Ruby 2.7 to Ruby 3.4. The crash occurred when an RSpec test tried to spawn another Ruby process, while using SimpleCov to measure code coverage in both. I discovered a workaround of disabling SimpleCov in the nested process when running tests on Ruby 3.4. I used a somewhat unusual approach to get coverage metrics for subprocesses.

After the stream I wanted to understand what was really happening and see if I could find a way to re-enable test code coverage for subprocesses. I used a combination of Valgrind, GDB, and trial and error to narrow down the site of the crash and the original corruption. I wrote a GDB script to automate information gathering when the GC crash occurred, and used Valgrind with vgdb to identify the original write that appeared to cause the corruption.

I reviewed the Git history of update_line_coverage(), rb_sourceline() (and the functions it calls), and a few other functions, but did not find any obvious changes between Ruby 3.3.x and Ruby 3.4.x, so the root cause is somewhere beyond my familiarity with the codebase.

Full details of my process are in my issue notes: https://github.com/mike-bourgeous/mb-sound/issues/36


Files

corruption_c_stack.txt (2.63 KB), mbcodeandsound (Mike Bourgeous), 04/07/2025 05:26 PM
corruption_ruby_stack.txt (948 Bytes), mbcodeandsound (Mike Bourgeous), 04/07/2025 05:26 PM
crash_ruby_stack.txt (4.46 KB), mbcodeandsound (Mike Bourgeous), 04/07/2025 05:26 PM
crash_c_stack.txt (26.2 KB), mbcodeandsound (Mike Bourgeous), 04/07/2025 05:26 PM

Related issues 1 (1 open, 0 closed)

Related to Ruby - Bug #21259: The Prism compiler wrongly creates a line number of zero (Status: Assigned, Assignee: prism)
