Project

General

Profile

Actions

Bug #15916

closed

Memory leak in Regexp literal interpolation

Added by mltsy (Joe Marty) almost 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
[ruby-core:93071]

Description

When interpolating a string inside a Regexp literal, if the string contains a multibyte character loaded from a file (not sure if this covers all the cases, but this is what triggers it for me), Ruby leaks memory.

The code below reproduces the problem, while outputting the process memory usage as it rises (get_process_mem gem is required).

Ways to avoid the memory leak (although I don't know why) include:

  1. Using the string literal to define PATTERN directly (Not loading it from a file)
  2. Using Regexp.new instead of a literal interpolation (/#{...}/)
  3. Shortening the string to just a few characters (maybe small enough to fit inside a single RVALUE?)
require 'get_process_mem'

str = "String that doesn't fit into a single RVALUE, with a multibyte char:" + 160.chr(Encoding::UTF_8)
File.write('weirdstring.txt', str)
pattern = File.read("weirdstring.txt")

loop do
  print "Running... "

  100_000.times { /#{pattern}/i }

  puts " process mem: #{GetProcessMem.new.mb.to_i}MB"
end

Expected Result:
Constant memory usage (avoiding the leak produces constant memory usage between 10-20MB)

Actual Result:
Continual memory growth (it only takes 60 seconds or so to consume 500MB)


Related issues

Related to Ruby master - Bug #16041: eval's path argument seems to trigger GC bugClosedko1 (Koichi Sasada)Actions
Related to Ruby master - Bug #16136: String corruption in 2.6.4ClosedActions
Actions #1

Updated by mltsy (Joe Marty) almost 2 years ago

  • Subject changed from Memory leak in Regular Expression interpolation to Memory leak in Regexp literal interpolation

Updated by mltsy (Joe Marty) almost 2 years ago

  • Description updated (diff)

Simplified the test case.

Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago

I can confirm this is still an issue on the master branch, and it appears to be leaking ~178 bytes/regexp. It occurs both with and without the case insensitive modifier. I'm think it must be a difference between rb_reg_initialize_m and rb_reg_preprocess_dregexp, but I haven't tracked it down yet.

Updated by luke-gru (Luke Gruber) almost 2 years ago

I managed to track down the leak, and it's related to rb_fstring().

reg_set_source() calls rb_fstring() with the tainted string, and there's a leak when non-"bare" (ivars or tainted) non-embedded strings are
given to rb_fstring().

It occurs in str_replace_shared_without_enc(), whos first argument should always be a new string,
not one with a buffer already, as this function doesn't clear any already allocated buffer.

Adding

        if (!STR_EMBED_P(str2) && !FL_TEST(str2, STR_SHARED|STR_NOFREE)) {
            ruby_sized_xfree(STR_HEAP_PTR(str2), STR_HEAP_SIZE(str2));
        }

to the else case in this function plugs the leak. I don't think this function should be called at all in rb_fstring(),
but that could be solved in a different issue/ticket if necessary.

Actions #5

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|53e9908d8afc7f03109b0aafd1698ab35f512b05.


Fix memory leak

  • string.c (str_replace_shared_without_enc): free previous buffer
    before replaced.

  • parse.y (gettable): make sure in advance that the __FILE__
    object shares a fstring, to get rid of replacement with the
    fstring later.
    TODO: this hack may be needed in other places.

[Bug #15916]

Co-Authored-By: luke-gru (Luke Gruber) luke.gru@gmail.com

Actions #6

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

  • Related to Bug #16041: eval's path argument seems to trigger GC bug added
Actions #7

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN to 2.4: UNKNOWN, 2.5: REQUIRED, 2.6: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago

  • Backport changed from 2.4: UNKNOWN, 2.5: REQUIRED, 2.6: REQUIRED to 2.4: UNKNOWN, 2.5: REQUIRED, 2.6: DONE

ruby_2_6 r67732 merged revision(s) 53e9908d8afc7f03109b0aafd1698ab35f512b05.

Actions #9

Updated by naruse (Yui NARUSE) over 1 year ago

  • Related to Bug #16136: String corruption in 2.6.4 added
Actions

Also available in: Atom PDF