Project

General

Profile

Bug #15792

GC can leave strings used as hash keys in a corrupted state

Added by byroot (Jean Boussier) 7 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:92412]

Description

The following script showcase the issue:

#!/usr/bin/env ruby --disable-gems
a = ('a' * 24).encode(Encoding::ASCII).gsub('x', '')
b = ('b' * 24).encode(Encoding::ASCII).gsub('x', '')

hash = {}
hash[a] = true
hash[b] = true

puts "Bebore garbage_collection: a=#{a.inspect} b=#{b.inspect}"
4.times { GC.start }
puts "After garbage_collection: a=#{a.inspect} b=#{b.inspect}"

Expected output:

Bebore garbage_collection: a="aaaaaaaaaaaaaaaaaaaaaaaa" b="bbbbbbbbbbbbbbbbbbbbbbbb"
After garbage_collection: a="aaaaaaaaaaaaaaaaaaaaaaaa" b="bbbbbbbbbbbbbbbbbbbbbbbb"

Actual output:

Ruby: 2.6.2
Bebore garbage_collection: a="aaaaaaaaaaaaaaaaaaaaaaaa" b="bbbbbbbbbbbbbbbbbbbbbbbb"
After garbage_collection: a="}\x0Eu\xDB\xFC\a\x00\x80\xE9\ru\xDB\xFC\a\x00\x10\x04\x00aaaaaa" b="\x00\x00\x00\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x00\xC0\x02\x00bbbbbb"

We reduced the repro script as much as we could, both the .encode(ASCII) and the gsub are necessary for the bug to manifest itself.

We also used ObjectSpace.dump() to analyze the corrupted string.

b = "shared":true, "encoding":"US-ASCII", "references":["0x7faf4a01aeb8"]
0x7faf4a01aeb8 = "frozen":true, "fstring":true, "bytesize":24, "value":"bbbbbbbbbbbbbbbbbbbbbbbb", "encoding":"US-ASCII"

Big thanks to Édouard Chin who did most of the initial repro reduction.

Associated revisions

Revision 3f956201
Added by nobu (Nobuyoshi Nakada) 7 months ago

Get rid of indirect sharing

  • string.c (str_duplicate): share the root shared string if the original string is already sharing, so that all shared strings refer the root shared string directly. indirect sharing can cause a dangling pointer.

[Bug #15792]

Revision c06ddfee
Added by alanwu (Alan Wu) 6 months ago

str_duplicate: Don't share with a frozen shared string

This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to str_duplicate.

Such string looks like:

 --------                    -----------------
 | root | ------ owns -----> | root's buffer |
 --------                    -----------------
     ^                             ^   ^
 -----------                       |   |
 | shared1 | ------ references -----   |
 -----------                           |
     ^                                 |
 -----------                           |
 | shared2 | ------ references ---------
 -----------

This is bad news because rb_fstring(shared2) can make shared1
independent, which severs the reference from shared1 to root:

/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}

If shared1 was the only reference to root, then root can be
reclaimed by the GC, leaving shared2 in a corrupted state:

 -----------                         --------------------
 | shared1 | -------- owns --------> | shared1's buffer |
 -----------                         --------------------
      ^
      |
 -----------                         -------------------------
 | shared2 | ------ references ----> | root's buffer (freed) |
 -----------                         -------------------------

Here is a reproduction script for the situation this commit fixes.

a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
  • string.c (str_duplicate): always share with the root string when the original is a shared string.
  • test_rb_str_dup.rb: specifically test rb_str_dup to make sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159

Revision f5930c87
Added by nagachika (Tomoyuki Chikanaga) 3 months ago

merge revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e57dfe03ce3913009d69a33d6f6100be: [Backport #15792]

    Get rid of indirect sharing

    * string.c (str_duplicate): share the root shared string if the
      original string is already sharing, so that all shared strings
      refer the root shared string directly.  indirect sharing can
      cause a dangling pointer.

    [Bug #15792]

    str_duplicate: Don't share with a frozen shared string

    This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a.
    Before this commit, it was possible to create a shared string which
    shares with another shared string by passing a frozen shared string
    to `str_duplicate`.

    Such string looks like:

    ```
     --------                    -----------------
     | root | ------ owns -----> | root's buffer |
     --------                    -----------------
         ^                             ^   ^
     -----------                       |   |
     | shared1 | ------ references -----   |
     -----------                           |
         ^                                 |
     -----------                           |
     | shared2 | ------ references ---------
     -----------
    ```

    This is bad news because `rb_fstring(shared2)` can make `shared1`
    independent, which severs the reference from `shared1` to `root`:

    ```c
    /* from fstr_update_callback() */
    str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
    if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
        str_make_independent(str);  /* no frozen check */
    }
    ```

    If `shared1` was the only reference to `root`, then `root` can be
    reclaimed by the GC, leaving `shared2` in a corrupted state:

    ```
     -----------                         --------------------
     | shared1 | -------- owns --------> | shared1's buffer |
     -----------                         --------------------
          ^
          |
     -----------                         -------------------------
     | shared2 | ------ references ----> | root's buffer (freed) |
     -----------                         -------------------------
    ```

    Here is a reproduction script for the situation this commit fixes.

    ```ruby
    a = ('a' * 24).strip.freeze.strip
    -a
    p a
    4.times { GC.start }
    p a
    ```

     - string.c (str_duplicate): always share with the root string when
       the original is a shared string.
     - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
       sure it does not try to share with a shared string.

    [Bug #15792]

    Closes: https://github.com/ruby/ruby/pull/2159

    Update dependencies

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_6@67731 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 67731
Added by nagachika (Tomoyuki Chikanaga) 3 months ago

merge revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e57dfe03ce3913009d69a33d6f6100be: [Backport #15792]

Get rid of indirect sharing

* string.c (str_duplicate): share the root shared string if the
  original string is already sharing, so that all shared strings
  refer the root shared string directly.  indirect sharing can
  cause a dangling pointer.

[Bug #15792]

str_duplicate: Don't share with a frozen shared string

This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.

Such string looks like:

```
 --------                    -----------------
 | root | ------ owns -----> | root's buffer |
 --------                    -----------------
     ^                             ^   ^
 -----------                       |   |
 | shared1 | ------ references -----   |
 -----------                           |
     ^                                 |
 -----------                           |
 | shared2 | ------ references ---------
 -----------
```

This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:

```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}
```

If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:

```
 -----------                         --------------------
 | shared1 | -------- owns --------> | shared1's buffer |
 -----------                         --------------------
      ^
      |
 -----------                         -------------------------
 | shared2 | ------ references ----> | root's buffer (freed) |
 -----------                         -------------------------
```

Here is a reproduction script for the situation this commit fixes.

```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```

 - string.c (str_duplicate): always share with the root string when
   the original is a shared string.
 - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
   sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159

Update dependencies

Revision 689a6a0a
Added by usa (Usaku NAKAMURA) 3 months ago

merge revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e5: [Backport #15792]

    Get rid of indirect sharing

    * string.c (str_duplicate): share the root shared string if the
      original string is already sharing, so that all shared strings
      refer the root shared string directly.  indirect sharing can
      cause a dangling pointer.

    [Bug #15792]

    str_duplicate: Don't share with a frozen shared string

    This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a.
    Before this commit, it was possible to create a shared string which
    shares with another shared string by passing a frozen shared string
    to `str_duplicate`.

    Such string looks like:

    ```
     --------                    -----------------
     | root | ------ owns -----> | root's buffer |
     --------                    -----------------
         ^                             ^   ^
     -----------                       |   |
     | shared1 | ------ references -----   |
     -----------                           |
         ^                                 |
     -----------                           |
     | shared2 | ------ references ---------
     -----------
    ```

    This is bad news because `rb_fstring(shared2)` can make `shared1`
    independent, which severs the reference from `shared1` to `root`:

    ```c
    /* from fstr_update_callback() */
    str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
    if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
        str_make_independent(str);  /* no frozen check */
    }
    ```

    If `shared1` was the only reference to `root`, then `root` can be
    reclaimed by the GC, leaving `shared2` in a corrupted state:

    ```
     -----------                         --------------------
     | shared1 | -------- owns --------> | shared1's buffer |
     -----------                         --------------------
          ^
          |
     -----------                         -------------------------
     | shared2 | ------ references ----> | root's buffer (freed) |
     -----------                         -------------------------
    ```

    Here is a reproduction script for the situation this commit fixes.

    ```ruby
    a = ('a' * 24).strip.freeze.strip
    -a
    p a
    4.times { GC.start }
    p a
    ```

     - string.c (str_duplicate): always share with the root string when
       the original is a shared string.
     - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
       sure it does not try to share with a shared string.

    [Bug #15792]

    Closes: https://github.com/ruby/ruby/pull/2159

    Update dependencies

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@67766 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 67766
Added by usa (Usaku NAKAMURA) 3 months ago

merge revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e5: [Backport #15792]

Get rid of indirect sharing

* string.c (str_duplicate): share the root shared string if the
  original string is already sharing, so that all shared strings
  refer the root shared string directly.  indirect sharing can
  cause a dangling pointer.

[Bug #15792]

str_duplicate: Don't share with a frozen shared string

This is a follow up for 3f9562015e651735bfc2fdd14e8f6963b673e22a.
Before this commit, it was possible to create a shared string which
shares with another shared string by passing a frozen shared string
to `str_duplicate`.

Such string looks like:

```
 --------                    -----------------
 | root | ------ owns -----> | root's buffer |
 --------                    -----------------
     ^                             ^   ^
 -----------                       |   |
 | shared1 | ------ references -----   |
 -----------                           |
     ^                                 |
 -----------                           |
 | shared2 | ------ references ---------
 -----------
```

This is bad news because `rb_fstring(shared2)` can make `shared1`
independent, which severs the reference from `shared1` to `root`:

```c
/* from fstr_update_callback() */
str = str_new_frozen(rb_cString, shared2);  /* can return shared1 */
if (STR_SHARED_P(str)) { /* shared1 is also a shared string */
    str_make_independent(str);  /* no frozen check */
}
```

If `shared1` was the only reference to `root`, then `root` can be
reclaimed by the GC, leaving `shared2` in a corrupted state:

```
 -----------                         --------------------
 | shared1 | -------- owns --------> | shared1's buffer |
 -----------                         --------------------
      ^
      |
 -----------                         -------------------------
 | shared2 | ------ references ----> | root's buffer (freed) |
 -----------                         -------------------------
```

Here is a reproduction script for the situation this commit fixes.

```ruby
a = ('a' * 24).strip.freeze.strip
-a
p a
4.times { GC.start }
p a
```

 - string.c (str_duplicate): always share with the root string when
   the original is a shared string.
 - test_rb_str_dup.rb: specifically test `rb_str_dup` to make
   sure it does not try to share with a shared string.

[Bug #15792]

Closes: https://github.com/ruby/ruby/pull/2159

Update dependencies

History

Updated by byroot (Jean Boussier) 7 months ago

Actually, even simpler repro script:

#!/usr/bin/env ruby --disable-gems
a = ('a' * 24).encode(Encoding::ASCII).strip

hash = {}
hash[a] = true

puts "Before garbage_collection: a=#{a.inspect}"
4.times { GC.start }
puts "After garbage_collection: a=#{a.inspect}"

Updated by byroot (Jean Boussier) 7 months ago

Édouad Chin bisected the script against the MRI repository, and found https://bugs.ruby-lang.org/issues/15251 the first commit to trigger the problem.

That being said it's unlikely to really be the source of the bug. It's much more likely to be in the fstring function.

Updated by wanabe (_ wanabe) 7 months ago

I guess it is caused by the combination of str_duplicate and rb_fstring from r52074.

$ cat bug15792.rb 
a = ('a' * 24).b.strip

->{ eval "", binding, a, 1 }.call

puts "Before garbage_collection: a=#{a.inspect}"
4.times { GC.start }
puts "After garbage_collection: a=#{a.inspect}"

$ ./miniruby -v bug15792.rb 
ruby 2.3.0dev (2015-10-07 trunk 52074) [x86_64-linux]
Before garbage_collection: a="aaaaaaaaaaaaaaaaaaaaaaaa"
After garbage_collection: a="\x00B\xFB\xCCmU\x00\x00\x10\x10\xE7\xCCmU\x00\x00aaaaaaaa"
#5

Updated by nagachika (Tomoyuki Chikanaga) 7 months ago

  • Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN to 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: REQUIRED
#6

Updated by nobu (Nobuyoshi Nakada) 5 months ago

  • Backport changed from 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: REQUIRED to 2.4: REQUIRED, 2.5: REQUIRED, 2.6: REQUIRED

Updated by nagachika (Tomoyuki Chikanaga) 3 months ago

  • Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: REQUIRED to 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE

ruby_2_6 r67731 merged revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e57dfe03ce3913009d69a33d6f6100be.

Updated by usa (Usaku NAKAMURA) 3 months ago

  • Backport changed from 2.4: REQUIRED, 2.5: REQUIRED, 2.6: DONE to 2.4: REQUIRED, 2.5: DONE, 2.6: DONE

ruby_2_5 r67766 merged revision(s) 3f9562015e651735bfc2fdd14e8f6963b673e22a,c06ddfee878524168e4af07443217ed2f8d0954b,3b3b4a44e5.

Also available in: Atom PDF