Project

General

Profile

Actions

Bug #20998

closed

rb_str_locktmp() changes flags of frozen strings and string literals

Bug #20998: rb_str_locktmp() changes flags of frozen strings and string literals

Added by Eregon (Benoit Daloze) over 1 year ago. Updated 10 months ago.

Status:
Closed
Target version:
ruby -v:
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
[ruby-core:120465]

Description

# frozen_string_literal: true
# BOILERPLATE START

require 'tmpdir'
require 'rbconfig'

def inline_c_extension(c_code)
  Dir.mktmpdir('inline_c_extension') do |dir|
    File.write("#{dir}/cext.c", c_code)
    File.write("#{dir}/extconf.rb", <<~RUBY)
    require 'mkmf'
    create_makefile('cext')
    RUBY

    out = IO.popen([RbConfig.ruby, 'extconf.rb'], chdir: dir, &:read)
    raise "ruby extconf.rb failed: #{$?.inspect}\n#{out}" unless $?.success?

    out = IO.popen(['make'], chdir: dir, &:read)
    raise "make failed: #{$?.inspect}\n#{out}" unless $?.success?

    require "#{dir}/cext.#{RbConfig::CONFIG['DLEXT']}"
  end
end

inline_c_extension <<~C
#include "ruby.h"

static VALUE foo(VALUE self, VALUE str) {
  rb_str_locktmp(str);
  return str;
}

void Init_cext(void) {
  VALUE c = rb_define_class("Foo", rb_cObject);
  rb_define_singleton_method(c, "foo", foo, 1);
}
C

# BOILERPLATE END

a = "str"
Foo.foo(a)

# imagine a million lines of code in between

b = "str"

b << "."
# can't modify string; temporarily locked (RuntimeError)
# What? Who "locked" that immutable frozen string literal?
# It should be: can't modify frozen String: "str" (FrozenError)

Same problem with:

Foo.foo("abc")

# imagine a million lines of code in between

Foo.foo("abc") # temporal locking already locked string (RuntimeError)

Related: https://github.com/oracle/truffleruby/issues/3752

It seems a clear bug to mutate a frozen string (with visible side effects), even more so for shared frozen string literals.

I think rb_str_locktmp() should raise (a FrozenError, as it's effectively attempting to mutate it, same as calling rb_str_modify()) if called on a frozen string, because it makes little sense, I think "locking a string" only makes sense for mutable strings.

The alternative would be to noop on frozen strings for rb_str_locktmp() and rb_str_unlocktmp(), I think that's susprising, and potentially hiding more bugs, e.g. if one by mistake tries to mutate the RSTRING_PTR() or so, and also rb_str_locktmp() wouldn't imply the flag is set after it returns.
Any attempt to mutate a frozen string should fail, so might as well fail early.

Updated by Eregon (Benoit Daloze) over 1 year ago Actions #1

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 1 year ago Actions #2

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 1 year ago Actions #3

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 1 year ago Actions #4

  • Description updated (diff)

Updated by nobu (Nobuyoshi Nakada) about 1 year ago Actions #5 [ruby-core:120493]

There isn’t a case that locking a string just to avoid being modified or moved, but not to mutate it?

Updated by Eregon (Benoit Daloze) about 1 year ago Actions #6 [ruby-core:120495]

I'm not aware of any such case, and .freeze seems much better for that use case.

Updated by byroot (Jean Boussier) about 1 year ago Actions #7 [ruby-core:120497]

Actually yes, I think it's not an uncommon use case?

e.g. if you need to write a given string into a socket or something like that, you may want to lock it to ensure a concurrent thread won't modify it.

I think zlic.c uses it this way?

Updated by Eregon (Benoit Daloze) about 1 year ago · Edited Actions #8 [ruby-core:120499]

@byroot (Jean Boussier) but that's mutating the string, @nobu (Nobuyoshi Nakada) is asking for a case not mutating the string, which I think there is none.

EDIT: I read too fast, "write a given string into a socket" does not imply mutating the string.

Updated by Eregon (Benoit Daloze) about 1 year ago Actions #9 [ruby-core:120500]

https://github.com/ruby/ruby/blob/975461bf885b8fb791cf048ad39c79924d28e4e9/ext/zlib/zlib.c#L1056 but I'm not quite sure how that String is used.
It seems clearly wrong that two threads using Zlib with a frozen String literal (which happens to have the same contents/characters for both threads) would fail with temporal locking already locked string (RuntimeError) though.
So what should we do then when rb_str_locktmp() is called on a frozen string?

Updated by byroot (Jean Boussier) about 1 year ago Actions #10 [ruby-core:120520]

It seems clearly wrong that two threads using Zlib with a frozen String literal (which happens to have the same contents/characters for both threads) would fail with temporal locking already locked string (RuntimeError) though.

That I totally agree.

Updated by Eregon (Benoit Daloze) about 1 year ago Actions #11 [ruby-core:120583]

@nobu (Nobuyoshi Nakada) @byroot (Jean Boussier) So do you think rb_str_locktmp() should noop on frozen strings (since they can't change anyway), or it should raise because the usage seems somewhat dubious?
I'm fine with either, I'd just like a decision so we can do the same in TruffleRuby.
(the current status is broken so that's worse than either fix)

Updated by byroot (Jean Boussier) about 1 year ago Actions #12 [ruby-core:120585]

I prefer the noop solution, as it makes it a much easier to use API. But no strong opinion.

Updated by mame (Yusuke Endoh) about 1 year ago Actions #13 [ruby-core:120981]

Discussed at the dev meeting. @akr (Akira Tanaka) preferred raising an Exception. The current purpose of rb_str_locktmp is to prevent the RSTRING_PTR from being moved (by realloc, compaction, etc.) when passing the pointer to a C function that blocks and writes to the buffer (typically, read(2) syscall). In other words, rb_str_locktmp should be performed on a String that is about to be written. We couldn't find a use case for rb_str_locktmp on a frozen String. Unless there is a valid use case for that, it would be good to raise an Exception, instead of no-op.

Updated by Eregon (Benoit Daloze) 11 months ago Actions #14 [ruby-core:122142]

mame (Yusuke Endoh) wrote in #note-13:

Unless there is a valid use case for that, it would be good to raise an Exception, instead of no-op.

FYI TruffleRuby implemented FrozenError when rb_str_locktmp() is used on a frozen String in https://github.com/oracle/truffleruby/pull/3867.

It would be great if CRuby can do the same for 3.5.

Updated by Eregon (Benoit Daloze) 10 months ago Actions #15 [ruby-core:122532]

  • Assignee set to Eregon (Benoit Daloze)

Updated by ioquatix (Samuel Williams) 10 months ago Actions #16 [ruby-core:122535]

My usage of rb_str_locktmp isn't about mutation, it's about making sure the pointer doesn't change. Could compacting GC cause embedded frozen strings to be moved?

Updated by Eregon (Benoit Daloze) 10 months ago Actions #17

  • Status changed from Open to Closed

Applied in changeset git|83fb07fb2c97b9922450979fa4a56f43324317a9.


[Bug #20998] Check if the string is frozen in rb_str_locktmp() & rb_str_unlocktmp()

Updated by Eregon (Benoit Daloze) 10 months ago Actions #18 [ruby-core:122541]

ioquatix (Samuel Williams) wrote in #note-16:

My usage of rb_str_locktmp isn't about mutation, it's about making sure the pointer doesn't change. Could compacting GC cause embedded frozen strings to be moved?

rb_str_locktmp() doesn't prevent GC compaction to move the String, more details about that on the PR.

Updated by Eregon (Benoit Daloze) 10 months ago Actions #19

  • Target version set to 4.0

Updated by hanazuki (Kasumi Hanazuki) 10 months ago Actions #20 [ruby-core:122542]

FYI: IO::Buffer is now broken (by a recent change) because the embedded Strings are not pinned, and I'm proposing a fix at #21210

Updated by Eregon (Benoit Daloze) 10 months ago · Edited Actions #21 [ruby-core:122544]

@hanazuki (Kasumi Hanazuki) Thanks for the link, I wish I found that earlier. It confirms that it was already broken which is the same conclusion I came to.

hanazuki (Kasumi Hanazuki) wrote in #note-20:

FYI: IO::Buffer is now broken (by a recent change)

Which change do you mean? git|6012145299cfa4ab561360c78710c7f2941a7e9d ?

EDIT: reply from hanazuki somehow not showing here: Yes. the commit "Mark rb_io_buffer_type references declaratively"

Updated by Anonymous 10 months ago Actions #22


Actions

Also available in: PDF Atom