Project

General

Profile

Actions

Bug #13737

closed

"can't modify frozen String" when installing bundled gems

Added by usa (Usaku NAKAMURA) almost 7 years ago. Updated almost 7 years ago.

Status:
Closed
Target version:
[ruby-core:81998]

Description

After committed r59304, mswin CI often (but not always) reports errors when installing bundled gems.
For example, http://mswinci.japaneast.cloudapp.azure.com/vc12-x64/ruby-trunk/log/20170710T020413Z.fail.html.gz

I've not investigated it yet, but something wrong at the commit, I believe.

Updated by ko1 (Koichi Sasada) almost 7 years ago

Not sure why but r59310 will fix it (at least on my environment). To complete it i think we need to dig into rubygems deeply....

Updated by normalperson (Eric Wong) almost 7 years ago

wrote:

Not sure why but r59310 will fix it (at least on my
environment). To complete it i think we need to dig into
rubygems deeply....

Thank you. I think there may also be an old bug at early
initializtion when rb_cString == 0 causing unfrozen string keys
form hash_aset instead of hash_aset_str

Fwiw, following change should not segfault (but it does):

--- a/hash.c
+++ b/hash.c
@@ -1519,7 +1519,7 @@ hash_aset(st_data_t *key, st_data_t *val, struct update_arg *arg, int existing)
 static int
 hash_aset_str(st_data_t *key, st_data_t *val, struct update_arg *arg, int existing)
 {
-    if (!existing && !RB_OBJ_FROZEN(*key)) {
+    if (!existing && !RB_OBJ_FROZEN_RAW(*key)) {
 	st_data_t fstr;
 	st_table *tbl = rb_vm_fstring_table();
 

Anyways, I will investigate more deeply tomorrow. I was about to
go to bed before noticing this ticket.

Updated by duerst (Martin Dürst) almost 7 years ago

ko1 (Koichi Sasada) wrote:

Not sure why but r59310 will fix it (at least on my environment). To complete it i think we need to dig into rubygems deeply....

I got virtually the same error message on cygwin (with context just in case it helps):

 xmlrpc-0.2.1.gem
Traceback (most recent call last):
        7: from ./tool/rbinstall.rb:844:in `<main>'
        6: from ./tool/rbinstall.rb:844:in `each'
        5: from ./tool/rbinstall.rb:847:in `block in <main>'
        4: from ./tool/rbinstall.rb:805:in `block in <main>'
        3: from ./tool/rbinstall.rb:805:in `each'
        2: from ./tool/rbinstall.rb:807:in `block (2 levels) in <main>'
        1: from /cygdrive/c/Data/ruby/lib/rubygems/installer.rb:261:in `spec'
/cygdrive/c/Data/ruby/lib/rubygems/installer.rb:264:in `rescue in spec': invalid gem: package is corrupt, exception while verifying: can't modify frozen String (RuntimeError) in /cygdrive/c/Data/ruby/gems/xmlrpc-0.3.0.gem (Gem::InstallError)
make: *** [uncommon.mk:310: do-install-nodoc] Error 1

Same as for Koichi, the exception is gone now at ruby 2.5.0dev (2017-07-11 trunk 59311) [x86_64-cygwin].

Updated by normalperson (Eric Wong) almost 7 years ago

wrote:

Not sure why but r59310 will fix it (at least on my
environment). To complete it i think we need to dig into
rubygems deeply....

It seems Psych is correct to be tainting keys with YAML.load.
Perhaps there needs to be a separate table for frozen tainted
strings?

Updated by normalperson (Eric Wong) almost 7 years ago

Eric Wong wrote:

wrote:

Not sure why but r59310 will fix it (at least on my
environment). To complete it i think we need to dig into
rubygems deeply....

It seems Psych is correct to be tainting keys with YAML.load.
Perhaps there needs to be a separate table for frozen tainted
strings?

Maybe the following works (barely tested, but I have to go...)

---
 hash.c                         | 23 +----------
 internal.h                     |  3 +-
 string.c                       | 94 ++++++++++++++++++++++++++++++++++++++++++
 test/ruby/test_optimization.rb | 20 +++++++++
 vm.c                           | 12 ++++++
 vm_core.h                      |  1 +
 6 files changed, 130 insertions(+), 23 deletions(-)

diff --git a/hash.c b/hash.c
index 7a0beb7225..e17d906e9b 100644
--- a/hash.c
+++ b/hash.c
@@ -18,7 +18,6 @@
 #include "probes.h"
 #include "id.h"
 #include "symbol.h"
-#include "gc.h"
 
 #ifdef __APPLE__
 # ifdef HAVE_CRT_EXTERNS_H
@@ -1516,33 +1515,13 @@ hash_aset(st_data_t *key, st_data_t *val, struct update_arg *arg, int existing)
     return ST_CONTINUE;
 }
 
-static VALUE
-fstring_existing_str(VALUE str)
-{
-    st_data_t fstr;
-    st_table *tbl = rb_vm_fstring_table();
-
-    if (st_lookup(tbl, str, &fstr)) {
-	if (rb_objspace_garbage_object_p(fstr)) {
-	    return rb_fstring(str);
-	}
-	else {
-	    return (VALUE)fstr;
-	}
-    }
-    else {
-	return Qnil;
-    }
-}
-
 static int
 hash_aset_str(st_data_t *key, st_data_t *val, struct update_arg *arg, int existing)
 {
     if (!existing && !RB_OBJ_FROZEN(*key)) {
 	VALUE k;
 
-	if (!RB_OBJ_TAINTED(*key) &&
-	    (k = fstring_existing_str(*key)) != Qnil) {
+	if ((k = rb_fstring_existing(*key)) != Qnil) {
 	    *key = k;
 	}
 	else {
diff --git a/internal.h b/internal.h
index 81f4d2b91d..e6d50ebcda 100644
--- a/internal.h
+++ b/internal.h
@@ -1574,6 +1574,7 @@ VALUE rb_strftime(const char *format, size_t format_len, rb_encoding *enc,
 /* string.c */
 VALUE rb_fstring(VALUE);
 VALUE rb_fstring_new(const char *ptr, long len);
+VALUE rb_fstring_existing(VALUE);
 #define rb_fstring_lit(str) rb_fstring_new((str), rb_strlen_lit(str))
 #define rb_fstring_literal(str) rb_fstring_lit(str)
 VALUE rb_fstring_cstr(const char *str);
@@ -1726,7 +1727,7 @@ void rb_vm_check_redefinition_by_prepend(VALUE klass);
 VALUE rb_yield_refine_block(VALUE refinement, VALUE refinements);
 VALUE ruby_vm_special_exception_copy(VALUE);
 PUREFUNC(st_table *rb_vm_fstring_table(void));
-
+PUREFUNC(st_table *rb_vm_tfstring_table(void));
 
 /* vm_dump.c */
 void rb_print_backtrace(void);
diff --git a/string.c b/string.c
index 2012a281d6..751e0c0a4a 100644
--- a/string.c
+++ b/string.c
@@ -349,6 +349,99 @@ register_fstring(VALUE str)
     return ret;
 }
 
+static int
+tainted_fstr_update(st_data_t *key, st_data_t *val, st_data_t arg, int existing)
+{
+    VALUE *fstr = (VALUE *)arg;
+    VALUE str = (VALUE)*key;
+
+    if (existing) {
+	/* because of lazy sweep, str may be unmarked already and swept
+	 * at next time */
+	if (rb_objspace_garbage_object_p(str)) {
+	    *fstr = Qundef;
+	    return ST_DELETE;
+	}
+
+	*fstr = str;
+	return ST_STOP;
+    }
+    else {
+	str = rb_str_dup(str);
+	RB_OBJ_TAINT_RAW(str);
+	RB_FL_SET_RAW(str, RSTRING_FSTR);
+	RB_OBJ_FREEZE_RAW(str);
+
+	*key = *val = *fstr = str;
+	return ST_CONTINUE;
+    }
+}
+
+static VALUE
+rb_fstring_existing0(VALUE str)
+{
+    st_table *frozen_strings = rb_vm_fstring_table();
+    st_data_t fstr;
+
+    if (st_lookup(frozen_strings, str, &fstr)) {
+	if (rb_objspace_garbage_object_p(fstr)) {
+	    return register_fstring(str);
+	}
+	else {
+	    return (VALUE)fstr;
+	}
+    }
+    return Qnil;
+}
+
+static VALUE
+rb_tainted_fstring_existing(VALUE str)
+{
+    VALUE ret;
+    st_data_t fstr;
+    st_table *tfstrings = rb_vm_tfstring_table();
+
+    if (st_lookup(tfstrings, str, &fstr)) {
+	ret = (VALUE)fstr;
+	if (!rb_objspace_garbage_object_p(ret)) {
+	    return ret;
+	}
+    }
+    ret = rb_fstring_existing0(str);
+    if (NIL_P(ret)) {
+	return Qnil;
+    }
+    if (!RB_FL_TEST_RAW(ret, RSTRING_FSTR)) {
+	return Qnil;
+    }
+    do {
+	fstr = (st_data_t)ret;
+	st_update(tfstrings, fstr, tainted_fstr_update, (st_data_t)&fstr);
+    } while ((VALUE)fstr == Qundef);
+
+    ret = (VALUE)fstr;
+    assert(OBJ_FROZEN_RAW(ret));
+    assert(!FL_TEST_RAW(ret, STR_FAKESTR));
+    assert(!FL_TEST_RAW(ret, FL_EXIVAR));
+    assert(FL_TEST_RAW(ret, RSTRING_FSTR));
+    assert(FL_TEST_RAW(ret, FL_TAINT));
+    assert(RBASIC_CLASS(ret) == rb_cString);
+
+    return ret;
+}
+
+VALUE
+rb_fstring_existing(VALUE str)
+{
+    if (FL_TEST_RAW(str, RSTRING_FSTR))
+	return str;
+
+    if (!RB_OBJ_TAINTED_RAW(str))
+	return rb_fstring_existing0(str);
+
+    return rb_tainted_fstring_existing(str);
+}
+
 static VALUE
 setup_fake_str(struct RString *fake_str, const char *name, long len, int encidx)
 {
@@ -1311,6 +1404,7 @@ rb_str_free(VALUE str)
     if (FL_TEST(str, RSTRING_FSTR)) {
 	st_data_t fstr = (st_data_t)str;
 	st_delete(rb_vm_fstring_table(), &fstr, NULL);
+	st_delete(rb_vm_tfstring_table(), &fstr, NULL);
 	RB_DEBUG_COUNTER_INC(obj_str_fstr);
     }
 
diff --git a/test/ruby/test_optimization.rb b/test/ruby/test_optimization.rb
index 2d0eec02fd..3559820f9f 100644
--- a/test/ruby/test_optimization.rb
+++ b/test/ruby/test_optimization.rb
@@ -181,6 +181,26 @@ def test_hash_empty?
     assert_redefine_method('Hash', 'empty?', 'assert_nil({}.empty?); assert_nil({1=>1}.empty?)')
   end
 
+  def test_hash_reuse_fstring
+    key = %w(h e l l o).join
+    h = {}
+    h[key] = true
+    exp = "hello".freeze
+    assert_not_same key, h.keys[0]
+    assert h.keys[0].frozen?
+    assert_same exp, h.keys[0]
+
+    h1 = {}
+    h2 = {}
+    key.taint
+    h1[key] = true
+    h2[key] = true
+    k1 = h1.keys[0]
+    k2 = h2.keys[0]
+    assert_same k1, k2
+    assert_predicate k1, :tainted?
+  end
+
   def test_hash_aref_with
     h = { "foo" => 1 }
     assert_equal 1, h["foo"]
diff --git a/vm.c b/vm.c
index 814f8b6780..75ca12f2a5 100644
--- a/vm.c
+++ b/vm.c
@@ -2206,6 +2206,10 @@ ruby_vm_destruct(rb_vm_t *vm)
 	    st_free_table(vm->frozen_strings);
 	    vm->frozen_strings = 0;
 	}
+	if (vm->tainted_frozen_strings) {
+	    st_free_table(vm->tainted_frozen_strings);
+	    vm->tainted_frozen_strings = 0;
+	}
 	rb_vm_gvl_destroy(vm);
 	if (objspace) {
 	    rb_objspace_free(objspace);
@@ -3142,6 +3146,8 @@ Init_vm_objects(void)
     vm->mark_object_ary = rb_ary_tmp_new(128);
     vm->loading_table = st_init_strtable();
     vm->frozen_strings = st_init_table_with_size(&rb_fstring_hash_type, 1000);
+    vm->tainted_frozen_strings =
+		st_init_table_with_size(&rb_fstring_hash_type, 1000);
 }
 
 /* top self */
@@ -3203,6 +3209,12 @@ rb_vm_fstring_table(void)
     return GET_VM()->frozen_strings;
 }
 
+st_table *
+rb_vm_tfstring_table(void)
+{
+    return GET_VM()->tainted_frozen_strings;
+}
+
 #if VM_COLLECT_USAGE_DETAILS
 
 #define HASH_ASET(h, k, v) rb_hash_aset((h), (st_data_t)(k), (st_data_t)(v))
diff --git a/vm_core.h b/vm_core.h
index 679512390d..a808a1b1f7 100644
--- a/vm_core.h
+++ b/vm_core.h
@@ -574,6 +574,7 @@ typedef struct rb_vm_struct {
 
     VALUE *defined_strings;
     st_table *frozen_strings;
+    st_table *tainted_frozen_strings;
 
     /* params */
     struct { /* size in byte */
--

EW

Updated by nobu (Nobuyoshi Nakada) almost 7 years ago

Or should fstring_cmp consider also tainted flags?

Updated by normalperson (Eric Wong) almost 7 years ago

wrote:

Or should fstring_cmp consider also tainted flags?

I considered doing that, but it may not be optimal since I want
to share heap allocations with the non-tainted version via
rb_str_dup.

Updated by normalperson (Eric Wong) almost 7 years ago

Eric Wong wrote:

wrote:

Or should fstring_cmp consider also tainted flags?

I considered doing that, but it may not be optimal since I want
to share heap allocations with the non-tainted version via
rb_str_dup.

Slightly better version with commit message + stronger test for
sharing, and also using rb_str_resurrect instead of rb_str_dup:

https://80x24.org/spew/20170712223226.20939-1-e@80x24.org/raw

I will commit in a few days if no response.

Also, rb_hash_bulk_insert should do the same and avoid blind
calls to rb_str_new_frozen. This will be a separate, trivial
(I hope) change to try rb_fstring_existing before
rb_str_new_frozen.

Updated by ko1 (Koichi Sasada) almost 7 years ago

Sorry for late response
(because of my health problem and so on...)

I think we don't need to prepare frozen string table for tainted string because most of people don't use tainted strings. We need to measure the counts of that before commit it. If there are many of tainted strings, I need to change the mind...

If we have a few tainted objects, such complexity is not worth for us.

Updated by normalperson (Eric Wong) almost 7 years ago

wrote:

Sorry for late response
(because of my health problem and so on...)

No problem, I hope you feel better, soon. Thank you for any
response you give :)

I think we don't need to prepare frozen string table for tainted string because most of people don't use tainted strings. We need to measure the counts of that before commit it. If there are many of tainted strings, I need to change the mind...

Here is my measurement patch:

https://80x24.org/spew/20170713025614.GB18169@starla/raw
patch requires gcc for attribute((destructor))

It is frequent to have tainted strings when parsing YAML, HTTP
requests/responses from pure Ruby (webrick or net/http), and
email headers.

With "make install" and the measurement patch below, I get

tainted hit: 102 new: 60 (total: 1280)

for the final line

Testing HTTP with webrick via rackup and 1000 requests:

server command: rackup -s webrick -o 127.0.0.1 config.ru
client command: ab -c 1 -v 1 -n 1000 -k http://127.0.0.1:9292/

==> config.ru <==
run(lambda do |env|
env.each_key do |k|
warn "#{k.inspect} (#{k.tainted?})\n"
end
[ 200, {'Content-Length' => -'0'}, [] ]
end)

After server exit, I get:

tainted hit: 3996 new: 4 (total: 8001)

This is because these existing literals for HTTP headers sent
by ab also appear in the Rack source code:

"HTTP_HOST", "HTTP_USER_AGENT", "HTTP_ACCEPT"

(I'm not sure where "HTTP_CONNECTION" appears in the source,
actually, but that's also tainted from the Connection: header)

If we have a few tainted objects, such complexity is not worth for us.

Understood.

Updated by ko1 (Koichi Sasada) almost 7 years ago

Thank you for survey.

normalperson (Eric Wong) wrote:

I think we don't need to prepare frozen string table for tainted string because most of people don't use tainted strings. We need to measure the counts of that before commit it. If there are many of tainted strings, I need to change the mind...

Here is my measurement patch:

https://80x24.org/spew/20170713025614.GB18169@starla/raw
patch requires gcc for attribute((destructor))

It is frequent to have tainted strings when parsing YAML, HTTP
requests/responses from pure Ruby (webrick or net/http), and
email headers.

  • They taint string explicitly?

With "make install" and the measurement patch below, I get

tainted hit: 102 new: 60 (total: 1280)

for the final line

Testing HTTP with webrick via rackup and 1000 requests:

server command: rackup -s webrick -o 127.0.0.1 config.ru
client command: ab -c 1 -v 1 -n 1000 -k http://127.0.0.1:9292/

==> config.ru <==
run(lambda do |env|
env.each_key do |k|
warn "#{k.inspect} (#{k.tainted?})\n"
end
[ 200, {'Content-Length' => -'0'}, [] ]
end)

After server exit, I get:

tainted hit: 3996 new: 4 (total: 8001)

This is because these existing literals for HTTP headers sent
by ab also appear in the Rack source code:

"HTTP_HOST", "HTTP_USER_AGENT", "HTTP_ACCEPT"

(I'm not sure where "HTTP_CONNECTION" appears in the source,
actually, but that's also tainted from the Connection: header)

I'm not sure we should employ tainted fstring table, or fix these programs. Do you have opinion?

Updated by normalperson (Eric Wong) almost 7 years ago

wrote:

Thank you for survey.

No problem :)

normalperson (Eric Wong) wrote:

I think we don't need to prepare frozen string table for tainted string because most of people don't use tainted strings. We need to measure the counts of that before commit it. If there are many of tainted strings, I need to change the mind...

Here is my measurement patch:

https://80x24.org/spew/20170713025614.GB18169@starla/raw
patch requires gcc for attribute((destructor))

It is frequent to have tainted strings when parsing YAML, HTTP
requests/responses from pure Ruby (webrick or net/http), and
email headers.

  • They taint string explicitly?

No, all external data is tainted.
IO#read/read_nonblock/readpartial/gets all return tainted data:

$ ruby -e 'r, w = IO.pipe; w.write("."); p r.read(1).tainted?'
true

I'm not sure we should employ tainted fstring table, or fix
these programs. Do you have opinion?

I strongly favor using the tainted fstring table. It is more
transparent. I don't want to ask programmers to put `-'
everywhere for String#-@ dedupe.

In general, I don't like asking programmers to make changes
(especially not ugly ones) when Ruby can do so transparently.

Honestly, I never care for taint/$SAFE, and maybe most Rubyists
do not, either. However, it exists, and we need to do our best
to support it. And yes, for IO#read family methods, it's forced
on us to care about tainted strings, I guess.

Actions #13

Updated by Anonymous almost 7 years ago

  • Status changed from Assigned to Closed

Applied in changeset trunk|r59354.


hash: keep fstrings of tainted strings for string keys

The same hash keys may be loaded from tainted data sources
frequently (e.g. parsing headers from socket or loading
YAML data from a file). If a non-tainted fstring already
exists (because the application expects the hash key),
cache and deduplicate the tainted version in the new
tainted_frozen_strings table.

For non-embedded strings, this also allows sharing with the
underlying malloc-ed data.

  • vm_core.h (rb_vm_struct): add tainted_frozen_strings
  • vm.c (ruby_vm_destruct): free tainted_frozen_strings
    (Init_vm_objects): initialize tainted_frozen_strings
    (rb_vm_tfstring_table): accessor for tainted_frozen_strings
  • internal.h: declare rb_fstring_existing, rb_vm_tfstring_table
  • hash.c (fstring_existing_str): remove (moved to string.c)
    (hash_aset_str): use rb_fstring_existing
  • string.c (rb_fstring_existing): new, based on fstring_existing_str
    (tainted_fstr_update): new
    (rb_fstring_existing0): new, based on fstring_existing_str
    (rb_tainted_fstring_existing): new, special case for tainted strings
    (rb_str_free): delete from tainted_frozen_strings table
  • test/ruby/test_optimization.rb (test_hash_reuse_fstring): new test
    [ruby-core:82012] [Bug #13737]
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0