Project

General

Profile

Feature #15251

Hash aset should deduplicate non tainted string

Added by chopraanmol1 (Anmol Chopra) 7 months ago. Updated 7 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:89549]

Description

I'm not sure if current behavior is expected one or a bug. So feel free to change tracker type.

Currently Hash ASET checks if non-tainted string exists in fstring table or not, if it doesn't then ruby duplicates string and freeze it. This works well for string_literal because they are already registered in fstring table, but it doesn't work for non-literal string.

Patch

https://github.com/ruby/ruby/pull/1993

O/P of attached file(test_hash_keys_deduped.rb) on trunk:

string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 100
interpolated_string => 100
string add => 100
string append => 100
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring + GC
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 100
interpolated_string => 100
string add => 100
string append => 100

after patch

string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 1
interpolated_string => 1
string add => 1
string append => 1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fstring + GC
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
string_literal => 1
string times 1 => 1
string times 3 string times 3 string times 3  => 1
interpolated_string => 1
string add => 1
string append => 1

Benchmark result(bench_hash_aset.rb):
Trunk:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.880000   0.000000   0.880000 (  0.880699)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.980000   0.000000   0.980000 (  0.978089)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`random text`]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  3.716000   0.004000   3.720000 (  3.722688)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.868000   0.000000   0.868000 (  0.868405)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.948000   0.000000   0.948000 (  0.948946)

Patched:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.872000   0.000000   0.872000 (  0.872410)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.864000   0.000000   0.864000 (  0.865356)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`random text`]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  3.780000   0.000000   3.780000 (  3.779730)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.868000   0.000000   0.868000 (  0.867957)
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Hash#[`string non-literal`.dup]=
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0.872000   0.000000   0.872000 (  0.873573)

Files

test_hash_keys_deduped.rb (927 Bytes) test_hash_keys_deduped.rb chopraanmol1 (Anmol Chopra), 10/24/2018 01:03 PM
bench_hash_aset.rb (1.35 KB) bench_hash_aset.rb chopraanmol1 (Anmol Chopra), 10/25/2018 06:56 AM

Associated revisions

Revision 1e83e15a
Added by normal 7 months ago

hash.c: aset deduplicates un-tainted string

We revisit [Bug #9188] since st.c is much improved since then,
and benchmarks against so_k_nucleotide seem to indicate little
or no performance change compared to before.

[ruby-core:89555] [Feature #15251]

From: Anmol Chopra chopraanmol1@gmail.com

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65371 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 65371
Added by normalperson (Eric Wong) 7 months ago

hash.c: aset deduplicates un-tainted string

We revisit [Bug #9188] since st.c is much improved since then,
and benchmarks against so_k_nucleotide seem to indicate little
or no performance change compared to before.

[ruby-core:89555] [Feature #15251]

From: Anmol Chopra chopraanmol1@gmail.com

Revision 65371
Added by normal 7 months ago

hash.c: aset deduplicates un-tainted string

We revisit [Bug #9188] since st.c is much improved since then,
and benchmarks against so_k_nucleotide seem to indicate little
or no performance change compared to before.

[ruby-core:89555] [Feature #15251]

From: Anmol Chopra chopraanmol1@gmail.com

Revision 81c84917
Added by nobu (Nobuyoshi Nakada) 7 months ago

Simplify conditions [Feature #15251]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65372 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 65372
Added by nobu (Nobuyoshi Nakada) 7 months ago

Simplify conditions [Feature #15251]

Revision 65372
Added by nobu (Nobuyoshi Nakada) 7 months ago

Simplify conditions [Feature #15251]

Revision 300ac6c5
Added by nobu (Nobuyoshi Nakada) 7 months ago

Fix a condition and just call rb_fstring [Feature #15251]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 65374
Added by nobu (Nobuyoshi Nakada) 7 months ago

Fix a condition and just call rb_fstring [Feature #15251]

Revision 65374
Added by nobu (Nobuyoshi Nakada) 7 months ago

Fix a condition and just call rb_fstring [Feature #15251]

History

#1

Updated by chopraanmol1 (Anmol Chopra) 7 months ago

  • Description updated (diff)
#2

Updated by chopraanmol1 (Anmol Chopra) 7 months ago

  • Description updated (diff)
#3

Updated by chopraanmol1 (Anmol Chopra) 7 months ago

  • Description updated (diff)

Updated by normalperson (Eric Wong) 7 months ago

chopraanmol1@gmail.com wrote:

https://bugs.ruby-lang.org/issues/15251

I'm not sure if current behavior is expected one or a bug. So fell free to
change tracker type.

current behavior is intentional because of a regression case:
https://bugs.ruby-lang.org/issues/9188

Currently Hash aset checks if non-tainted string exists in
fstring table or not, if it does not then ruby duplicates
string and freeze it. This works well for string_literal
because they are already registered in fstring table during
compilation, but it doesn't work for non-string literal.

Right, I prefer we always dedupe since more programs will
benefit. We can take a small regression for programs that write
uncommon/random keys to hashes.

Updated by chopraanmol1 (Anmol Chopra) 7 months ago

normalperson (Eric Wong) I also benchmarked so_k_nucleotide mentioned in https://bugs.ruby-lang.org/issues/9188 with following command

benchmark-driver benchmark/so_k_nucleotide.yml -e "path_to_patched_binary" -e "path_to_trunk_binary" -e "path_to_patched_binary --jit" -e "path_to_trunk_binary --jit" --repeat-count 10

Result:

Calculating -------------------------------------
                     path_to_patched_binary  path_to_trunk_binary  path_to_patched_binary --jit  path_to_trunk_binary --jit 
     so_k_nucleotide                  0.786                 0.755                         0.686                       0.673 i/s -       1.000 times in 1.272604s 1.324481s 1.458204s 1.485638s

Comparison:
                  so_k_nucleotide
path_to_patched_binary:         0.8 i/s 
path_to_trunk_binary:           0.8 i/s - 1.04x  slower
path_to_patched_binary --jit:   0.7 i/s - 1.15x  slower
path_to_trunk_binary --jit:     0.7 i/s - 1.17x  slower

I think so far it looks good, let me know if I did something wrong while running the above benchmark.

#6

Updated by normalperson (Eric Wong) 7 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r65371.


hash.c: aset deduplicates un-tainted string

We revisit [Bug #9188] since st.c is much improved since then,
and benchmarks against so_k_nucleotide seem to indicate little
or no performance change compared to before.

[ruby-core:89555] [Feature #15251]

From: Anmol Chopra chopraanmol1@gmail.com

Updated by normalperson (Eric Wong) 7 months ago

chopraanmol1@gmail.com wrote:

I think so far it looks good, let me know if I did something wrong while running the above benchmark.

I agree, so I've committed your patch as-is for r65371.

I wanted to try a shorter patch:

https://80x24.org/spew/20181026050908.1183-1-e@80x24.org/raw

But I got some spec failures due to singleton class (below). I
haven't investigated, yet, but I think there may be an existing
bug in hash.c, because my shorter patch ought to work...

1)
Hash#[]= duplicates string keys using dup semantics FAILED
Expected "bar"
to equal "oof"

ruby/spec/ruby/core/hash/shared/store.rb:16:in block (2 levels) in <top (required)>'
ruby/spec/ruby/core/hash/element_set_spec.rb:5:in
'

2)
Hash#store duplicates string keys using dup semantics FAILED
Expected "bar"
to equal "oof"

ruby/spec/ruby/core/hash/shared/store.rb:16:in block (2 levels) in <top (required)>'
ruby/spec/ruby/core/hash/store_spec.rb:5:in
'

Updated by normalperson (Eric Wong) 7 months ago

But I got some spec failures due to singleton class (below). I
haven't investigated, yet, but I think there may be an existing
bug in hash.c, because my shorter patch ought to work...

Maybe this is an appropriate simplification:

https://80x24.org/spew/20181026055336.25908-1-e@80x24.org/raw

Also available in: Atom PDF