Bug #19334
closedDefining many instance variables and accessing them is slow in Ruby 3.2.0
Added by mame (Yusuke Endoh) almost 2 years ago. Updated over 1 year ago.
Description
class C
eval("def initialize; #{ (0..100000).map { "@x#{ _1 } = 0; " }.join } end")
attr_reader :x50000
end
p :start
C.new.x50000
This script takes less than one second in Ruby 3.1.3, and does more than ten second in Ruby 3.2.0.
$ time ruby -v test.rb
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [x86_64-linux]
:start
real 0m0.210s
user 0m0.167s
sys 0m0.044s
$ time ruby -v test.rb
ruby 3.2.0 (2022-12-25 revision a528908271) [x86_64-linux]
:start
real 0m11.026s
user 0m10.950s
sys 0m0.040s
This problem is not critical, but is there any room for improvement?
Updated by Eregon (Benoit Daloze) almost 2 years ago
Is there real code doing this?
Otherwise IMHO it's complexity and significant footprint increase to solve an irrelevant issue (misuse of instance variables).
Updated by mame (Yusuke Endoh) almost 2 years ago
I didn't explain the problem properly. The code takes a super-linear time for the number of instance variables: 50k ivars take 3.5 sec., 100k do 15 sec., and 200k do 60 sec. Is this intentional? I think something unexpected is happening.
I consider this issue a hint for improvement. We should first find out what is happening in the code. Fixing this issue may make normal code faster. If the disadvantage of fixing it is clearly much larger than the disadvantage, we may choose not to dare to fix it. I don't think it is a good idea to assume it's not worth fixing before you even look into it. (Personally, I would like to see Ruby work robustly even in somewhat eccentric cases.)
I suspect that this degradation is caused by object shapes #18776 (sorry if I am wrong). So I would be happy if its authors, @jemmai (Jemma Issroff) and @tenderlovemaking (Aaron Patterson), would look into it. But if they won't do it, I will look into it myself.
Updated by tenderlovemaking (Aaron Patterson) almost 2 years ago
mame (Yusuke Endoh) wrote in #note-2:
I didn't explain the problem properly. The code takes a super-linear time for the number of instance variables: 50k ivars take 3.5 sec., 100k do 15 sec., and 200k do 60 sec. Is this intentional? I think something unexpected is happening.
I consider this issue a hint for improvement. We should first find out what is happening in the code. Fixing this issue may make normal code faster. If the disadvantage of fixing it is clearly much larger than the disadvantage, we may choose not to dare to fix it. I don't think it is a good idea to assume it's not worth fixing before you even look into it. (Personally, I would like to see Ruby work robustly even in somewhat eccentric cases.)
I suspect that this degradation is caused by object shapes #18776 (sorry if I am wrong). So I would be happy if its authors, @jemmai (Jemma Issroff) and @tenderlovemaking (Aaron Patterson), would look into it. But if they won't do it, I will look into it myself.
Yes, we're definitely looking in to it. I'll post an update here when I have more information 😊
Updated by byroot (Jean Boussier) almost 2 years ago
Out of curiosity I profiled this script. 99% of the time is spent in rb_shape_get_iv_index()
, which makes sense given that it walk back up the shape tree on every access, so the deeper the tree the longer it takes.
I suppose this could be dramatically improved by storing the iv_index
inside each shape, at the expense of using a bit more memory.
Updated by byroot (Jean Boussier) almost 2 years ago
I suppose this could be dramatically improved by storing the iv_index inside each shape, at the expense of using a bit more memory.
Actually, nevermind, the iv_index is already in the shape. The reason it's particularly slow here is that we add a new variable every time, so we have to walk back to the shape root every time to check that the variable doesn't exist yet.
So to optimize this, a hash table would be needed. IIRC Aaron mentioned that V8 use a hash table after a certain number of edges (20?).
Updated by Eregon (Benoit Daloze) almost 2 years ago
byroot (Jean Boussier) wrote in #note-5:
IIRC Aaron mentioned that V8 use a hash table after a certain number of edges (20?).
That's the complexity I'm talking about above which IMO is not worth it (that JS completely changes its object representation based on a fragile heuristic).
Ruby is a proper language which doesn't confuse objects ivars and Hash pairs, so there shouldn't be a need to fallback to using a hashtable representation for ivars.
An extra map/Hash per Shape to know which ivars it contains could work, but it would pretty bad for memory footprint with 1 hashtable per Shape.
So it needs something more memory efficient than a hashtable per Shape.
Truffle uses an Hash array mapped trie for that: https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.object/src/com/oracle/truffle/object/TriePropertyMap.java
Before that it used https://github.com/oracle/graal/blob/master/truffle/src/com.oracle.truffle.object/src/com/oracle/truffle/object/ConsListPropertyMap.java, which is probably similar to what CRuby does currently.
I'm not sure the HAMT is needed, but that's probably the common solution to this issue.
Updated by byroot (Jean Boussier) almost 2 years ago
An extra map/Hash per Shape to know which ivars it contains could work
Yeah that's more what I had in mind.
but it would pretty bad for memory footprint with 1 hashtable per Shape.
What about one every X shapes? This way you'd have to walk at most X shapes, and then to one hash lookup. X could be something relatively high like 20 or so.
Updated by jemmai (Jemma Issroff) almost 2 years ago
We merged this PR as a temporary fix. After 50 IV transitions, it falls back to the obj_too_complex shape, which uses a hash lookup. If we want, we could backport this change. I am unsure whether it's worth it because I don't know how likely this case is to occur in production code.
We could also increase the choice of max IVs from 50 to some other number.
New performance numbers with this change are:
$ time ruby -v test.rb
ruby 3.3.0dev (2023-01-25T16:50:33Z limit-num-shapes 29c90b22bb) [arm64-darwin22]
:start
ruby -v test.rb 0.13s user 0.03s system 96% cpu 0.159 total
$ time ruby -v test.rb
ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [arm64-darwin22]
:start
ruby -v test.rb 0.13s user 0.03s system 97% cpu 0.165 total
We are planning a more permanent fix as well. Our thinking is that after every 50 IV shapes, we could insert a "hash_iv_index_shape," which will have a pointer to an rb_id_table
which stores iv_name -> index mapping for the previous 50 shapes, and a pointer to the previous "hash_iv_index_shape". This will significantly speed up the case above as well.
Updated by byroot (Jean Boussier) almost 2 years ago
- Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: UNKNOWN
@mame (Yusuke Endoh) would you consider 78fcc9847a9db6d42c8c263154ec05903a370b6b worth backporting?
Updated by Eregon (Benoit Daloze) almost 2 years ago
- Related to Bug #19381: SEGV - ivars, both Ubuntu & Windows added
Updated by Eregon (Benoit Daloze) almost 2 years ago
Let's wait it's stable before backporting: https://bugs.ruby-lang.org/issues/19381#note-3
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
- Status changed from Open to Closed
Updated by Dan0042 (Daniel DeLorme) over 1 year ago
jemmai (Jemma Issroff) wrote in #note-8:
We are planning a more permanent fix as well. Our thinking is that after every 50 IV shapes, we could insert a "hash_iv_index_shape," which will have a pointer to an
rb_id_table
which stores iv_name -> index mapping for the previous 50 shapes, and a pointer to the previous "hash_iv_index_shape". This will significantly speed up the case above as well.
I don't really understand this arbitrary limitation on "every 50 IV shapes". If a shape has a rb_id_table
for lookups, then it can be shared with its parent but not its siblings. So in the case of 100k ivars like the example, we go through 100k shape transitions, and each of those shapes can point to the same lookup table. We need just one big lookup table for a chain of 100k shapes. No?