Bug #1448
[patch] Proper handling of recursive arrays
| Status: | Closed | Start date: | 05/09/2009 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | - | % Done: | 100% |
|
| Category: | core | |||
| Target version: | 1.9.2 | |||
| ruby -v: | ruby 1.9.2dev (2009-05-09 trunk 23371) [i386-darwin9.6.0] |
Description
Dealing with recursive arrays & hashes can be tricky.
The current handling of recursive arrays is much improved over that of Ruby 1.8.6. Array comparison still has some bugs though.
For instance:
x = []; x << x
y = [[x]]
x == y # ==> true
y == x # ==> false, should be true!
Morevoer, recursive arrays that are built the same way are not recognized as equal:
z = []; z << z
x == z # ==> false, should be true!
Needless to say, arrays that have the same elements (e.g. a single element, containing a single element, ...) but built differently way are not recognized as equal:
stone = []; stepping = [stone]; stone << stepping
x == stepping # ==> false, would be nice to be true!
The attached patch fixes all of these problems :-)
* How:
The function rb_exec_recursive handles the recursivity by pushing and poping the elements it encounters for a given method (for example eql?). For such comparisons, instead of keeping track of the elements it encounters, I modified it so that it keeps track of both the elements being compared. A recursion is detected only when a matching pair is found.
This takes care of the first problem. For the next two, we only need to observe that if we have a recursion on the pair (x,y) when comparing x and y, then it is because they are not different! Changing the return value for recursive cases from nil (not comparable) / false (different) to Qundef (unknown yet) makes comparison of complex recursive "trees" work beautifully. I've added some cute samples in rubyspecs (core/array/shared/equal.rb)
* Implementation details:
Previous recursive_push/pop/check maintained a hash of encountered object ids, setting hash[obj] = true. I modified them so that in "paired" cases, it sets hash[obj] = paired_obj. If a pair (obj, different_paired_obj) is encountered later on, I set hash[obj] to {paired_obj => true, different_paired_obj => true}.
This way, there is basically no runtime cost to this technique, except in the complex recursive cases. Only for these complex cases is there a small additional cost for the hash creation/destruction.
* Last problem:
There is one more problem that my patch doesn't cover (lack of mri-fu): hashes for recursive structures are incorrect. As per the official doc, "a.eql? b" should imply "a.hash == b.hash". On the other hand, we have (before or after my patch):
a = [x]
x.eql? a # ==> true
a.eql? x # ==> true
x.hash == a.hash # ==> false, should have same hash
The solution is that when calculating the hash for an array, if a recursion is detected, then the hash should return a fixed value (say 0 or -length) _for the original_ array. Currently, 0 is returned but at the level that the recursion is detected. In Ruby pseudo-code, it would look like:
static VALUE
recursive_hash(VALUE ary, VALUE dummy, int recur)
{
long i, h;
VALUE n;
if (recur) {
+ raise HashingRecursionDetected
- return LONG2FIX(0);
}
h = rb_hash_start(RARRAY_LEN(ary));
for (i=0; i<RARRAY_LEN(ary); i++) {
n = rb_hash(RARRAY_PTR(ary)[i]);
h = rb_hash_uint(h, NUM2LONG(n));
}
h = rb_hash_end(h);
return LONG2FIX(h);
}
static VALUE
rb_ary_hash(VALUE ary)
{
return rb_exec_recursive(recursive_hash, ary, 0);
+ rescue HashingRecursionDetected
+ return -length
}
A similar modification must be made for hash.c.
Thanks
Marc-André Lafortune
Related issues
Associated revisions
* thread.c (rb_exec_recursive_paired): new function for proper
handling of recursive arrays. [EXPERIMENTAL] [ruby-core:23402]
* array.c (rb_ary_equal, rb_ary_eql, rb_ary_cmp): use above.
* hash.c (hash_equal): ditto.
* test/ruby/test_hash.rb (TestHash::test_equal2): recursive hashes
are handled properly now. ref: [ruby-core:23402]
* test/ruby/test_m17n.rb (TestM17N#test_sprintf_p): test fixed
History
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Hi, At Sat, 9 May 2009 11:47:53 +0900, Marc-Andre Lafortune wrote in [ruby-core:23402]: > The attached patch fixes all of these problems :-) But reverts old problems. $ ./ruby -e 'x={};x[1]=x;y={};y[1]=y; p x==y' -e:1:in `==': stack level too deep (SystemStackError) from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' ... 5124 levels... from -e:1:in `==' from -e:1:in `==' from -e:1:in `==' from -e:1:in `<main>' Also, note that arg of rb_exec_recursive() is not restricted to a real VALUE. > * Last problem: > There is one more problem that my patch doesn't cover (lack > of mri-fu): hashes for recursive structures are incorrect. As > per the official doc, "a.eql? b" should imply "a.hash == > b.hash". On the other hand, we have (before or after my > patch): > a = [x] > x.eql? a # ==> true > a.eql? x # ==> true > x.hash == a.hash # ==> false, should have same hash I wonder it may be false positive. $ ruby -e 'x=[];x<<x;a=[x]; p a, x' [[[...]]] [[...]] But no idea right now. A patch based on yours. Index: array.c =================================================================== --- array.c (revision 23373) +++ array.c (working copy) @@ -2729,5 +2729,5 @@ recursive_equal(VALUE ary1, VALUE ary2, long i; - if (recur) return Qfalse; + if (recur) return Qtrue; /* Subtle! */ for (i=0; i<RARRAY_LEN(ary1); i++) { if (!rb_equal(rb_ary_elt(ary1, i), rb_ary_elt(ary2, i))) @@ -2762,5 +2762,5 @@ rb_ary_equal(VALUE ary1, VALUE ary2) } if (RARRAY_LEN(ary1) != RARRAY_LEN(ary2)) return Qfalse; - return rb_exec_recursive(recursive_equal, ary1, ary2); + return rb_exec_recursive_paired(recursive_equal, ary1, ary2); } @@ -2770,5 +2770,5 @@ recursive_eql(VALUE ary1, VALUE ary2, in long i; - if (recur) return Qfalse; + if (recur) return Qtrue; /* Subtle! */ for (i=0; i<RARRAY_LEN(ary1); i++) { if (!rb_eql(rb_ary_elt(ary1, i), rb_ary_elt(ary2, i))) @@ -2792,5 +2792,5 @@ rb_ary_eql(VALUE ary1, VALUE ary2) if (TYPE(ary2) != T_ARRAY) return Qfalse; if (RARRAY_LEN(ary1) != RARRAY_LEN(ary2)) return Qfalse; - return rb_exec_recursive(recursive_eql, ary1, ary2); + return rb_exec_recursive_paired(recursive_eql, ary1, ary2); } @@ -2859,5 +2859,5 @@ recursive_cmp(VALUE ary1, VALUE ary2, in long i, len; - if (recur) return Qnil; + if (recur) return Qundef; /* Subtle! */ len = RARRAY_LEN(ary1); if (len > RARRAY_LEN(ary2)) { @@ -2901,5 +2901,5 @@ rb_ary_cmp(VALUE ary1, VALUE ary2) ary2 = to_ary(ary2); if (ary1 == ary2) return INT2FIX(0); - v = rb_exec_recursive(recursive_cmp, ary1, ary2); + v = rb_exec_recursive_paired(recursive_cmp, ary1, ary2); if (v != Qundef) return v; len = RARRAY_LEN(ary1) - RARRAY_LEN(ary2); Index: hash.c =================================================================== --- hash.c (revision 23373) +++ hash.c (working copy) @@ -1489,5 +1489,5 @@ hash_equal(VALUE hash1, VALUE hash2, int data.tbl = RHASH(hash2)->ntbl; data.eql = eql; - return rb_exec_recursive(recursive_eql, hash1, (VALUE)&data); + return rb_exec_recursive_paired(recursive_eql, hash1, (VALUE)&data); } Index: thread.c =================================================================== --- thread.c (revision 23373) +++ thread.c (working copy) @@ -3311,5 +3311,5 @@ static ID recursive_key; static VALUE -recursive_check(VALUE hash, VALUE obj) +recursive_check(VALUE hash, VALUE obj, VALUE paired_obj) { if (NIL_P(hash) || TYPE(hash) != T_HASH) { @@ -3317,10 +3317,23 @@ recursive_check(VALUE hash, VALUE obj) } else { - VALUE list = rb_hash_aref(hash, ID2SYM(rb_frame_this_func())); + VALUE sym = ID2SYM(rb_frame_this_func()); + VALUE list = rb_hash_aref(hash, sym); + VALUE pair_list; if (NIL_P(list) || TYPE(list) != T_HASH) return Qfalse; - if (NIL_P(rb_hash_lookup(list, obj))) + pair_list = rb_hash_lookup2(list, obj, Qundef); + if (pair_list == Qundef) return Qfalse; + if (paired_obj) { + if (TYPE(pair_list) != T_HASH) { + if (pair_list != paired_obj) + return Qfalse; + } + else { + if (NIL_P(rb_hash_lookup(pair_list, paired_obj))) + return Qfalse; + } + } return Qtrue; } @@ -3328,7 +3341,7 @@ recursive_check(VALUE hash, VALUE obj) static VALUE -recursive_push(VALUE hash, VALUE obj) +recursive_push(VALUE hash, VALUE obj, VALUE paired_obj) { - VALUE list, sym; + VALUE list, sym, pair_list; sym = ID2SYM(rb_frame_this_func()); @@ -3345,20 +3358,31 @@ recursive_push(VALUE hash, VALUE obj) rb_hash_aset(hash, sym, list); } - rb_hash_aset(list, obj, Qtrue); + if (!paired_obj) { + rb_hash_aset(list, obj, Qtrue); + } + else if ((pair_list = rb_hash_lookup2(list, obj, Qundef)) == Qundef) { + rb_hash_aset(list, obj, paired_obj); + } + else { + if (TYPE(pair_list) != T_HASH){ + VALUE other_paired_obj = pair_list; + pair_list = rb_hash_new(); + rb_hash_aset(pair_list, other_paired_obj, Qtrue); + rb_hash_aset(list, obj, pair_list); + } + rb_hash_aset(pair_list, paired_obj, Qtrue); + } return hash; } static void -recursive_pop(VALUE hash, VALUE obj) +recursive_pop(VALUE hash, VALUE obj, VALUE paired_obj) { - VALUE list, sym; + VALUE list, sym, pair_list, symname, thrname; sym = ID2SYM(rb_frame_this_func()); if (NIL_P(hash) || TYPE(hash) != T_HASH) { - VALUE symname; - VALUE thrname; symname = rb_inspect(sym); thrname = rb_inspect(rb_thread_current()); - rb_raise(rb_eTypeError, "invalid inspect_tbl hash for %s in %s", StringValuePtr(symname), StringValuePtr(thrname)); @@ -3366,19 +3390,34 @@ recursive_pop(VALUE hash, VALUE obj) list = rb_hash_aref(hash, sym); if (NIL_P(list) || TYPE(list) != T_HASH) { - VALUE symname = rb_inspect(sym); - VALUE thrname = rb_inspect(rb_thread_current()); + symname = rb_inspect(sym); + thrname = rb_inspect(rb_thread_current()); rb_raise(rb_eTypeError, "invalid inspect_tbl list for %s in %s", StringValuePtr(symname), StringValuePtr(thrname)); } + if (paired_obj) { + pair_list = rb_hash_lookup2(list, obj, Qundef); + if (pair_list == Qundef) { + symname = rb_inspect(sym); + thrname = rb_inspect(rb_thread_current()); + rb_raise(rb_eTypeError, "invalid inspect_tbl pair_list for %s in %s", + StringValuePtr(symname), StringValuePtr(thrname)); + } + if (TYPE(pair_list) == T_HASH) { + rb_hash_delete(pair_list, obj); + if (!RHASH_EMPTY_P(pair_list)) { + return; /* keep hash until is empty */ + } + } + } rb_hash_delete(list, obj); } -VALUE -rb_exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg) +static VALUE +exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg, VALUE pairid) { volatile VALUE hash = rb_thread_local_aref(rb_thread_current(), recursive_key); VALUE objid = rb_obj_id(obj); - if (recursive_check(hash, objid)) { + if (recursive_check(hash, objid, pairid)) { return (*func) (obj, arg, Qtrue); } @@ -3387,5 +3426,5 @@ rb_exec_recursive(VALUE (*func) (VALUE, int state; - hash = recursive_push(hash, objid); + hash = recursive_push(hash, objid, pairid); PUSH_TAG(); if ((state = EXEC_TAG()) == 0) { @@ -3393,5 +3432,5 @@ rb_exec_recursive(VALUE (*func) (VALUE, } POP_TAG(); - recursive_pop(hash, objid); + recursive_pop(hash, objid, pairid); if (state) JUMP_TAG(state); @@ -3400,4 +3439,16 @@ rb_exec_recursive(VALUE (*func) (VALUE, } +VALUE +rb_exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg) +{ + return exec_recursive(func, obj, arg, 0); +} + +VALUE +rb_exec_recursive_paired(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg) +{ + return exec_recursive(func, obj, arg, rb_obj_id(arg)); +} + /* tracer */ Index: include/ruby/intern.h =================================================================== --- include/ruby/intern.h (revision 23373) +++ include/ruby/intern.h (working copy) @@ -341,4 +341,5 @@ void rb_thread_atfork(void); void rb_thread_atfork_before_exec(void); VALUE rb_exec_recursive(VALUE(*)(VALUE, VALUE, int),VALUE,VALUE); +VALUE rb_exec_recursive_paired(VALUE(*)(VALUE, VALUE, int),VALUE,VALUE); /* file.c */ VALUE rb_file_s_expand_path(int, VALUE *); -- Nobu Nakada
Updated by marcandre (Marc-Andre Lafortune) about 3 years ago
- File recursion2.patch added
Oups, completely forgot to finish the hash part. The problem was that I was not pairing hash1 with hash2 but with the temporary struct made for the comparison. This is fixed in the included patch. I have updated rubyspecs (core/hash/equal_value_spec) with complex recursive hash tests. Specs now test recursive hashes, even when formed differently but with the same content, and arrays containing hashes containing arrays... For the false positive: firstly, the example you show is already == in the current version. The equality for arrays is "do they contain the same elements in the same order". For recursive arrays, it's easier to think about the reverse: arrays are not equal if there is a mismatch between one element and the corresponding one. [a] and a don't have such a mismatch. PS: Thanks for prettying up my code! I had difficulty applying the patch, I hope everything made it through.
Updated by nobu (Nobuyoshi Nakada) about 3 years ago
Hi, At Sun, 10 May 2009 07:37:31 +0900, Marc-Andre Lafortune wrote in [ruby-core:23412]: > File recursion2.patch added This patch seems nice to me. I'm afraid matz may not like the name rb_exec_recursive_paired(). Is it OK? -- Nobu Nakada
Updated by marcandre (Marc-Andre Lafortune) almost 3 years ago
Nobuyoshi Nakada wrote: > This patch seems nice to me. > I'm afraid matz may not like the name rb_exec_recursive_paired(). Is it OK? 'rb_exec_recursive_paired' is the name that came to my mind because it checks recursion on a pair of objects. Maybe 'rb_exec_recursive_on_pair' or 'rb_exec_recursive_with_pair' would be even better. As long as a good name is chosen, I'll be happy :-) Thanks
Updated by nobu (Nobuyoshi Nakada) almost 3 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset r23557.