Bug #1448

[patch] Proper handling of recursive arrays

Added by marcandre (Marc-Andre Lafortune) about 3 years ago. Updated about 1 year ago.

[ruby-core:23402]
Status:Closed Start date:05/09/2009
Priority:Normal Due date:
Assignee:- % Done:

100%

Category:core
Target version:1.9.2
ruby -v:ruby 1.9.2dev (2009-05-09 trunk 23371) [i386-darwin9.6.0]

Description

Dealing with recursive arrays & hashes can be tricky.

The current handling of recursive arrays is much improved over that of Ruby 1.8.6. Array comparison still has some bugs though.

For instance:
x = []; x << x
y = [[x]]
x == y   # ==> true
y == x   # ==> false, should be true!

Morevoer, recursive arrays that are built the same way are not recognized as equal:
z = []; z << z
x == z   # ==> false, should be true!

Needless to say, arrays that have the same elements (e.g. a single element, containing a single element, ...) but built differently way are  not recognized as equal:
stone = []; stepping = [stone]; stone << stepping
x == stepping   # ==> false, would be nice to be true!


The attached patch fixes all of these problems :-)


* How:
The function rb_exec_recursive handles the recursivity by pushing and poping the elements it encounters for a given method (for example eql?). For such comparisons, instead of keeping track of the elements it encounters, I modified it so that it keeps track of both the elements being compared. A recursion is detected only when a matching pair is found.

This takes care of the first problem. For the next two, we only need to observe that if we have a recursion on the pair (x,y) when comparing x and y, then it is because they are not different! Changing the return value for recursive cases from nil (not comparable) / false (different) to Qundef (unknown yet) makes comparison of complex recursive "trees" work beautifully. I've added some cute samples in rubyspecs (core/array/shared/equal.rb)

* Implementation details:
Previous recursive_push/pop/check maintained a hash of encountered object ids, setting hash[obj] = true. I modified them so that in "paired" cases, it sets hash[obj] = paired_obj. If a pair (obj, different_paired_obj) is encountered later on, I set hash[obj] to {paired_obj => true, different_paired_obj => true}.

This way, there is basically no runtime cost to this technique, except in the complex recursive cases. Only for these complex cases is there a small additional cost for the hash creation/destruction.

* Last problem:
There is one more problem that my patch doesn't cover (lack of mri-fu): hashes for recursive structures are incorrect. As per the official doc, "a.eql? b" should imply "a.hash == b.hash". On the other hand, we have (before or after my patch):
a = [x]
x.eql? a  # ==> true
a.eql? x  # ==> true
x.hash == a.hash # ==> false, should have same hash

The solution is that when calculating the hash for an array, if a recursion is detected, then the hash should return a fixed value (say 0 or -length) _for the original_ array. Currently, 0 is returned but at the level that the recursion is detected. In Ruby pseudo-code, it would look like:

    static VALUE
    recursive_hash(VALUE ary, VALUE dummy, int recur)
    {
        long i, h;
        VALUE n;

        if (recur) {
+           raise HashingRecursionDetected
-           return LONG2FIX(0);
        }
        h = rb_hash_start(RARRAY_LEN(ary));
        for (i=0; i<RARRAY_LEN(ary); i++) {
            n = rb_hash(RARRAY_PTR(ary)[i]);
            h = rb_hash_uint(h, NUM2LONG(n));
        }
        h = rb_hash_end(h);
        return LONG2FIX(h);
    }

    static VALUE
    rb_ary_hash(VALUE ary)
    {
        return rb_exec_recursive(recursive_hash, ary, 0);
+   rescue HashingRecursionDetected
+       return -length
    }

A similar modification must be made for hash.c.

Thanks

Marc-André Lafortune

recursion.patch (7.5 kB) marcandre (Marc-Andre Lafortune), 05/09/2009 11:47 am

recursion2.patch (7.4 kB) marcandre (Marc-Andre Lafortune), 05/10/2009 07:37 am


Related issues

duplicated by ruby-trunk - Bug #1508: Recursive arrays with the same structure are not eql?. Closed 05/24/2009

Associated revisions

Revision 23557
Added by nobu (Nobuyoshi Nakada) almost 3 years ago

* thread.c (rb_exec_recursive_paired): new function for proper handling of recursive arrays. [EXPERIMENTAL] [ruby-core:23402] * array.c (rb_ary_equal, rb_ary_eql, rb_ary_cmp): use above. * hash.c (hash_equal): ditto.

Revision 23563
Added by matz (Yukihiro Matsumoto) almost 3 years ago

* test/ruby/test_hash.rb (TestHash::test_equal2): recursive hashes are handled properly now. ref: [ruby-core:23402] * test/ruby/test_m17n.rb (TestM17N#test_sprintf_p): test fixed

History

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Hi,

At Sat, 9 May 2009 11:47:53 +0900,
Marc-Andre Lafortune wrote in [ruby-core:23402]:
> The attached patch fixes all of these problems :-)

But reverts old problems.

$ ./ruby -e 'x={};x[1]=x;y={};y[1]=y; p x==y'
-e:1:in `==': stack level too deep (SystemStackError)
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	 ... 5124 levels...
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `=='
	from -e:1:in `<main>'

Also, note that arg of rb_exec_recursive() is not restricted to
a real VALUE.

> * Last problem:
> There is one more problem that my patch doesn't cover (lack
> of mri-fu): hashes for recursive structures are incorrect. As
> per the official doc, "a.eql? b" should imply "a.hash ==
> b.hash". On the other hand, we have (before or after my
> patch):
> a = [x]
> x.eql? a  # ==> true
> a.eql? x  # ==> true
> x.hash == a.hash # ==> false, should have same hash

I wonder it may be false positive.

$ ruby -e 'x=[];x<<x;a=[x]; p a, x'
[[[...]]]
[[...]]

But no idea right now.

A patch based on yours.


Index: array.c
===================================================================
--- array.c	(revision 23373)
+++ array.c	(working copy)
@@ -2729,5 +2729,5 @@ recursive_equal(VALUE ary1, VALUE ary2, 
     long i;

-    if (recur) return Qfalse;
+    if (recur) return Qtrue; /* Subtle! */
     for (i=0; i<RARRAY_LEN(ary1); i++) {
 	if (!rb_equal(rb_ary_elt(ary1, i), rb_ary_elt(ary2, i)))
@@ -2762,5 +2762,5 @@ rb_ary_equal(VALUE ary1, VALUE ary2)
     }
     if (RARRAY_LEN(ary1) != RARRAY_LEN(ary2)) return Qfalse;
-    return rb_exec_recursive(recursive_equal, ary1, ary2);
+    return rb_exec_recursive_paired(recursive_equal, ary1, ary2);
 }

@@ -2770,5 +2770,5 @@ recursive_eql(VALUE ary1, VALUE ary2, in
     long i;

-    if (recur) return Qfalse;
+    if (recur) return Qtrue; /* Subtle! */
     for (i=0; i<RARRAY_LEN(ary1); i++) {
 	if (!rb_eql(rb_ary_elt(ary1, i), rb_ary_elt(ary2, i)))
@@ -2792,5 +2792,5 @@ rb_ary_eql(VALUE ary1, VALUE ary2)
     if (TYPE(ary2) != T_ARRAY) return Qfalse;
     if (RARRAY_LEN(ary1) != RARRAY_LEN(ary2)) return Qfalse;
-    return rb_exec_recursive(recursive_eql, ary1, ary2);
+    return rb_exec_recursive_paired(recursive_eql, ary1, ary2);
 }

@@ -2859,5 +2859,5 @@ recursive_cmp(VALUE ary1, VALUE ary2, in
     long i, len;

-    if (recur) return Qnil;
+    if (recur) return Qundef;	/* Subtle! */
     len = RARRAY_LEN(ary1);
     if (len > RARRAY_LEN(ary2)) {
@@ -2901,5 +2901,5 @@ rb_ary_cmp(VALUE ary1, VALUE ary2)
     ary2 = to_ary(ary2);
     if (ary1 == ary2) return INT2FIX(0);
-    v = rb_exec_recursive(recursive_cmp, ary1, ary2);
+    v = rb_exec_recursive_paired(recursive_cmp, ary1, ary2);
     if (v != Qundef) return v;
     len = RARRAY_LEN(ary1) - RARRAY_LEN(ary2);
Index: hash.c
===================================================================
--- hash.c	(revision 23373)
+++ hash.c	(working copy)
@@ -1489,5 +1489,5 @@ hash_equal(VALUE hash1, VALUE hash2, int
     data.tbl = RHASH(hash2)->ntbl;
     data.eql = eql;
-    return rb_exec_recursive(recursive_eql, hash1, (VALUE)&data);
+    return rb_exec_recursive_paired(recursive_eql, hash1, (VALUE)&data);
 }

Index: thread.c
===================================================================
--- thread.c	(revision 23373)
+++ thread.c	(working copy)
@@ -3311,5 +3311,5 @@ static ID recursive_key;

 static VALUE
-recursive_check(VALUE hash, VALUE obj)
+recursive_check(VALUE hash, VALUE obj, VALUE paired_obj)
 {
     if (NIL_P(hash) || TYPE(hash) != T_HASH) {
@@ -3317,10 +3317,23 @@ recursive_check(VALUE hash, VALUE obj)
     }
     else {
-	VALUE list = rb_hash_aref(hash, ID2SYM(rb_frame_this_func()));
+	VALUE sym = ID2SYM(rb_frame_this_func());
+	VALUE list = rb_hash_aref(hash, sym);
+	VALUE pair_list;

 	if (NIL_P(list) || TYPE(list) != T_HASH)
 	    return Qfalse;
-	if (NIL_P(rb_hash_lookup(list, obj)))
+	pair_list = rb_hash_lookup2(list, obj, Qundef);
+	if (pair_list == Qundef)
 	    return Qfalse;
+	if (paired_obj) {
+	    if (TYPE(pair_list) != T_HASH) {
+		if (pair_list != paired_obj)
+		    return Qfalse;
+	    }
+	    else {
+		if (NIL_P(rb_hash_lookup(pair_list, paired_obj)))
+		    return Qfalse;
+	    }
+	}
 	return Qtrue;
     }
@@ -3328,7 +3341,7 @@ recursive_check(VALUE hash, VALUE obj)

 static VALUE
-recursive_push(VALUE hash, VALUE obj)
+recursive_push(VALUE hash, VALUE obj, VALUE paired_obj)
 {
-    VALUE list, sym;
+    VALUE list, sym, pair_list;

     sym = ID2SYM(rb_frame_this_func());
@@ -3345,20 +3358,31 @@ recursive_push(VALUE hash, VALUE obj)
 	rb_hash_aset(hash, sym, list);
     }
-    rb_hash_aset(list, obj, Qtrue);
+    if (!paired_obj) {
+	rb_hash_aset(list, obj, Qtrue);
+    }
+    else if ((pair_list = rb_hash_lookup2(list, obj, Qundef)) == Qundef) {
+	rb_hash_aset(list, obj, paired_obj);
+    }
+    else {
+	if (TYPE(pair_list) != T_HASH){
+	    VALUE other_paired_obj = pair_list;
+	    pair_list = rb_hash_new();
+	    rb_hash_aset(pair_list, other_paired_obj, Qtrue);
+	    rb_hash_aset(list, obj, pair_list);
+	}
+	rb_hash_aset(pair_list, paired_obj, Qtrue);
+    }
     return hash;
 }

 static void
-recursive_pop(VALUE hash, VALUE obj)
+recursive_pop(VALUE hash, VALUE obj, VALUE paired_obj)
 {
-    VALUE list, sym;
+    VALUE list, sym, pair_list, symname, thrname;

     sym = ID2SYM(rb_frame_this_func());
     if (NIL_P(hash) || TYPE(hash) != T_HASH) {
-	VALUE symname;
-	VALUE thrname;
 	symname = rb_inspect(sym);
 	thrname = rb_inspect(rb_thread_current());
-
 	rb_raise(rb_eTypeError, "invalid inspect_tbl hash for %s in %s",
 		 StringValuePtr(symname), StringValuePtr(thrname));
@@ -3366,19 +3390,34 @@ recursive_pop(VALUE hash, VALUE obj)
     list = rb_hash_aref(hash, sym);
     if (NIL_P(list) || TYPE(list) != T_HASH) {
-	VALUE symname = rb_inspect(sym);
-	VALUE thrname = rb_inspect(rb_thread_current());
+	symname = rb_inspect(sym);
+	thrname = rb_inspect(rb_thread_current());
 	rb_raise(rb_eTypeError, "invalid inspect_tbl list for %s in %s",
 		 StringValuePtr(symname), StringValuePtr(thrname));
     }
+    if (paired_obj) {
+	pair_list = rb_hash_lookup2(list, obj, Qundef);
+	if (pair_list == Qundef) {
+	    symname = rb_inspect(sym);
+	    thrname = rb_inspect(rb_thread_current());
+	    rb_raise(rb_eTypeError, "invalid inspect_tbl pair_list for %s in %s",
+		     StringValuePtr(symname), StringValuePtr(thrname));
+	}
+	if (TYPE(pair_list) == T_HASH) {
+	    rb_hash_delete(pair_list, obj);
+	    if (!RHASH_EMPTY_P(pair_list)) {
+		return; /* keep hash until is empty */
+	    }
+	}
+    }
     rb_hash_delete(list, obj);
 }

-VALUE
-rb_exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg)
+static VALUE
+exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg, VALUE pairid)
 {
     volatile VALUE hash = rb_thread_local_aref(rb_thread_current(), recursive_key);
     VALUE objid = rb_obj_id(obj);

-    if (recursive_check(hash, objid)) {
+    if (recursive_check(hash, objid, pairid)) {
 	return (*func) (obj, arg, Qtrue);
     }
@@ -3387,5 +3426,5 @@ rb_exec_recursive(VALUE (*func) (VALUE, 
 	int state;

-	hash = recursive_push(hash, objid);
+	hash = recursive_push(hash, objid, pairid);
 	PUSH_TAG();
 	if ((state = EXEC_TAG()) == 0) {
@@ -3393,5 +3432,5 @@ rb_exec_recursive(VALUE (*func) (VALUE, 
 	}
 	POP_TAG();
-	recursive_pop(hash, objid);
+	recursive_pop(hash, objid, pairid);
 	if (state)
 	    JUMP_TAG(state);
@@ -3400,4 +3439,16 @@ rb_exec_recursive(VALUE (*func) (VALUE, 
 }

+VALUE
+rb_exec_recursive(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg)
+{
+    return exec_recursive(func, obj, arg, 0);
+}
+
+VALUE
+rb_exec_recursive_paired(VALUE (*func) (VALUE, VALUE, int), VALUE obj, VALUE arg)
+{
+    return exec_recursive(func, obj, arg, rb_obj_id(arg));
+}
+
 /* tracer */

Index: include/ruby/intern.h
===================================================================
--- include/ruby/intern.h	(revision 23373)
+++ include/ruby/intern.h	(working copy)
@@ -341,4 +341,5 @@ void rb_thread_atfork(void);
 void rb_thread_atfork_before_exec(void);
 VALUE rb_exec_recursive(VALUE(*)(VALUE, VALUE, int),VALUE,VALUE);
+VALUE rb_exec_recursive_paired(VALUE(*)(VALUE, VALUE, int),VALUE,VALUE);
 /* file.c */
 VALUE rb_file_s_expand_path(int, VALUE *);


-- 
Nobu Nakada

Updated by marcandre (Marc-Andre Lafortune) about 3 years ago

Oups, completely forgot to finish the hash part.
The problem was that I was not pairing hash1 with hash2 but with the temporary struct made for the comparison. This is fixed in the included patch. I have updated rubyspecs (core/hash/equal_value_spec) with complex recursive hash tests. Specs now test recursive hashes, even when formed differently but with the same content, and arrays containing hashes containing arrays...

For the false positive: firstly, the example you show is already == in the current version. The equality for arrays is "do they contain the same elements in the same order". For recursive arrays, it's easier to think about the reverse: arrays are not equal if there is a mismatch between one element and the corresponding one. [a] and a don't have such a mismatch.

PS: Thanks for prettying up my code! I had difficulty applying the patch, I hope everything made it through.

Updated by nobu (Nobuyoshi Nakada) about 3 years ago

Hi,

At Sun, 10 May 2009 07:37:31 +0900,
Marc-Andre Lafortune wrote in [ruby-core:23412]:
> File recursion2.patch added

This patch seems nice to me.  I'm afraid matz may not like the
name rb_exec_recursive_paired().  Is it OK?

-- 
Nobu Nakada

Updated by marcandre (Marc-Andre Lafortune) almost 3 years ago

Nobuyoshi Nakada wrote:
> This patch seems nice to me.
> I'm afraid matz may not like the name rb_exec_recursive_paired().  Is it OK?

'rb_exec_recursive_paired' is the name that came to my mind because it checks recursion on a pair of objects. Maybe 'rb_exec_recursive_on_pair' or 'rb_exec_recursive_with_pair' would be even better. As long as a good name is chosen, I'll be happy :-)

Thanks

Updated by nobu (Nobuyoshi Nakada) almost 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
Applied in changeset r23557.

Also available in: Atom PDF