Bug #1893

closed

Recursive Enumerable#join is surprising

Added by bitsweat (Jeremy Daer) over 15 years ago. Updated over 13 years ago.

Status: Closed
Target version: -
ruby -v: ruby 1.9.2dev (2009-08-06) [i386-darwin9.7.0]
Backport:
[ruby-core:24786]

Description

=begin

Bar = Struct.new(:a, :b)
=> Bar
bars = [Bar.new('1', '2'), Bar.new('3', '4')]
=> [#<struct Bar a="1", b="2">, #<struct Bar a="3", b="4">]
bars * '--'
=> "1--2--3--4"

Surprising? It looks like joining Arrays not Structs. Let's define to_s:

class Bar; def to_s; 'foo' end end
=> nil
bars * '--'
=> "1--2--3--4"

Doesn't work! Strange. But remove to_a and it works!

class Bar; undef_method :to_a end
=> Bar
bars * '--'
=> "foo--foo"

Also note that defining to_str works because ary_join_1 checks rb_check_string_type first, then rb_check_convert_type:

class Bar; def to_str; 'baz' end end
=> nil
bars * '--'
=> "baz--baz"

See r23951 for the change:

  • array.c (ary_join_1): recursive join for Enumerators (and objects with #to_a).

There are two solutions:

  • remove Struct#to_a, or
  • tighten the recursive join check for arrays

I think we need a to_ary (like to_str) for the recursive array case instead of using to_a.
=end


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Feature #7226: Add Set#join method as a shortcut for to_a.join (Rejected; knu (Akinori MUSHA); 10/28/2012)
Actions #1

Updated by bitsweat (Jeremy Daer) over 15 years ago

=begin
Is this intentional? Joining an array with struct elements splits them into arrays instead of converting to strings.
=end

Actions #2

Updated by naruse (Yui NARUSE) over 15 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

=begin

=end

Actions #3

Updated by naruse (Yui NARUSE) almost 15 years ago

=begin
This is because:

  • Enumerable#join is recursive
  • If an element is String, add it
  • If an element is Array, add elem.join
  • If others, add obj.to_a.join or obj.to_s

When an element is an Enumerator, it is classed as "others";
under your suggestion, ["foo".chars] would become '["a", "b", "c"]'.
=end
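
The rules listed in the comment above can be modelled in plain Ruby. This is only a sketch for readers of the thread, not the C code in array.c: `recursive_join_sketch` and `JoinDemoPair` are hypothetical names, and the order of the checks (String, Array, to_str, to_a, then to_s) is an assumption pieced together from the description and the comment above.

```ruby
# Hypothetical model of the recursive join discussed in this ticket.
# It is NOT the real array.c implementation, only an illustration.
def recursive_join_sketch(items, sep = "")
  parts = items.map do |elem|
    if elem.is_a?(String)
      elem                                   # strings are added as-is
    elsif elem.is_a?(Array)
      recursive_join_sketch(elem, sep)       # arrays are joined recursively
    elsif elem.respond_to?(:to_str)
      elem.to_str                            # implicit string conversion wins
    elsif elem.respond_to?(:to_a)
      recursive_join_sketch(elem.to_a, sep)  # the surprising step: explicit to_a
    else
      elem.to_s
    end
  end
  parts.join(sep)
end

# A Struct with a custom to_s, as in the report.
JoinDemoPair = Struct.new(:a, :b) do
  def to_s
    "foo"
  end
end

pairs = [JoinDemoPair.new("1", "2"), JoinDemoPair.new("3", "4")]
puts recursive_join_sketch(pairs, "--")  # prints 1--2--3--4, to_s is never called
```

Because Struct provides to_a, the to_a branch is taken before to_s is ever considered, which matches the behavior reported in the description.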

Actions #4

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi Jeremy,

> Bar = Struct.new(:a, :b)
> => Bar
> bars = [Bar.new('1', '2'), Bar.new('3', '4')]
> => [#<struct Bar a="1", b="2">, #<struct Bar a="3", b="4">]
> bars * '--'
> => "1--2--3--4"
>
> Surprising? It looks like joining Arrays not Structs. Let's define to_s:
>
> class Bar; def to_s; 'foo' end end
> => nil
> bars * '--'
> => "1--2--3--4"
>
> Doesn't work! Strange.

I agree with your intuition.

> There are two solutions:
>
>   • remove Struct#to_a, or

It is too incompatible to be accepted.

> I think we need a to_ary (like to_str) for the recursive array case instead of using to_a.

I think it is reasonable, and Array#flatten actually does so.

But there seems to be a reason why to_a is used. This behavior
(calling to_a) is introduced because of "recursive join for
Enumerators." (r23951)
Calling to_ary does not meet the rationale because Enumerator
has no #to_ary currently.

I guess adding Enumerator#to_ary is a right solution.

--
Yusuke ENDOH
=end

Actions #5

Updated by bitsweat (Jeremy Daer) almost 15 years ago

=begin
Hi Yusuke,

Agreed that Enumerator#to_ary would resolve this.

Plus, I think it's better behavior: recursive join should use implicit (to_ary) not explicit (to_a) coercion. We wish to join array-like objects, not convert our objects to arrays then join.
=end
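
The difference between explicit (to_a) and implicit (to_ary) coercion mentioned above can be seen with Array#flatten, which, as noted earlier in the thread, only expands elements that respond to to_ary. The two classes below are hypothetical and exist only for this illustration.

```ruby
# Hypothetical classes for illustration only.
class ExplicitOnly
  # Explicit conversion: "I have an array representation."
  def to_a
    ["x", "y"]
  end

  def to_s
    "explicit"
  end
end

class ImplicitToo
  # Implicit conversion: "treat me as an array."
  def to_ary
    ["x", "y"]
  end

  def to_s
    "implicit"
  end
end

# Array#flatten ignores to_a and only expands elements that respond to to_ary.
flat_explicit = [ExplicitOnly.new].flatten
flat_implicit = [ImplicitToo.new].flatten

puts flat_explicit.map(&:to_s).inspect  # the object is kept whole
puts flat_implicit.inspect              # the object is expanded into its elements
```

A join that follows the same rule would leave objects like ExplicitOnly (and Struct) alone and convert them with to_s.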

Actions #6

Updated by Eregon (Benoit Daloze) almost 15 years ago

=begin
On 3 March 2010 06:42, Jeremy Kemper wrote:

> Issue #1893 has been updated by Jeremy Kemper.
>
> Hi Yusuke,
>
> Agreed that Enumerator#to_ary would resolve this.
>
> Plus, I think it's better behavior: recursive join should use implicit
> (to_ary) not explicit (to_a) coercion. We wish to join array-like objects,
> not convert our objects to arrays then join.
>
> http://redmine.ruby-lang.org/issues/show/1893
>
> http://redmine.ruby-lang.org

Hi,

Sure, I would agree with using to_ary:
http://groups.google.com/group/ruby-talk-google/browse_thread/thread/26562ef28368a582/927899a3fb65b962?lnk=gst&q=array%23join+usinf+to_a#

Thanks for reporting the issue :)

=end

Actions #7

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28422] [Bug #1893] Recursive Enumerable#join is surprising"
on Wed, 3 Mar 2010 00:57:26 +0900, Yusuke Endoh writes:

|I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

						matz.

=end

Actions #8

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/3 Yukihiro Matsumoto :

> |I guess adding Enumerator#to_ary is a right solution.
>
> I don't think so, supplying to_ary means that object can be considered
> as an array, which is not always the case.

Then, Array#join should not flatten Enumerator, I think.
Otherwise, we need a third concept (besides to_ary and to_a) such as
to_join.

At least, the change seems to actually cause compatibility issue
in real world. Should we fix or revert anyway?

--
Yusuke ENDOH

=end

Actions #9

Updated by Eregon (Benoit Daloze) almost 15 years ago

=begin

> | I guess adding Enumerator#to_ary is a right solution.
>
> I don't think so, supplying to_ary means that object can be considered
> as an array, which is not always the case.
>
>                                                    matz.

Well, I thought that long conversion methods were supplied to behave like
what they can be converted to.
I would define to_int only on something that can be a number, and to_ary only on
Array-like objects.

Just having a look at ri to_ary:
1 Array#to_ary
2 Net::HTTPResponse#to_ary
3 Nokogiri::XML::NodeSet#to_ary
4 Rake::FileList#to_ary
5 WEBrick::HTTPUtils::FormData#to_ary

These five can really be considered as Arrays, I think. (And if I remember correctly,
Programming Ruby clearly suggests that long conversion methods are for this
purpose.)

> Then, Array#join should not flatten Enumerator, I think.

Well, that can be useful on 2+ dimensional Arrays ( [[1],[2,3]].join => "123",
but it already worked like that in 1.8. )

Sorry if I misunderstood your words,

B.D.

=end

Actions #10

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/3 Benoit Daloze :

> > | I guess adding Enumerator#to_ary is a right solution.
> >
> > I don't think so, supplying to_ary means that object can be considered
> > as an array, which is not always the case.
>
> Well, I thought that long conversion methods were supplied to behave like
> what they can be converted to.

The problem is whether we can consider Enumerator as an array or not.
I think we can in most situations, but matz has said we cannot always.

I don't want to distinguish Array and Enumerator so strongly because
I consider both as just a kind of collection. But I admit that the
current Ruby design does not encourage such an idea (e.g., missing
Enumerator#to_ary).

> > Then, Array#join should not flatten Enumerator, I think.
>
> Well, that can be useful on 2+ dimensional Arrays ( [[1],[2,3]].join => "123",
> but it already worked like that in 1.8. )

I didn't mean Array#join should not flatten Array. It should do, of
course.

--
Yusuke ENDOH

=end

Actions #11

Updated by bitsweat (Jeremy Daer) almost 15 years ago

=begin
On Wed, Mar 3, 2010 at 12:14 AM, Yukihiro Matsumoto wrote:

> Hi,
>
> In message "Re: [ruby-core:28422] [Bug #1893] Recursive Enumerable#join is
> surprising"
> on Wed, 3 Mar 2010 00:57:26 +0900, Yusuke Endoh
> writes:
>
> |I guess adding Enumerator#to_ary is a right solution.
>
> I don't think so, supplying to_ary means that object can be considered
> as an array, which is not always the case.
>
>                                                    matz.

Argh, right. Perhaps an #as_ary, or #as_array, to indicate that the object
may be coerced to an array.

#to_a is too loose for this: it says the object has an array representation.

#to_ary is too strict: it says the object may be treated as an array, even
without calling the method.

#as_ary is in between: it says the object may be treated as an array with
coercion, so call the method.

This is a similar situation with to_s/to_str and the proposed to_r/to_rat,
to_f/to_flo. I think these confusions could be resolved with as_str and
as_rat.

The fact that many classes incorrectly provide #to_ary is good evidence, too
(much like classes that incorrectly provide #to_str). From Benoit's comment:

2 Net::HTTPResponse#to_ary
3 Nokogiri::XML::NodeSet#to_ary
4 Rake::FileList#to_ary
5 WEBrick::HTTPUtils::FormData#to_ary

These would be better stated as #as_ary.

jeremy




=end

Actions #12

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28439] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Wed, 3 Mar 2010 19:16:48 +0900, Yusuke ENDOH writes:

|At least, the change seems to actually cause compatibility issue
|in real world. Should we fix or revert anyway?

We should do something:

(a) revert and remove Enumerable#join altogether, leaving Array#join.
(b) make Enumerable#join not to join recursively; I think Array#join
should remain recursive join for array elements.

Which do you guys prefer? I don't disclose my preference now, since
disclosing mine would have too much impact on discussion.

						matz.

=end

Actions #13

Updated by bitsweat (Jeremy Daer) almost 15 years ago

=begin
On Thu, Mar 11, 2010 at 4:47 PM, Yukihiro Matsumoto wrote:

> Hi,
>
> In message "Re: [ruby-core:28439] Re: [Bug #1893] Recursive Enumerable#join is surprising"
>    on Wed, 3 Mar 2010 19:16:48 +0900, Yusuke ENDOH writes:
>
> |At least, the change seems to actually cause compatibility issue
> |in real world.  Should we fix or revert anyway?
>
> We should do something:
>
>  (a) revert and remove Enumerable#join altogether, leaving Array#join.
>  (b) make Enumerable#join not to join recursively; I think Array#join
>     should remain recursive join for array elements.
>
> Which do you guys prefer?  I don't disclose my preference now, since
> disclosing mine would have too much impact on discussion.

I prefer (b). Enumerable#join is a welcome new feature; let's keep it.

But, let's not break old objects which don't expect to be destructured
into arrays when they are joined.

jeremy

=end

Actions #14

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto :

> We should do something:
>
> (a) revert and remove Enumerable#join altogether, leaving Array#join.
> (b) make Enumerable#join not to join recursively; I think Array#join
> should remain recursive join for array elements.

Thank you for remembering the issue!
But I like: (c) revert it and leave Enumerable#join.

I think Enumerable#join is irrelevant and innocent.
Why should Array#join flatten Enumerable object if there is
Enumerator#join? Not because Enumerator is similar to Array?

If Enumerator is similar to Array, (c-1) Enumerator#to_ary
should be defined.
Otherwise, (c-2) merely Array#join should not call to_a to
flatten Enumerator.

I still prefer (c-1), though you rejected it once.

I admit (c-1) is slightly aggressive against compatibility
(for example, [] + [].to_enum now works!), so I would agree
with (a) or (b) as a temporary solution.

--
Yusuke ENDOH

=end
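
For reference, the `[] + [].to_enum` case mentioned above can be checked directly. This is a small sketch, assuming a Ruby in which Enumerator still has no #to_ary (which is the case in the releases I am aware of): Array#+ needs an implicit conversion, so it raises, while an explicit to_a works.

```ruby
enum = [1, 2].to_enum

# Array#+ needs an implicit conversion (to_ary), which Enumerator lacks.
begin
  [0] + enum
  outcome = "no error"
rescue TypeError => e
  outcome = e.class.name
end

# An explicit conversion with to_a is always available on an Enumerator.
combined = [0] + enum.to_a

puts outcome           # TypeError
puts combined.inspect  # [0, 1, 2]
```

Defining Enumerator#to_ary, as in option (c-1), would make the first expression succeed, which is the compatibility change being weighed here.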

Actions #15

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28622] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 12:25:00 +0900, Yusuke ENDOH writes:

|2010/3/12 Yukihiro Matsumoto :
|> We should do something:
|>
|> (a) revert and remove Enumerable#join altogether, leaving Array#join.
|> (b) make Enumerable#join not to join recursively; I think Array#join
|> should remain recursive join for array elements.
|
|
|Thank you for remembering the issue!
|But I like: (c) revert it and leave Enumerable#join.

I am not sure what you mean by (c).

By the way, I should have mentioned that Array#join will not call
#to_a (but #to_ary) whether we choose (a) or (b).

|If Enumerator is similar to Array, (c-1) Enumerator#to_ary
|should be defined.
|Otherwise, (c-2) merely Array#join should not call to_a to
|flatten Enumerator.
|
|I still prefer (c-1), though you rejected it once.

Implicit conversion methods such as #to_ary and #to_str should be
defined only when the object provides (almost) equivalent behavior
(i.e. method set). Enumerable and Array are not the case.

						matz.

=end

Actions #16

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto :

> In message "Re: [ruby-core:28622] Re: [Bug #1893] Recursive Enumerable#join is surprising"
> on Fri, 12 Mar 2010 12:25:00 +0900, Yusuke ENDOH writes:
>
> |2010/3/12 Yukihiro Matsumoto :
> |> We should do something:
> |>
> |> (a) revert and remove Enumerable#join altogether, leaving Array#join.
> |> (b) make Enumerable#join not to join recursively; I think Array#join
> |> should remain recursive join for array elements.
> |
> |
> |Thank you for remembering the issue!
> |But I like: (c) revert it and leave Enumerable#join.
>
> I am not sure what you mean by (c).

  • Array#join will not call to_a.
  • both Array#join and Enumerable#join will remain and recursively
    join Array (and Enumerator, if possible).

> By the way, I should have mentioned that Array#join will not call
> #to_a (but #to_ary) whether we choose (a) or (b).

I see. Good.

> |If Enumerator is similar to Array, (c-1) Enumerator#to_ary
> |should be defined.
> |Otherwise, (c-2) merely Array#join should not call to_a to
> |flatten Enumerator.
> |
> |I still prefer (c-1), though you rejected it once.
>
> Implicit conversion methods such as #to_ary and #to_str should be
> defined only when the object provides (almost) equivalent behavior
> (i.e. method set). Enumerable and Array are not the case.

When you say two objects are "almost equivalent", do you expect duck
typing? If they behave the same enough, we can directly call #each,
#[], #[]=, etc. without calling #to_ary.

I cannot see the reason why #to_ary is needed if the requirement for
defining #to_ary is so rigorous.
Is #to_ary just for performance? If so, I think we should handle an
array-like object as an array even if it doesn't provide #to_ary.

I admit I'm slightly radical, but it is enough for me to think Array
and Enumerator "almost equivalent" because both have one sequence of
elements with order. Random-access feature is not important for me
because "Array" is not already just an "array". It serves as a set,
list, queue, assoc list, etc.

--
Yusuke ENDOH

=end

Actions #17

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28625] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 18:17:16 +0900, Yusuke ENDOH writes:

|When you say two objects are "almost equivalent", do you expect duck
|typing? If they behave the same enough, we can directly call #each,
|#[], #[]=, etc. without calling #to_ary.

|Is #to_ary just for performance? If so, I think we should handle an
|array-like object as an array even if it doesn't provide #to_ary.

The whole purpose of to_ary, to_str etc, is to support implicit type
conversion for Array/String-like objects into real Array/String for C
defined functions which requires legit typing. By String/Array-like
object, I mean Delegators, for example. So in a sense, it's for
performance, by allowing C functions to bypass method invocation.

						matz.

=end
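
A short sketch of the rationale above: built-in methods that need a real String or Array accept any object that provides to_str or to_ary. The wrapper classes here are hypothetical stand-ins for the Delegator-style objects mentioned in the comment.

```ruby
# Hypothetical wrapper classes; real code might use a Delegator instead.
class NameWrapper
  def initialize(name)
    @name = name
  end

  # Implicit conversion: lets core String methods treat this as a String.
  def to_str
    @name
  end
end

class ListWrapper
  def initialize(items)
    @items = items
  end

  # Implicit conversion: lets core Array methods treat this as an Array.
  def to_ary
    @items
  end
end

# String#+ and Array#+ convert their argument implicitly.
greeting = "Hello, " + NameWrapper.new("Ruby")
merged = [1, 2] + ListWrapper.new([3, 4])

puts greeting        # Hello, Ruby
puts merged.inspect  # [1, 2, 3, 4]
```

An object that only defined to_s or to_a would be rejected by both calls with a TypeError, which is why defining the long forms is a stronger promise.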

Actions #18

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto :

> The whole purpose of to_ary, to_str etc, is to support implicit type
> conversion for Array/String-like objects into real Array/String for C
> defined functions which requires legit typing. By String/Array-like
> object, I mean Delegators, for example. So in a sense, it's for
> performance, by allowing C functions to bypass method invocation.

I got it! I'm pleased to hear the rationale. Thank you.

Then, how about:

  • Array#join will flatten only Array-like objects via #to_ary
  • Enumerable#join will flatten any Enumerable object via #each

?

--
Yusuke ENDOH

=end

Actions #19

Updated by Eregon (Benoit Daloze) almost 15 years ago

=begin
On 12 March 2010 11:13, Yusuke ENDOH wrote:

> Hi,
>
> 2010/3/12 Yukihiro Matsumoto :
>
> > The whole purpose of to_ary, to_str etc, is to support implicit type
> > conversion for Array/String-like objects into real Array/String for C
> > defined functions which requires legit typing. By String/Array-like
> > object, I mean Delegators, for example. So in a sense, it's for
> > performance, by allowing C functions to bypass method invocation.
>
> I got it! I'm pleased to hear the rationale. Thank you.
>
> Then, how about:
>
>   • Array#join will flatten only Array-like objects via #to_ary
>   • Enumerable#join will flatten any Enumerable object via #each
>
> ?
>
> --
> Yusuke ENDOH

+1
I already expressed my wish to do something like that above.

So my key idea is: if an object accepts #to_ary, it should be flattened as
an Array for #join.

=end

Actions #20

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28627] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 19:13:36 +0900, Yusuke ENDOH writes:

|Then, how about:
| - Array#join will flatten only Array-like object via #to_ary
| - Enumerable#join will flatten any Enumerable object via #each
|?

In that case, the "surprise" of OP would not be fixed. I agree with
the OP (Jeremy Kemper) here. Enumerable#join should not flatten
Enumerable.

						matz.

=end

Actions #21

Updated by mame (Yusuke Endoh) almost 15 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto :

> |Then, how about:
> | - Array#join will flatten only Array-like object via #to_ary
> | - Enumerable#join will flatten any Enumerable object via #each
> |
>
> In that case, the "surprise" of OP would not be fixed. I agree with
> the OP (Jeremy Kemper) here. Enumerable#join should not flatten
> Enumerable.

Wow! Very sorry! I thought that Struct just provided to_a. I never
knew that it included Enumerable.

It is very "surprising" itself for me, but it is traditionally the fact.
I have no idea to solve the issue satisfactorily. I leave it to you.

--
Yusuke ENDOH

=end

Actions #22

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

=begin
Hi,

In message "Re: [ruby-core:28631] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Sat, 13 Mar 2010 00:03:26 +0900, Yusuke ENDOH writes:

|> In that case, the "surprise" of OP would not be fixed. I agree with
|> the OP (Jeremy Kemper) here. Enumerable#join should not flatten
|> Enumerable.

|It is very "surprising" itself for me, but it is traditionally the fact.
|I have no idea to solve the issue satisfactorily. I leave it to you.

Since Enumerable#join was introduced after 1.9.1, I admit that was a
design mistake, and I would remove the method, leaving Array#join of
course. And Array#join will call to_ary to check if any element is
an array. In short, back to the old behavior. Any objection?

						matz.

=end
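
The outcome described above can be checked on a Ruby that includes this fix. The following is a minimal sketch, assuming current behavior: Array#join converts Struct elements with to_s, still flattens real Arrays, and Enumerable itself provides no #join. `JoinDemoPoint` is a hypothetical name used only here.

```ruby
# Hypothetical Struct with a custom to_s, mirroring the original report.
JoinDemoPoint = Struct.new(:x, :y) do
  def to_s
    "(#{x}, #{y})"
  end
end

points = [JoinDemoPoint.new(1, 2), JoinDemoPoint.new(3, 4)]

joined_points = points.join("--")           # Struct has no to_ary, so to_s is used
joined_nested = [[1], [2, 3]].join          # real Arrays are still joined recursively
range_has_join = (1..3).respond_to?(:join)  # Range is Enumerable but has no join

puts joined_points   # (1, 2)--(3, 4)
puts joined_nested   # 123
puts range_has_join  # false
```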

Actions #23

Updated by matz (Yukihiro Matsumoto) almost 15 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r26906.
Jeremy, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Updated by naruse (Yui NARUSE) over 10 years ago

  • Related to Feature #7226: Add Set#join method as a shortcut for to_a.join added