Bug #1893

Recursive Enumerable#join is surprising

Added by bitsweat (Jeremy Daer) about 10 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
ruby 1.9.2dev (2009-08-06) [i386-darwin9.7.0]
Backport:
[ruby-core:24786]

Description

=begin

Bar = Struct.new(:a, :b)
=> Bar
bars = [Bar.new('1', '2'), Bar.new('3', '4')]
=> [#<struct Bar a="1", b="2">, #<struct Bar a="3", b="4">]
bars * '--'
=> "1--2--3--4"

Surprising? It looks like joining Arrays not Structs. Let's define to_s:

class Bar; def to_s; 'foo' end end
=> nil
bars * '--'
=> "1--2--3--4"

Doesn't work! Strange. But remove to_a and it works!

class Bar; undef_method :to_a end
=> Bar
bars * '--'
=> "foo--foo"

Also note that defining to_str works because ary_join_1 checks rb_check_string_type first, then rb_check_convert_type:

class Bar; def to_str; 'baz' end end
=> nil
bars * '--'
=> "baz--baz"

See r23951 for the change:

  • array.c (ary_join_1): recursive join for Enumerators (and objects with #to_a).

There are two solutions:

  • remove Struct#to_a, or
  • tighten the recursive join check for arrays

I think we need a to_ary (like to_str) for the recursive array case instead of using to_a.
=end


Related issues

Related to Ruby master - Feature #7226: Add Set#join method as a shortcut for to_a.join (Rejected, 10/28/2012)

History

#1

Updated by bitsweat (Jeremy Daer) about 10 years ago

=begin
Is this intentional? Joining an array with struct elements splits them into arrays instead of converting to strings.
=end

#2

Updated by naruse (Yui NARUSE) almost 10 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

=begin

=end

#3

Updated by naruse (Yui NARUSE) over 9 years ago

=begin
This is because:

  • Enumerable#join is recursive
  • If an element is String, add it
  • If an element is Array, add elem.join
  • If others, add obj.to_a.join or obj.to_s

When an element is an Enumerator, it is classed as "others";
["foo".chars] will be '["a", "b", "c"]' by your suggestion.
=end
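The rules listed in this comment, together with the to_str check mentioned in the description, can be modelled in plain Ruby. This is only a rough sketch: `legacy_join` is a hypothetical helper written for illustration, not the actual `ary_join_1` C code, and it assumes the order "String, Array, to_str, to_a, to_s".

```ruby
# Hypothetical Ruby model of the 1.9.2dev recursive join described above.
# It is not the real implementation, only an illustration of the rules.
def legacy_join(items, sep)
  parts = items.map do |item|
    if item.is_a?(String)
      item                          # strings are added as-is
    elsif item.is_a?(Array)
      legacy_join(item, sep)        # arrays are joined recursively
    elsif item.respond_to?(:to_str)
      item.to_str                   # implicit string conversion wins next
    elsif item.respond_to?(:to_a)
      legacy_join(item.to_a, sep)   # the surprising step: to_a before to_s
    else
      item.to_s
    end
  end
  parts.join(sep)
end

Bar = Struct.new(:a, :b) do
  def to_s
    'foo'
  end
end

bars = [Bar.new('1', '2'), Bar.new('3', '4')]
legacy_result = legacy_join(bars, '--')   # => "1--2--3--4"
```

Because Struct responds to to_a, the to_s defined on Bar is never reached, which matches the behaviour in the original report.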

#4

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi Jeremy,

Bar = Struct.new(:a, :b)
=> Bar
bars = [Bar.new('1', '2'), Bar.new('3', '4')]
=> [#<struct Bar a="1", b="2">, #<struct Bar a="3", b="4">]
bars * '--'
=> "1--2--3--4"

Surprising? It looks like joining Arrays not Structs. Let's define to_s:

class Bar; def to_s; 'foo' end end
=> nil
bars * '--'
=> "1--2--3--4"

Doesn't work! Strange.

I agree with your intuition.

There are two solutions:

  • remove Struct#to_a, or

It is too incompatible to be accepted.

I think we need a to_ary (like to_str) for the recursive array case instead of using to_a.

I think it is reasonable, and Array#flatten actually does so.

But there seems to be a reason why to_a is used. This behavior
(calling to_a) is introduced because of "recursive join for
Enumerators." (r23951)
Calling to_ary does not meet the rationale because Enumerator
has no #to_ary currently.

I guess adding Enumerator#to_ary is a right solution.

--
Yusuke ENDOH mame@tsg.ne.jp
=end

#5

Updated by bitsweat (Jeremy Daer) over 9 years ago

=begin
Hi Yusuke,

Agreed that Enumerator#to_ary would resolve this.

Plus, I think it's better behavior: recursive join should use implicit (to_ary) not explicit (to_a) coercion. We wish to join array-like objects, not convert our objects to arrays then join.
=end
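The difference between explicit (to_a) and implicit (to_ary) coercion can be seen with Array#flatten, which, as noted earlier in the thread, already relies on to_ary. The following is a small sketch; `Pair` and `Wrapper` are hypothetical names used only for illustration.

```ruby
Pair = Struct.new(:a, :b)
pair = Pair.new('1', '2')

has_to_a   = pair.respond_to?(:to_a)    # explicit conversion exists
has_to_ary = pair.respond_to?(:to_ary)  # implicit conversion does not

# Array#flatten only unpacks elements that convert implicitly,
# so the struct stays intact while the nested array is flattened.
flat = [pair, ['3', '4']].flatten

# A hypothetical array-like wrapper that opts in via to_ary.
class Wrapper
  def initialize(items)
    @items = items
  end

  def to_ary
    @items
  end
end

unwrapped = [Wrapper.new(['5', '6'])].flatten  # => ["5", "6"]
```

Here `flat` keeps the struct as a single element, while the wrapper is treated as an array because it declares itself to be one.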

#6

Updated by Eregon (Benoit Daloze) over 9 years ago

=begin
On 3 March 2010 06:42, Jeremy Kemper redmine@ruby-lang.org wrote:

Issue #1893 has been updated by Jeremy Kemper.

Hi Yusuke,

Agreed that Enumerator#to_ary would resolve this.

Plus, I think it's better behavior: recursive join should use implicit
(to_ary) not explicit (to_a) coercion. We wish to join array-like objects,

not convert our objects to arrays then join.

http://redmine.ruby-lang.org/issues/show/1893


http://redmine.ruby-lang.org

Hi,

Sure I would agree using to_ary:
http://groups.google.com/group/ruby-talk-google/browse_thread/thread/26562ef28368a582/927899a3fb65b962?lnk=gst&q=array%23join+usinf+to_a#

Thanks for reporting the issue :)

#7

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28422] [Bug #1893] Recursive Enumerable#join is surprising"
on Wed, 3 Mar 2010 00:57:26 +0900, Yusuke Endoh redmine@ruby-lang.org writes:

|I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

                        matz.

=end

#8

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/3 Yukihiro Matsumoto matz@ruby-lang.org:

|I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

Then, Array#join should not flatten Enumerator, I think.
Otherwise, we need the 3rd concept (than to_ary and to_a) such as
to_join.

At least, the change seems to actually cause compatibility issue
in real world. Should we fix or revert anyway?

--
Yusuke ENDOH mame@tsg.ne.jp

=end

#9

Updated by Eregon (Benoit Daloze) over 9 years ago

=begin

| I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

                                                   matz.

Well, I thought that long conversion methods, were supplied to behave like
what they can be converted.
I would define to_int, only to sth that can be a number and then to_ary to
Array-like objects.

Just having a look at ri to_ary:
1 Array#to_ary
2 Net::HTTPResponse#to_ary
3 Nokogiri::XML::NodeSet#to_ary
4 Rake::FileList#to_ary
5 WEBrick::HTTPUtils::FormData#to_ary

These five can really be considered as Array I think. (and if I remember,
Programming Ruby suggest clearly that long conversion methods are for this
purpose).

Then, Array#join should not flatten Enumerator, I think.
Well, that can be useful on 2+ dimensions Array ( [[1],[2,3]].join => "123",
but it already worked like that for 1.8. )

Sorry if I misunderstood your words,

B.D.

#10

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/3 Benoit Daloze eregontp@gmail.com:

| I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

Well, I thought that long conversion methods, were supplied to behave like
what they can be converted.

The problem is whether we can consider Enumerator as an array or not.
I think we can in most situations, but matz has said we cannot always.

I don't want to distinguish Array and Enumerator so strongly because
I consider both as just a kind of collection. But I admit that the
current Ruby design does not encourage such an idea (e.g., missing
Enumerator#to_ary).

Then, Array#join should not flatten Enumerator, I think.
Well, that can be useful on 2+ dimensions Array ( [[1],[2,3]].join => "123",
but it already worked like that for 1.8. )

I didn't mean Array#join should not flatten Array. It should do, of
course.

--
Yusuke ENDOH mame@tsg.ne.jp

=end

#11

Updated by bitsweat (Jeremy Daer) over 9 years ago

=begin
On Wed, Mar 3, 2010 at 12:14 AM, Yukihiro Matsumoto matz@ruby-lang.org wrote:

Hi,

In message "Re: [ruby-core:28422] [Bug #1893] Recursive Enumerable#join is
surprising"
on Wed, 3 Mar 2010 00:57:26 +0900, Yusuke Endoh redmine@ruby-lang.org
writes:

|I guess adding Enumerator#to_ary is a right solution.

I don't think so, supplying to_ary means that object can be considered
as an array, which is not always the case.

                                                   matz.

Argh, right. Perhaps an #as_ary, or #as_array, to indicate that the object
may be coerced to an array.

#to_a is too loose for this: it says the object has an array representation.

#to_ary is too strict: it says the object may be treated as an array, even
without calling the method.

#as_ary is in between: it says the object may be treated as an array with
coercion, so call the method.

This is a similar situation with to_s/to_str and the proposed to_r/to_rat,
to_f/to_flo. I think these confusions could be resolved with as_str and
as_rat.

The fact that many classes incorrectly provide #to_ary is good evidence, too
(much like classes that incorrectly provide #to_str). From Benoit's comment:

2 Net::HTTPResponse#to_ary
3 Nokogiri::XML::NodeSet#to_ary
4 Rake::FileList#to_ary
5 WEBrick::HTTPUtils::FormData#to_ary

These would be better stated as #as_ary.

jeremy

=end

#12

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28439] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Wed, 3 Mar 2010 19:16:48 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|At least, the change seems to actually cause compatibility issue
|in real world. Should we fix or revert anyway?

We should do something:

(a) revert and remove Enumerable#join altogether, leaving Array#join.
(b) make Enumerable#join not to join recursively; I think Array#join
should remain recursive join for array elements.

Which do you guys prefer? I don't disclose my preference now, since
disclosing mine would have too much impact on discussion.

                        matz.

=end

#13

Updated by bitsweat (Jeremy Daer) over 9 years ago

=begin
On Thu, Mar 11, 2010 at 4:47 PM, Yukihiro Matsumoto matz@ruby-lang.org wrote:

Hi,

In message "Re: [ruby-core:28439] Re: [Bug #1893] Recursive Enumerable#join is  surprising"
   on Wed, 3 Mar 2010 19:16:48 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|At least, the change seems to actually cause compatibility issue
|in real world.  Should we fix or revert anyway?

We should do something:

 (a) revert and remove Enumerable#join altogether, leaving Array#join.
 (b) make Enumerable#join not to join recursively; I think Array#join
    should remain recursive join for array elements.

Which do you guys prefer?  I don't disclose my preference now, since
disclosing mine would have too much impact on discussion.

I prefer (b). Enumerable#join is a welcome new feature; let's keep it.

But, let's not break old objects which don't expect to be destructured
into arrays when they are joined.

jeremy

=end

#14

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:

We should do something:

(a) revert and remove Enumerable#join altogether, leaving Array#join.
(b) make Enumerable#join not to join recursively; I think Array#join
should remain recursive join for array elements.

Thank you for remembering the issue!
But I like: (c) revert it and leave Enumerable#join.

I think Enumerable#join is irrelevant and innocent.
Why should Array#join flatten Enumerable object if there is
Enumerator#join? Not because Enumerator is similar to Array?

If Enumerator is similar to Array, (c-1) Enumerator#to_ary
should be defined.
Otherwise, (c-2) merely Array#join should not call to_a to
flatten Enumerator.

I still prefer (c-1), though you rejected it once.

I admit (c-1) is slightly aggressive against compatibility
(for example, [] + [].to_enum now works!), so I would agree
with (a) or (b) as a temporary solution.

--
Yusuke ENDOH mame@tsg.ne.jp

=end
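The compatibility concern around (c-1) can be illustrated as follows. This is a sketch that assumes an interpreter where Enumerator has no to_ary (which is the case in released Ruby versions); the singleton to_ary defined below is hypothetical and only simulates what (c-1) would do for every Enumerator.

```ruby
enum = [1, 2].to_enum

# Explicit conversion is always available.
explicit = [] + enum.to_a

# Implicit conversion is not: Array#+ asks for to_ary and raises.
implicit_error = begin
  [] + enum
  nil
rescue TypeError => e
  e
end

# Hypothetical: give this one enumerator a to_ary, as option (c-1)
# would do for all enumerators.
def enum.to_ary
  to_a
end

implicit = [] + enum
```

After to_ary is defined, the enumerator is silently accepted wherever an Array is required, which is the behaviour change mame flags as a compatibility risk.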

#15

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28622] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 12:25:00 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:
|> We should do something:
|>
|> (a) revert and remove Enumerable#join altogether, leaving Array#join.
|> (b) make Enumerable#join not to join recursively; I think Array#join
|> should remain recursive join for array elements.
|
|
|Thank you for remembering the issue!
|But I like: (c) revert it and leave Enumerable#join.

I am not sure what you mean by (c).

By the way, I should have mentioned that Array#join will not call
#to_a (but #to_ary) either we choose (a) or (b).

|If Enumerator is similar to Array, (c-1) Enumerator#to_ary
|should be defined.
|Otherwise, (c-2) merely Array#join should not call to_a to
|flatten Enumerator.
|
|I still prefer (c-1), though you rejected it once.

Implicit conversion methods such as #to_ary and #to_str should be
defined only when the object provides (almost) equivalent behavior
(i.e. method set). Enumerable and Array are not the case.

                        matz.

=end

#16

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:

In message "Re: [ruby-core:28622] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 12:25:00 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:
|> We should do something:
|>
|> (a) revert and remove Enumerable#join altogether, leaving Array#join.
|> (b) make Enumerable#join not to join recursively; I think Array#join
|> should remain recursive join for array elements.
|
|
|Thank you for remembering the issue!
|But I like: (c) revert it and leave Enumerable#join.

I am not sure what you mean by (c).

  • Array#join will not call to_a.
  • both Array#join and Enumerable#join will remain and recursively join Array (and Enumerator, if possible).

By the way, I should have mentioned that Array#join will not call
#to_a (but #to_ary) either we choose (a) or (b).

I see. Good.

|If Enumerator is similar to Array, (c-1) Enumerator#to_ary
|should be defined.
|Otherwise, (c-2) merely Array#join should not call to_a to
|flatten Enumerator.
|
|I still prefer (c-1), though you rejected it once.

Implicit conversion methods such as #to_ary and #to_str should be
defined only when the object provides (almost) equivalent behavior
(i.e. method set). Enumerable and Array are not the case.

When you say two objects are "almost equivalent", do you expect duck
typing? If they behave the same enough, we can directly call #each,
#[], #[]=, etc. without calling #to_ary.

I cannot see the reason why #to_ary is needed if the requirement for
defining #to_ary is so rigorous.
Is #to_ary just for performance? If so, I think we should handle an
array-like object as an array even if it doesn't provide #to_ary.

I admit I'm slightly radical, but it is enough for me to think Array
and Enumerator "almost equivalent" because both have one sequence of
elements with order. Random-access feature is not important for me
because "Array" is not already just an "array". It serves as a set,
list, queue, assoc list, etc.

--
Yusuke ENDOH mame@tsg.ne.jp

=end

#17

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28625] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 18:17:16 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|When you say two objects are "almost equivalent", do you expect duck
|typing? If they behave the same enough, we can directly call #each,
|#[], #[]=, etc. without calling #to_ary.

|Is #to_ary just for performance? If so, I think we should handle an
|array-like object as an array even if it doesn't provide #to_ary.

The whole purpose of to_ary, to_str etc, is to support implicit type
conversion for Array/String-like objects into real Array/String for C
defined functions which requires legit typing. By String/Array-like
object, I mean Delegators, for example. So in a sense, it's for
performance, by allowing C functions to bypass method invocation.

                        matz.

=end
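The role of implicit conversion described here can be sketched with to_str and String#+, a built-in method that requires a real String. `Label` and `StringLike` are hypothetical classes used only for illustration.

```ruby
# Has a string representation (to_s) but is not string-like.
class Label
  def initialize(text)
    @text = text
  end

  def to_s
    @text
  end
end

# Declares itself string-like by also providing to_str.
class StringLike < Label
  def to_str
    @text
  end
end

# String#+ refuses an object that only has to_s.
rejected = begin
  'id: ' + Label.new('x')
  false
rescue TypeError
  true
end

# With to_str, the same call converts the argument implicitly.
accepted = 'id: ' + StringLike.new('x')   # => "id: x"
```

The same distinction applies to to_a and to_ary: only the implicit method tells built-in methods that the object may stand in for the real type.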

#18

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:

The whole purpose of to_ary, to_str etc, is to support implicit type
conversion for Array/String-like objects into real Array/String for C
defined functions which requires legit typing. By String/Array-like
object, I mean Delegators, for example. So in a sense, it's for
performance, by allowing C functions to bypass method invocation.

I got it! I'm pleased to hear the rationale. Thank you.

Then, how about:

  • Array#join will flatten only Array-like object via #to_ary
  • Enumerable#join will flatten any Enumerable object via #each ?

--
Yusuke ENDOH mame@tsg.ne.jp

=end

#19

Updated by Eregon (Benoit Daloze) over 9 years ago

=begin
On 12 March 2010 11:13, Yusuke ENDOH mame@tsg.ne.jp wrote:

Hi,

2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:

The whole purpose of to_ary, to_str etc, is to support implicit type
conversion for Array/String-like objects into real Array/String for C
defined functions which requires legit typing. By String/Array-like
object, I mean Delegators, for example. So in a sense, it's for
performance, by allowing C functions to bypass method invocation.

I got it! I'm pleased to hear the rationale. Thank you.

Then, how about:

  • Array#join will flatten only Array-like object via #to_ary
  • Enumerable#join will flatten any Enumerable object via #each ?

--
Yusuke ENDOH mame@tsg.ne.jp

+1
I already expressed my wish to do something like that above.

So my key idea is, if an object accept #to_ary, it should be flattened as
Array for #join.

#20

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28627] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Fri, 12 Mar 2010 19:13:36 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|Then, how about:
| - Array#join will flatten only Array-like object via #to_ary
| - Enumerable#join will flatten any Enumerable object via #each
|?

In that case, the "surprise" of OP would not be fixed. I agree with
the OP (Jeremy Kemper) here. Enumerable#join should not flatten
Enumerable.

                        matz.

=end

#21

Updated by mame (Yusuke Endoh) over 9 years ago

=begin
Hi,

2010/3/12 Yukihiro Matsumoto matz@ruby-lang.org:

|Then, how about:
| - Array#join will flatten only Array-like object via #to_ary
| - Enumerable#join will flatten any Enumerable object via #each
|

In that case, the "surprise" of OP would not be fixed. I agree with
the OP (Jeremy Kemper) here. Enumerable#join should not flatten
Enumerable.

Wow! Very sorry! I thought that Struct just provided to_a. I did never
know that it included Enumerable.

It is very "surprising" itself for me, but it is traditionally the fact.
I have no idea to solve the issue satisfactorily. I leave it to you.

--
Yusuke ENDOH mame@tsg.ne.jp

=end

#22

Updated by matz (Yukihiro Matsumoto) over 9 years ago

=begin
Hi,

In message "Re: [ruby-core:28631] Re: [Bug #1893] Recursive Enumerable#join is surprising"
on Sat, 13 Mar 2010 00:03:26 +0900, Yusuke ENDOH mame@tsg.ne.jp writes:

|> In that case, the "surprise" of OP would not be fixed. I agree with
|> the OP (Jeremy Kemper) here. Enumerable#join should not flatten
|> Enumerable.

|It is very "surprising" itself for me, but it is traditionally the fact.
|I have no idea to solve the issue satisfactorily. I leave it to you.

Since Enumerable#join was introduced after 1.9.1, I admit that was a
design mistake, and I would remove the method, leaving Array#join of
course. And Array#join will call to_ary to check if any element is
an array. In short, back to the old behavior. Any objection?

                        matz.

=end
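The behaviour proposed here (Array#join consulting to_ary rather than to_a) can be checked on a current Ruby interpreter. This is a minimal sketch; `Entry` is a hypothetical struct standing in for the report's Bar.

```ruby
Entry = Struct.new(:a, :b) do
  def to_s
    'foo'
  end
end

entries = [Entry.new('1', '2'), Entry.new('3', '4')]

# Struct has to_a but no to_ary, so each element is converted with to_s.
joined = entries * '--'                    # => "foo--foo"

# Real nested arrays are still joined recursively.
nested = [['1', '2'], ['3', '4']] * '--'   # => "1--2--3--4"
```

The struct elements are no longer destructured, while nested arrays keep the recursive join they had in 1.8.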

#23

Updated by matz (Yukihiro Matsumoto) over 9 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r26906.
Jeremy, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Updated by naruse (Yui NARUSE) about 5 years ago

  • Related to Feature #7226: Add Set#join method as a shortcut for to_a.join added
