Feature #5607

Inconsistent reaction in Range of String

Added by Yen-Nan Lin over 2 years ago. Updated over 1 year ago.

[ruby-core:<unknown>]
Status:Assigned
Priority:Normal
Assignee:Martin Dürst
Category:-
Target version:next minor

Description

=begin
While working with an Excel file, I found some inconsistent behavior in ranges of strings.

ruby-1.9.3-p0 :001 > ("A".."AB").to_a
=> ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "AA", "AB"]

This behaves as I expected.

ruby-1.9.3-p0 :002 > ("X".."AB").to_a
=> []

However, when I tried a range from "X" to "AB", the result was inconsistent with the example above.

I hope this behavior will be made consistent in a future release.

Thanks!
=end


Related issues

Related to ruby-trunk - Feature #2323: "Z".."Z".succが空 ("Z".."Z".succ is empty) Assigned 11/02/2009
Related to ruby-trunk - Bug #6258: String#succ has suprising behavior for "\u1036" (MYANMAR ... Assigned 04/05/2012

History

#1 Updated by Benoit Daloze over 2 years ago

Hi,

This is indeed surprising.
Range#to_a is calling Range#each which has a special case for Strings to call String#upto, which is said to use String#succ.

However, in rb_str_upto (string.c:2995), there is a check that does not correspond to the #succ behavior mentioned in the documentation:

n = rb_str_cmp(beg, end);
if (n > 0 || (excl && n == 0)) return beg;

In your case "X" <=> "AB" returns 1, so nothing is yielded.
The assumption that nothing should be yielded when beg > end does not produce an intuitive result in this case, because <=> uses a different ordering than #succ, so a <=> a.succ may be either -1 or 1.

I believe this test should be changed to use a String#succ -based comparison, if this is possible.
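
The mismatch can be reproduced from Ruby itself. The following is only an illustrative sketch that exercises the public String methods, not the C code in string.c; the variable names are made up for the example:

```ruby
# <=> compares strings character by character, so "X" sorts after "AB".
cmp = "X" <=> "AB"            # 1

# #succ carries over like a spreadsheet column label, so "Z" is followed
# by "AA", even though "Z" sorts after "AA" under <=>.
succ_of_z = "Z".succ          # "AA"
cmp_z     = "Z" <=> "Z".succ  # 1
cmp_a     = "A" <=> "A".succ  # -1

# String#upto yields nothing when the receiver sorts after the argument.
from_a = "A".upto("AB").to_a
from_x = "X".upto("AB").to_a

puts from_a.size      # 28
puts from_x.inspect   # []
```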

P.S.: The documentation starts with:
Iterates through successive values, starting at str and
ending at other_str inclusive, [...]

I believe "inclusive" should be removed there, as it depends on whether the exclusive option is set, which is explained further on.

P.S.2: I'm not sure it is right for Range#each to use different methods (not only #succ) without documenting it. The documentation should probably mention that it uses String#upto for String and Symbol.

P.S.3: Range uses #succ and #<=>, which may not be consistent with each other, as we see in the case of String. How to solve that?

#2 Updated by Anonymous over 2 years ago

It should be forbidden to have a Class (here Range) whose instance
methods are linked by variable axiomatic relations, depending on the
actual instance. There are too many different concepts covered by the
same name Range.

Actually, as the example shows, the situation is already strange for
Strings (even before taking in account encodings, which add to the
confusion).

_md

Benoit Daloze wrote in post #1031231:

P.S.2: I'm not sure it is right to use different methods (not only
#succ) in Range#each while being undocumented. It should probably
mention it uses String#upto for String and Symbol.

P.S.3: Range uses #succ and #<=>, which might not be coherent as we see

in the case of String. How to solve that?


#3 Updated by Hiro Asari over 2 years ago

See #2323. In particular, in note 2, Matz acknowledges that the situation is muddled when it comes to Ranges specified by Strings.

#4 Updated by Alexey Muranov over 2 years ago

This behavior of range seems consistent with

"X"<"AB" # => false

in Ruby 1.9.3.

#5 Updated by Anonymous over 2 years ago

Yes, but if "X" < "AB" is false, "X" should not be between "A" and "AB".

_md


#6 Updated by Alexey Muranov over 2 years ago

Anonymous wrote:

Yes, but if "X < AB" is false, "X" should not be between "A" and "AB".

_md

I agree. ("A".."AB").to_a seems inconsistent with ordering and with ("X".."AB").to_a.

It seems that the behavior of ("A".."AB").to_a is aimed at particular applications. I am against that: I think that String objects, as well as Range of String objects, are not supposed to know about the conventions for naming spreadsheet columns.


Update: I see only one way to make ("X".."AB").to_a return ["X", "Y", "Z", "AA", "AB"]: let it know somehow (the object or the method) that the order used is DegLex (compare by length first, then lexicographically), and that the set of admissible strings includes only strings of capital letters A-Z. I made my suggestions about this in the discussion of Issue #5534: http://redmine.ruby-lang.org/issues/5534#change-22110 but I am not sure how practical they are at the moment.
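
As a rough sketch of what such a DegLex-aware iteration might look like in plain Ruby: the helper names deglex_cmp and deglex_upto are hypothetical, not part of Ruby, and the code assumes both endpoints consist only of capital letters A-Z, where #succ and the DegLex order agree.

```ruby
# Hypothetical helper: DegLex order, i.e. compare by length first,
# then lexicographically.
def deglex_cmp(a, b)
  [a.length, a] <=> [b.length, b]
end

# Hypothetical helper: walk from first to last with #succ, using DegLex
# instead of <=> to decide whether there is anything to yield.
# Assumption: both strings match /\A[A-Z]+\z/, so #succ is guaranteed
# to reach last whenever first <= last in DegLex order.
def deglex_upto(first, last)
  unless [first, last].all? { |s| s.match?(/\A[A-Z]+\z/) }
    raise ArgumentError, "only non-empty strings of A-Z are supported"
  end
  return [] if deglex_cmp(first, last) > 0

  result = []
  current = first
  loop do
    result << current
    break if current == last
    current = current.succ
  end
  result
end

p deglex_upto("X", "AB")   # ["X", "Y", "Z", "AA", "AB"]
p deglex_upto("AB", "X")   # []
```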

#7 Updated by Yukihiro Matsumoto over 2 years ago

  • Status changed from Open to Feedback

Ruby classes often play several roles. For example, Array can be an array, a stack or a queue, depending on which methods are used. Range is similar. A range is an object with a start point and an end point (and a flag for end-exclusion). You can use it as an interval or as a sequence of objects iterated from start to end.

In most cases (especially for numbers) those two behave the same, but for strings they behave quite differently, so you have to be careful about how you use ranges. The following methods treat ranges as intervals:

min, max, cover?

Other methods, like the following, treat ranges as sequences:

===, each, step, member?, include?, and methods inherited from Enumerable

matz.
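
A small sketch of the two views on the range from the original report, using cover?, include? and max as representatives of the two groups above (the variable names are only for illustration):

```ruby
r = "A".."AB"

# Interval view: cover? only compares against the endpoints with <=>.
covered = r.cover?("X")          # false, because "X" > "AB"

# Sequence view: include? walks the #succ sequence A, B, ..., Z, AA, AB.
included = r.include?("X")       # true

# max treats the range as an interval and returns the end point,
# while the iterated sequence has a different maximum under <=>.
interval_max = r.max             # "AB"
sequence_max = r.to_a.max        # "Z"

puts covered, included, interval_max, sequence_max
```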

#8 Updated by Martin Dürst about 2 years ago

  • Status changed from Feedback to Open

Yukihiro Matsumoto wrote:

Ruby classes often play several roles. For example, Array can be an array, a stack or a queue, depending on which methods are used. Range is similar. A range is an object with a start point and an end point (and a flag for end-exclusion). You can use it as an interval or as a sequence of objects iterated from start to end.

In most cases (especially for numbers) those two behave the same, but for strings they behave quite differently, so you have to be careful about how you use ranges. The following methods treat ranges as intervals:

min, max, cover?

Other methods, like the following, treat ranges as sequences:

===, each, step, member?, include?, and methods inherited from Enumerable

This makes a lot of sense so far. But the example uses #to_a, which is inherited from Enumerable. And still it treats a range as an interval, not as a sequence.

I have reopened this. I think it should be a bug, not a feature. I would have changed this to a bug if I knew how. Or should we reopen #2323?

#9 Updated by Yui NARUSE about 2 years ago

Martin Dürst wrote:

This makes a lot of sense so far. But the example uses #to_a, which is inherited from Enumerable. And still it treats a range as an interval, not as a sequence.

I have reopened this. I think it should be a bug, not a feature. I would have changed this to a bug if I knew how. Or should we reopen #2323?

What is your plan?

#10 Updated by Martin Dürst about 2 years ago

Yui NARUSE wrote:

What is your plan?

Short version: Make it work the way Matz described it in http://bugs.ruby-lang.org/issues/5607#note-7.

I haven't yet looked at the code, but Benoit provides some good pointers. I hope to have some time to give it a try, but I don't mind if somebody else is faster than me.

#11 Updated by Akira Tanaka about 2 years ago

I presented String#succ mechanism:
http://www.a-k-r.org/pub/string-succ-rejectkaigi2008.pdf
(in Japanese)

#12 Updated by Martin Dürst about 2 years ago

We have discussed this issue at today's developers' meeting in Akihabara.

We agreed that it would be desirable to fix this, but that it may not be easy to implement. To avoid endless loops, one has to be able to check whether the start of the range will reach the end in a finite number of #succ calls.
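
One possible shape for such a check, as a sketch only and not the implementation discussed at the meeting: it relies on the observation that #succ does not shorten a string, so the walk can stop once the current string is longer than the end. The method name reachable_by_succ? and the max_steps safety limit are hypothetical, and the sketch has only been thought through for ASCII alphanumeric strings.

```ruby
# Hypothetical helper: does repeated #succ starting at first ever hit last?
# Assumption: #succ never returns a shorter string, so once current is
# longer than last it can no longer reach it.
def reachable_by_succ?(first, last, max_steps = 1_000_000)
  # "".succ is "", so an empty start would never make progress.
  return first == last if first.empty?

  current = first
  max_steps.times do
    return false if current.length > last.length
    return true if current == last
    current = current.succ
  end
  false # gave up after max_steps; treated as "not reachable" here
end

p reachable_by_succ?("X", "AB")   # true  (X, Y, Z, AA, AB)
p reachable_by_succ?("AB", "X")   # false (already longer than the end)
p reachable_by_succ?("a", "B")    # false ("z".succ is "aa")
```

Note that the number of steps can grow quickly with the length of the end string, which is why the sketch carries an explicit step limit.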

I have tentatively volunteered to look at this issue and try to implement it (but I can't guarantee a result, sorry).

#13 Updated by Yusuke Endoh about 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Martin Dürst

Martin-sensei,

I tentatively assign this ticket to you.
If you give up, please set the assignee to another person,
or leave it blank. Take it easy.

Yusuke Endoh mame@tsg.ne.jp

#14 Updated by Martin Dürst about 2 years ago

On 2012/03/28 0:10, mame (Yusuke Endoh) wrote:

Issue #5607 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to duerst (Martin Dürst)

Martin-sensei,

I tentatively assign this ticket to you.
If you give up, please set the assignee to another person,
or leave it blank. Take it easy.

I actually said I would take it at
http://bugs.ruby-lang.org/issues/5607#note-12, and I thought that I had
assigned it to me, but apparently, I forgot.

Also, I have already made some progress on how to address this. The
pointer from Akira
(http://www.a-k-r.org/pub/string-succ-rejectkaigi2008.pdf, in Japanese)
was very helpful.

Regards, Martin.

#15 Updated by Yusuke Endoh over 1 year ago

  • Target version set to next minor
