Project

General

Profile

Feature #6670

str.chars.last should be possible

Added by yhara (Yutaka HARA) over 4 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Target version:
[ruby-core:45963]

Description

=begin
Since str.chars returns an Enumerator, we need explicit to_a for some operations:

str.chars.to_a.last
str.chars.to_a[1,3]

But often I forget that and write:

str.chars.last
str.chars[1,3]

Besides that, I feel it is hard to explain why to_a is needed here when I'm writing
artilcles for Ruby beginners.

Simplest way to achieve this is to make String#chars (also #lines, #bytes and #codepoints)
return an Array. Since arrays have most of the methods defined in Enumerator, this will
not be a big change. For programs like str.chars.next, you can use each_char instead.
=end

6670.pdf - 1-page presentation slide for the feature request meeting ([ruby-dev:45708]) (39.4 KB) yhara (Yutaka HARA), 06/30/2012 02:54 AM

string_bytes_to_array.patch View (27.4 KB) zzak (Zachary Scott), 11/19/2012 11:59 AM

0001-Deprecate-lines-bytes-chars-codepoints-of-IO-likes.patch View (21.9 KB) knu (Akinori MUSHA), 11/27/2012 11:56 PM


Related issues

Related to Ruby trunk - Bug #7440: IO#lines etc. should return Array Rejected 11/26/2012

Associated revisions

Revision 37838
Added by knu (Akinori MUSHA) over 4 years ago

String#{lines,chars,codepoints,bytes} now return an array.

  • string.c (rb_str_each_line, rb_str_lines): String#lines now
    returns an array instead of an enumerator. Passing a block is
    deprecated but still supported for backwards compatibility.
    Based on the patch by yhara. [Feature #6670]

  • string.c (rb_str_each_char, rb_str_chars): Ditto for
    String#chars.

  • string.c (rb_str_each_codepoint, rb_str_codepoints): Ditto for
    String#codepoints.

  • string.c (rb_str_each_byte, rb_str_bytes): Ditto for
    String#bytes.

  • NEWS: Add notes for the above changes.

Revision 37838
Added by knu (Akinori MUSHA) over 4 years ago

String#{lines,chars,codepoints,bytes} now return an array.

  • string.c (rb_str_each_line, rb_str_lines): String#lines now
    returns an array instead of an enumerator. Passing a block is
    deprecated but still supported for backwards compatibility.
    Based on the patch by yhara. [Feature #6670]

  • string.c (rb_str_each_char, rb_str_chars): Ditto for
    String#chars.

  • string.c (rb_str_each_codepoint, rb_str_codepoints): Ditto for
    String#codepoints.

  • string.c (rb_str_each_byte, rb_str_bytes): Ditto for
    String#bytes.

  • NEWS: Add notes for the above changes.

Revision 37838
Added by knu (Akinori MUSHA) over 4 years ago

String#{lines,chars,codepoints,bytes} now return an array.

  • string.c (rb_str_each_line, rb_str_lines): String#lines now
    returns an array instead of an enumerator. Passing a block is
    deprecated but still supported for backwards compatibility.
    Based on the patch by yhara. [Feature #6670]

  • string.c (rb_str_each_char, rb_str_chars): Ditto for
    String#chars.

  • string.c (rb_str_each_codepoint, rb_str_codepoints): Ditto for
    String#codepoints.

  • string.c (rb_str_each_byte, rb_str_bytes): Ditto for
    String#bytes.

  • NEWS: Add notes for the above changes.

Revision 37838
Added by knu (Akinori MUSHA) over 4 years ago

String#{lines,chars,codepoints,bytes} now return an array.

  • string.c (rb_str_each_line, rb_str_lines): String#lines now
    returns an array instead of an enumerator. Passing a block is
    deprecated but still supported for backwards compatibility.
    Based on the patch by yhara. [Feature #6670]

  • string.c (rb_str_each_char, rb_str_chars): Ditto for
    String#chars.

  • string.c (rb_str_each_codepoint, rb_str_codepoints): Ditto for
    String#codepoints.

  • string.c (rb_str_each_byte, rb_str_bytes): Ditto for
    String#bytes.

  • NEWS: Add notes for the above changes.

Revision 38563
Added by knu (Akinori MUSHA) over 4 years ago

Deprecate #{lines,bytes,chars,codepoints} of IO-likes.

  • io.c (rb_io_lines, rb_io_bytes, rb_io_chars, rb_io_codepoints):
    Deprecate IO#{lines,bytes,chars,codepoints} and those of ARGF.
    [Feature #6670]

  • ext/stringio/stringio.c (strio_lines, strio_bytes, strio_chars)
    (strio_codepoints): Deprecate
    StringIO#{lines,bytes,chars,codepoints}. [Feature #6670]

  • ext/zlib/zlib.c (rb_gzreader_lines, rb_gzreader_bytes):
    Deprecate Zlib::GzipReader#{lines,bytes}. [Feature #6670]

Revision 38563
Added by knu (Akinori MUSHA) over 4 years ago

Deprecate #{lines,bytes,chars,codepoints} of IO-likes.

  • io.c (rb_io_lines, rb_io_bytes, rb_io_chars, rb_io_codepoints):
    Deprecate IO#{lines,bytes,chars,codepoints} and those of ARGF.
    [Feature #6670]

  • ext/stringio/stringio.c (strio_lines, strio_bytes, strio_chars)
    (strio_codepoints): Deprecate
    StringIO#{lines,bytes,chars,codepoints}. [Feature #6670]

  • ext/zlib/zlib.c (rb_gzreader_lines, rb_gzreader_bytes):
    Deprecate Zlib::GzipReader#{lines,bytes}. [Feature #6670]

Revision 38563
Added by knu (Akinori MUSHA) over 4 years ago

Deprecate #{lines,bytes,chars,codepoints} of IO-likes.

  • io.c (rb_io_lines, rb_io_bytes, rb_io_chars, rb_io_codepoints):
    Deprecate IO#{lines,bytes,chars,codepoints} and those of ARGF.
    [Feature #6670]

  • ext/stringio/stringio.c (strio_lines, strio_bytes, strio_chars)
    (strio_codepoints): Deprecate
    StringIO#{lines,bytes,chars,codepoints}. [Feature #6670]

  • ext/zlib/zlib.c (rb_gzreader_lines, rb_gzreader_bytes):
    Deprecate Zlib::GzipReader#{lines,bytes}. [Feature #6670]

Revision 38563
Added by knu (Akinori MUSHA) over 4 years ago

Deprecate #{lines,bytes,chars,codepoints} of IO-likes.

  • io.c (rb_io_lines, rb_io_bytes, rb_io_chars, rb_io_codepoints):
    Deprecate IO#{lines,bytes,chars,codepoints} and those of ARGF.
    [Feature #6670]

  • ext/stringio/stringio.c (strio_lines, strio_bytes, strio_chars)
    (strio_codepoints): Deprecate
    StringIO#{lines,bytes,chars,codepoints}. [Feature #6670]

  • ext/zlib/zlib.c (rb_gzreader_lines, rb_gzreader_bytes):
    Deprecate Zlib::GzipReader#{lines,bytes}. [Feature #6670]

History

#1 [ruby-core:45964] Updated by yhara (Yutaka HARA) over 4 years ago

Adding presentation slide for the feature request meeting ()

#2 [ruby-core:46061] Updated by mame (Yusuke Endoh) over 4 years ago

  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

Received. Thank you!

You really want just #last?

--
Yusuke Endoh mame@tsg.ne.jp

#3 [ruby-core:46077] Updated by trans (Thomas Sawyer) over 4 years ago

While I realize it doesn't exactly fit with the whole iteration thing particularly well, I nonetheless do not think it's unreasonable for Enumerator to support something like #last(n=1). In most cases that just means it has to iterate on down to the end and deliver the result. In some cases it might be able to optimize, say if the enumerable has a fixed size.

This issue might also have some relation to the issue about Enumerable#size, #6636.

#4 [ruby-core:46097] Updated by marcandre (Marc-Andre Lafortune) over 4 years ago

Hi,

trans (Thomas Sawyer) wrote:

This issue might also have some relation to the issue about Enumerable#size, #6636.

Not directly. Being able to calculate lazily a size does not mean it is easy to have random access, and the api I propose would be quite a bit more complicated if it was to allow for random access.

I feel that the lack of #last is a good thing. It's there to remind you that it might be expensive, and that (for example) it would be worth storing in a local var instead of calling it twice...

#5 [ruby-core:46728] Updated by mame (Yusuke Endoh) over 4 years ago

  • Assignee changed from matz (Yukihiro Matsumoto) to yhara (Yutaka HARA)

Yutaka Hara,

I'm happy to inform you that matz has accepted your proposal,
as making #chars, #lines, etc. return an Array (and keep
#each_char, etc as is).

"foo".chars #=> ["f", "o", "o"]
"foo".each_char #=> #

Yhara-san, could you create a patch?

--
Yusuke Endoh mame@tsg.ne.jp

#6 [ruby-core:46960] Updated by yhara (Yutaka HARA) over 4 years ago

I'm happy to inform you that matz has accepted your proposal,
as making #chars, #lines, etc. return an Array (and keep
#each_char, etc as is).

Thank you. I'm happy too :-)

I will make a patch in this weekend.

#7 [ruby-core:47212] Updated by yhara (Yutaka HARA) over 4 years ago

=begin
Hello,

I wrote a patch for String#lines, #chars, #bytes and #codepoints.

Seemingly the following methods also need to be fixed (right?)

  • IO#lines, chars, bytes, codepoints
  • StringIO#lines, chars, bytes, codepoints
  • ARGF.lines, chars, bytes
    • Is it intentional that ARGF.codepoints is missing? =end

#8 [ruby-core:48381] Updated by ko1 (Koichi Sasada) over 4 years ago

ping.
status?

#9 [ruby-core:48540] Updated by knu (Akinori MUSHA) over 4 years ago

If #lines is to become no longer an alias for #each_line, there should be no point in supporting lines {} any more. It should emit a deprecation warning and get unsupported in the future.

#10 [ruby-core:49551] Updated by zzak (Zachary Scott) over 4 years ago

I've added Yutaka-san's patch from github, please continue discussion here.

mame, could you please review this?

#11 [ruby-core:49601] Updated by mame (Yusuke Endoh) over 4 years ago

  • Assignee changed from mame (Yusuke Endoh) to knu (Akinori MUSHA)

knu, could you please review this? :-)

--
Yusuke Endoh mame@tsg.ne.jp

#12 [ruby-core:49970] Updated by mame (Yusuke Endoh) over 4 years ago

  • Assignee changed from knu (Akinori MUSHA) to nagachika (Tomoyuki Chikanaga)

nagachika-san, could you please review this?

--
Yusuke Endoh mame@tsg.ne.jp

#13 [ruby-core:50035] Updated by knu (Akinori MUSHA) over 4 years ago

Sorry, my mail filter was so buggy I failed to be notified.

I've reviewed and already given a comment above, and another on Twitter that yieldp can be dropped in favor of NIL_P(ary), without any of them responded to.
Should I revise the patch myself?

#14 [ruby-core:50039] Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago

  • Assignee changed from nagachika (Tomoyuki Chikanaga) to knu (Akinori MUSHA)

Hello,
Sorry for late reply.

OK, I assign this ticket to knu san again.
Anyway I'm willing to review the patch :)

BTW, I cannot apply string_bytes_to_array.patch on trunk r37835.

#15 [ruby-core:50040] Updated by mame (Yusuke Endoh) over 4 years ago

Thanks!

Should I revise the patch myself?

Could you do it, please?
If you are not willing, please pass the ball to nagachika-san.

--
Yusuke Endoh mame@tsg.ne.jp

#16 Updated by knu (Akinori MUSHA) over 4 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r37838.
Yutaka, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


String#{lines,chars,codepoints,bytes} now return an array.

  • string.c (rb_str_each_line, rb_str_lines): String#lines now
    returns an array instead of an enumerator. Passing a block is
    deprecated but still supported for backwards compatibility.
    Based on the patch by yhara. [Feature #6670]

  • string.c (rb_str_each_char, rb_str_chars): Ditto for
    String#chars.

  • string.c (rb_str_each_codepoint, rb_str_codepoints): Ditto for
    String#codepoints.

  • string.c (rb_str_each_byte, rb_str_bytes): Ditto for
    String#bytes.

  • NEWS: Add notes for the above changes.

#17 [ruby-core:50082] Updated by brixen (Brian Shirai) over 4 years ago

What about IO, StringIO, ARGF as mentioned above?

Thanks,
Brian

#18 [ruby-core:50091] Updated by drbrain (Eric Hodel) over 4 years ago

=begin
An IO may be infinite (({open "/dev/zero" do |io| io.chars.to_a }})), and so may ARGF ((%ruby -e 'ARGF.chars.to_a' /dev/zero%)).

It doesn't make sense to add it to StringIO, but you make work around the omission through StringIO#string.
=end

#19 [ruby-core:50179] Updated by naruse (Yui NARUSE) over 4 years ago

  • Assignee changed from knu (Akinori MUSHA) to matz (Yukihiro Matsumoto)
  • Status changed from Closed to Assigned

Now we have Enumerable#size, so we can know whether the last character is available or not by it.
So how about changing String#chars to not Array but Enumerator with size and define last method?

#20 [ruby-core:50184] Updated by marcandre (Marc-Andre Lafortune) over 4 years ago

#size will return nil for all enumerators based on IO.

Maybe the best is to have chars return an array for strings and deprecate it with a warning for IO.

#21 [ruby-core:50192] Updated by mame (Yusuke Endoh) over 4 years ago

  • Assignee changed from matz (Yukihiro Matsumoto) to yhara (Yutaka HARA)

Okay, I understand that it may be harmful to change the behavior right now.
Then, let's warn a user to use `each_line' instead in 2.0.0, and change it in future.

Yhara-san, could you create the following type of patch for all methods?

diff --git a/io.c b/io.c
index bacc6fc..26d3970 100644
--- a/io.c
+++ b/io.c
@@ -10795,6 +10795,13 @@ argf_each_line(int argc, VALUE *argv, VALUE argf)
}
}

+static VALUE
+argf_lines(int argc, VALUE argv, VALUE argf)
+{
+ rb_warn("ARGF#lines will return an Array in future; you should use `each_line' instead");
+ return argf_each_line(argc, argv, argf);
+}
+
/

* call-seq:
* ARGF.bytes {|byte| block } -> ARGF
@@ -11557,7 +11564,7 @@ Init_IO(void)
rb_define_method(rb_cARGF, "each_line", argf_each_line, -1);
rb_define_method(rb_cARGF, "each_byte", argf_each_byte, 0);
rb_define_method(rb_cARGF, "each_char", argf_each_char, 0);
- rb_define_method(rb_cARGF, "lines", argf_each_line, -1);
+ rb_define_method(rb_cARGF, "lines", argf_lines, -1);
rb_define_method(rb_cARGF, "bytes", argf_each_byte, 0);
rb_define_method(rb_cARGF, "chars", argf_each_char, 0);

--
Yusuke Endoh mame@tsg.ne.jp

#22 [ruby-core:50219] Updated by knu (Akinori MUSHA) over 4 years ago

It may be an idea to simply obsolete IO#{lines,chars,codepoints,bytes} for now.

First of all, line wise operation is known to be popular and we've already got #readlines since a long time ago.
So, there is no need for a new IO#lines that returns an array unlike String.

For chars, codepoints and bytes, there is no known need for turning a whole file or stream into an on-memory array because we haven't offered such methods to date.
That may be because character or byte wise operation on a stream is typically done using a seek pointer as it reads lazily.
If this should be the case, we don't have an urgent need for a new IO#chars, #codepoints or #bytes.
However, we, including matz, confirmed in this issue that a method that does not return an array having a name in the plural form is not intuitive because most Enumerator methods have a name consisting of each_ + singular noun, and lines etc. are breaking that convention.

So, what about simply obsoleting IO#{lines,chars,codepoints,bytes} in the current shape?

We can re-add them later in another shape perhaps after we introduce an indexable enumerator (lazy array) that would satisfy everyone who might take both performance and intuitiveness seriously.

#23 [ruby-core:50222] Updated by trans (Thomas Sawyer) over 4 years ago

If I understand correctly, this is going to break a lot of code?

#24 [ruby-core:50228] Updated by knu (Akinori MUSHA) over 4 years ago

Here's a patch to deprecate #lines, #bytes, #chars and #codepoints of IO-likes.

#25 [ruby-core:50231] Updated by yhara (Yutaka HARA) over 4 years ago

trans (Thomas Sawyer) wrote:

If I understand correctly, this is going to break a lot of code?

For String, the impact will be limited.

  • String#lines returns Array, which has most of the methods defined in Enumerator.

  • Exceptions are #next, #peek, #with_index, etc.
    If you have a code like str.lines.with_index', you need to change it tostr.each_line.with_index'.

  • When you have a huge string, str.lines' will become slower and consume more memory.
    I think this is a rare case because we usually avoid
    File.read(path_to_huge_file)'.
    Even when you really need huge_str.lines to return Enumerator, you can use huge_str.each_line instead.

For IO/StringIO/ARGF/GzipReader, I'd like to +1 for showing deprecation warning in 2.0.0.

#26 [ruby-core:50238] Updated by duerst (Martin Dürst) over 4 years ago

Instead of this proposal, what about adding some/most/all of the Array methods to Enumerator?

E.g. like so:

module Enumerator
def
to_a[pos]
end
end

Of course, this is just the simplest case of [], and the simplest (and maybe slowest) implementation. This is just an idea, but I think it would be way better than having to distinguish lines/each_line, chars/each_char,... and so on.

#27 [ruby-core:50241] Updated by yhara (Yutaka HARA) over 4 years ago

duerst (Martin Dürst) wrote:

Instead of this proposal, what about adding some/most/all of the Array methods to Enumerator?

It is not easy to define behavior of Enumerator#[].
For example:

File.open(path){|f|
ls = f.lines
p lines[0] #=> Prints first line
p lines[0] #=> Prints nil, if #[] is equivalent to #to_a[pos]
}

#28 [ruby-core:50298] Updated by duerst (Martin Dürst) over 4 years ago

yhara (Yutaka HARA) wrote:

For String, the impact will be limited.

  • String#lines returns Array, which has most of the methods defined in Enumerator.

  • Exceptions are #next, #peek, #with_index, etc.
    If you have a code like str.lines.with_index', you need to change it tostr.each_line.with_index'.

For with_index in particular, wouldn't it make sense to either add it to Enumerable or deprecate it on Enumerator? That would eliminate one more difference.

#29 [ruby-core:50305] Updated by knu (Akinori MUSHA) over 4 years ago

  • Assignee changed from yhara (Yutaka HARA) to matz (Yukihiro Matsumoto)

We don't have much time left before 2.0 to decide how to change IO#lines, #chars, etc. .
Can we deprecate them for now as a first step as proposed above?

#30 Updated by yhara (Yutaka HARA) over 4 years ago

knu (Akinori MUSHA) wrote:

Here's a patch to deprecate #lines, #bytes, #chars and #codepoints of IO-likes.

Maybe we can change StringIO#lines to return Array now because it does not have problems like IO and ARGF.
(problems = return value of IO#lines may be huge or even infinite)

#31 [ruby-core:50399] Updated by drbrain (Eric Hodel) over 4 years ago

I often use StringIO as an IO substitute for tests. I would prefer matching behavior.

You can get the last line by stringio.string.lines.last. Is this acceptable?

#32 Updated by yhara (Yutaka HARA) over 4 years ago

duerst (Martin Dürst) wrote:

For with_index in particular, wouldn't it make sense to either add it to Enumerable or deprecate it on Enumerator? That would eliminate one more difference.

If we add Enumerable#with_index, we could write str.with_index. This does not make sense becuase String#each is not defined.

drbrain (Eric Hodel) wrote:

I often use StringIO as an IO substitute for tests. I would prefer matching behavior.

You can get the last line by stringio.string.lines.last. Is this acceptable?

Yes. And I got a reply from @knu: https://twitter.com/knu/status/274442544518156288
He said "We should not fix API of StringIO before fixing that of IO" and I agreed.

#33 Updated by knu (Akinori MUSHA) over 4 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r38563.
Yutaka, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


Deprecate #{lines,bytes,chars,codepoints} of IO-likes.

  • io.c (rb_io_lines, rb_io_bytes, rb_io_chars, rb_io_codepoints):
    Deprecate IO#{lines,bytes,chars,codepoints} and those of ARGF.
    [Feature #6670]

  • ext/stringio/stringio.c (strio_lines, strio_bytes, strio_chars)
    (strio_codepoints): Deprecate
    StringIO#{lines,bytes,chars,codepoints}. [Feature #6670]

  • ext/zlib/zlib.c (rb_gzreader_lines, rb_gzreader_bytes):
    Deprecate Zlib::GzipReader#{lines,bytes}. [Feature #6670]

Also available in: Atom PDF