Feature #15588

String#each_chunk and #chunks

Added by Glass_saga (Masaki Matsushita) 8 months ago. Updated 25 days ago.

Status:
Open
Priority:
Normal
Assignee:
-
Target version:
[ruby-core:91414]

Description

String#each_chunk iterates over chunks of the specified size in a String.
String#chunks is a shorthand for str.each_chunk(n).to_a.

present:

str = <<EOS
20190101 20190102
20190103 20190104
EOS

str.scan(/.{1,9}/m) do |chunk|
  p chunk #=> "20190101 "
end

str.scan(/.{1,9}/m) do |chunk|
  chunk.strip!
  p chunk #=> "20190101"
end

str.scan(/.{1,9}/m) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan(/.{1,9}/m).map(&:strip) #=> ["20190101", "20190102", "20190103", "20190104"]

proposal:

str = <<EOS
20190101 20190102
20190103 20190104
EOS

str.each_chunk(9) do |chunk|
  p chunk #=> "20190101 "
end

str.each_chunk(9, strip: true) do |chunk|
  p chunk #=> "20190101"
end

str.chunks(9) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.chunks(9, strip: true) #=> ["20190101", "20190102", "20190103", "20190104"]
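
The proposed behaviour can be approximated in plain Ruby today. The following is only a sketch: the module name ChunkSketch and the module-function form are assumptions made for illustration, and it does not reproduce the attached patch, which adds methods to String itself.

```ruby
# Hypothetical stand-ins for the proposed String#each_chunk and String#chunks.
# Only the method names and the strip: option come from the proposal.
module ChunkSketch
  module_function

  # Returns an array of substrings of at most `size` characters each.
  def chunks(str, size, strip: false)
    raise ArgumentError, "size must be positive" unless size.positive?

    result = str.each_char.each_slice(size).map(&:join)
    strip ? result.map(&:strip) : result
  end

  # Yields each chunk in turn; returns an Enumerator when no block is given.
  def each_chunk(str, size, strip: false, &block)
    chunks(str, size, strip: strip).each(&block)
  end
end

str = "20190101 20190102\n20190103 20190104\n"

p ChunkSketch.chunks(str, 9)
#=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p ChunkSketch.chunks(str, 9, strip: true)
#=> ["20190101", "20190102", "20190103", "20190104"]
```

The sketch counts characters rather than bytes, which is a choice the description above leaves open.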

Files

patch.diff (6.56 KB) Glass_saga (Masaki Matsushita), 02/06/2019 01:35 AM

History

Updated by shyouhei (Shyouhei Urabe) 8 months ago

Why is the String#scan example you showed not suitable for you? Tell us what would make you happy with the proposal.

Updated by mame (Yusuke Endoh) 8 months ago

I like the proposal itself. I don't think that chunks is a good name, though.

To take every n characters, I often write str.scan(/.{1,#{ n }}/m), but it looks a bit cryptic. In this case str.chunks(n) is simpler.

I dislike strip: true. It is too ad-hoc. Does it also support lstrip: true, rstrip: true, chop: true, chomp: true, etc? In principle, one method should do one thing, IMO.


Updated by sawa (Tsuyoshi Sawada) 8 months ago

I am also not so sure if this feature is needed. But if I wanted such a feature, I would ask to let String#scan take arguments similar to those of String#[]. That is, let the first argument point to the starting position, and an optional second argument be the length. Since we want to capture multiple matches, unlike with [], passing a single index for the first argument does not make much sense, but now we have Enumerator::ArithmeticSequence. So we should be able to do

str.scan((0..).step(9)) #=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
str.scan((0..).step(9), 8) #=> ["20190101", "20190102", "20190103", "20190104"]
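
String#scan does not accept an arithmetic sequence today, so the idea above can only be emulated. A minimal sketch, assuming a hypothetical helper named scan_at that takes the start positions and an optional length (defaulting to the sequence's step):

```ruby
# scan_at is a hypothetical helper, not an existing String method.
# `starts` is an Enumerator::ArithmeticSequence of character offsets.
def scan_at(str, starts, length = nil)
  length ||= starts.step
  # The sequence may be endless, so stop lazily at the end of the string.
  starts.lazy
        .take_while { |offset| offset < str.length }
        .map { |offset| str[offset, length] }
        .to_a
end

str = "20190101 20190102\n20190103 20190104\n"

p scan_at(str, (0..).step(9))
#=> ["20190101 ", "20190102\n", "20190103 ", "20190104\n"]
p scan_at(str, (0..).step(9), 8)
#=> ["20190101", "20190102", "20190103", "20190104"]
```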

Updated by naruse (Yui NARUSE) 8 months ago

This requires a more concrete real-world example.

Updated by ioquatix (Samuel Williams) 3 months ago

Here is a use case:

https://github.com/socketry/protocol-http2/blob/12875a97e0f82315682191e3bbbaba8b59cb3432/lib/protocol/http2/settings_frame.rb#L236

Because I didn't know /....../ should be /....../m I wasted at least 2 hours of debugging.

I would like each_chunk (or each_slice), and/or each_unpack.
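
The /m pitfall described above is easy to reproduce. The payload below is a made-up example of two 6-byte records and is not taken from the linked code; it only shows how a byte that happens to equal "\n" makes the regexp without /m silently skip a record.

```ruby
# Hypothetical payload: two 6-byte records (16-bit id, 32-bit value).
# The value 10 is encoded with a trailing 0x0A byte, which is "\n".
payload = [1, 10, 2, 4096].pack("nNnN")

without_m = payload.scan(/....../)   # "." refuses to match the 0x0A byte
with_m    = payload.scan(/....../m)  # with /m, "." matches any byte

p without_m.size
#=> 1
p with_m.map { |record| record.unpack("nN") }
#=> [[1, 10], [2, 4096]]
```

Without /m the first record is dropped without any error, which is the kind of failure that is hard to spot while debugging.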

Updated by ioquatix (Samuel Williams) 3 months ago

I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, just if it's available.

Updated by ioquatix (Samuel Williams) 3 months ago

Is size in characters or bytes?

Updated by Glass_saga (Masaki Matsushita) about 2 months ago

> I wonder if we should have consistency with slice and each_slice from Array. But honestly, I don't care, just if it's available.

I like String#each_slice and #slices.

> Is size in characters or bytes?

Considering consistency with #slice, it is better to have size as characters.

Updated by Eregon (Benoit Daloze) about 2 months ago

I think String#each_slice(n_chars) would make sense, since it's like str.chars.each_slice(9) { |a| a.join }

Updated by shevegen (Robert A. Heiler) about 2 months ago

#each_slice and #slices seem fine to me as well; I think it is also a better name than chunks.

Updated by osyo (manga osyo) about 2 months ago

I also wanted something like #each_slice.
For example, it is useful when you want to fix the width of the output.

puts "abcdefghijklmnopqrstuvwxyz".each_slice(5).map { |s| "#{s}<br>" }
# output:
# abcde<br>
# fghij<br>
# klmno<br>
# pqrst<br>
# uvwxy<br>
# z<br>

> Is size in characters or bytes?
> Considering consistency with #slice, it is better to have size as characters.

I think that there may be multiple String#each_slice_xxx methods, like String#each_xxx
(e.g. String#each_slice_byte, String#each_slice_char, and more).
Also, I think that String#each_slice may be equivalent to String#each_slice_char.

Updated by matz (Yukihiro Matsumoto) 25 days ago

As shyouhei (Shyouhei Urabe) mentioned, we'd like to hear the real-world use-case. Extracting fixed-width records may be the purpose. I'm curious about the OP's opinion.

Matz.

Updated by usa (Usaku NAKAMURA) 25 days ago

Just an idea: this method may be useful for handling data in a fixed-length record format if it accepts multiple column lengths, such as

records = []
fixed_length_records_data.each_slice(7, 10, 20) do |zip, tel, name|
  records.push({zip: zip, tel: tel, name: name})
end
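
The multi-column idea can be tried out with existing methods. This is a sketch under assumptions: split_fixed_width is a hypothetical helper name, the sample data is invented, and widths are counted in characters.

```ruby
# Hypothetical helper: cuts `data` into records of widths.sum characters,
# then cuts each record into fields of the given widths.
def split_fixed_width(data, *widths)
  data.each_char.each_slice(widths.sum).map do |chars|
    offset = 0
    widths.map do |width|
      # to_a guards against nil when the last record is shorter than expected.
      field = chars[offset, width].to_a.join
      offset += width
      field
    end
  end
end

# Invented sample data: zip (7), tel (10), name (20, space-padded).
data = "1000001" + "0312345678" + "Taro Yamada".ljust(20) +
       "5300001" + "0661234567" + "Hanako Sato".ljust(20)

records = split_fixed_width(data, 7, 10, 20).map do |zip, tel, name|
  { zip: zip, tel: tel, name: name.rstrip }
end

p records.first
#=> {:zip=>"1000001", :tel=>"0312345678", :name=>"Taro Yamada"}
```

For byte-oriented data, String#unpack with a format such as "A7A10A20" does a similar job on a single record.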
