Feature #20576: Add MatchData#bytebegin and MatchData#byteend - Ruby - Ruby Issue Tracking System


Added by shugo (Shugo Maeda) almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Assignee:
Target version: 3.4

[ruby-core:118299]

Description

I'd like to propose MatchData#bytebegin and MatchData#byteend.
These methods are similar to MatchData#begin and MatchData#end, but return offsets in bytes instead of codepoints.
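Here is a small sketch that makes the difference concrete. It assumes Ruby 3.4 or later, where these methods shipped, and a UTF-8 source file. It compares character offsets, the Array returned by the existing MatchData#byteoffset, and the new Integer-returning methods on a string that contains multibyte characters.

```ruby
# Assumes Ruby 3.4+ (MatchData#bytebegin / #byteend) and UTF-8 source.
str = "あいうabc"   # "あ", "い", "う" are 3 bytes each in UTF-8
m = str.match(/abc/)

# Character-based offsets (long-standing API)
p m.begin(0) # => 3
p m.end(0)   # => 6

# Byte-based offsets via MatchData#byteoffset, which returns an Array
p m.byteoffset(0) # => [9, 12]

# Byte-based offsets via the new methods, which return Integers directly
p m.bytebegin(0) # => 9
p m.byteend(0)   # => 12

# Byte offsets are what byte-oriented APIs such as String#byteslice expect
p str.byteslice(m.bytebegin(0), m.byteend(0) - m.bytebegin(0)) # => "abc"
```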

Pull request: https://github.com/ruby/ruby/pull/10973

One of the use cases is scanning strings: https://github.com/ruby/net-imap/pull/286/files
In such loops, $~.byteend(0) is faster than $~.byteoffset(0)[1] because it does not need to allocate an Array.
Here's a benchmark result:

voyager:ruby$ cat b.rb 
require "benchmark"
require "strscan"

text = "あ" * 100000

Benchmark.bmbm do |b|
  b.report("byteoffset(0)[1]") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteoffset(0)[1]
    end
  end

  b.report("byteend(0)") do
    pos = 0
    while text.byteindex(/\G./, pos)
      pos = $~.byteend(0)
    end
  end
end
voyager:ruby$ ./tool/runruby.rb b.rb           
Rehearsal ----------------------------------------------------
byteoffset(0)[1]   0.020558   0.000393   0.020951 (  0.020963)
byteend(0)         0.018149   0.000000   0.018149 (  0.018151)
------------------------------------------- total: 0.039100sec

                       user     system      total        real
byteoffset(0)[1]   0.020821   0.000000   0.020821 (  0.020822)
byteend(0)         0.017455   0.000000   0.017455 (  0.017455)
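The benchmark attributes the gap to the Array that byteoffset allocates. Below is a rough way to check that claim. It assumes Ruby 3.4+ and uses GC.stat(:total_allocated_objects) as the counter to count the objects each loop allocates. The helper name allocated_objects is made up for this example.

```ruby
# Assumes Ruby 3.4+ (MatchData#byteend; String#byteindex and
# MatchData#byteoffset exist since 3.2).
text = "あ" * 1_000

# Hypothetical helper: objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

with_byteoffset = allocated_objects do
  pos = 0
  while text.byteindex(/\G./, pos)
    pos = $~.byteoffset(0)[1] # builds a 2-element Array on every call
  end
end

with_byteend = allocated_objects do
  pos = 0
  while text.byteindex(/\G./, pos)
    pos = $~.byteend(0) # returns an Integer, no Array needed
  end
end

puts "byteoffset(0)[1]: #{with_byteoffset} objects"
puts "byteend(0):       #{with_byteend} objects"
```

Both loops also allocate a MatchData for each match, so neither count is zero. The useful number is the difference, which should be roughly one Array per iteration. Exact counts may vary between Ruby builds, so treat them as indicative.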

Updated by Eregon (Benoit Daloze) almost 2 years ago
#1 [ruby-core:118301]

Does this difference matter in realistic usage (e.g. that net-imap one)? How much improvement is there?

Regarding naming, byteend seems hard to read; I think byte_begin/byte_end is much clearer.

Updated by shugo (Shugo Maeda) almost 2 years ago
#2 [ruby-core:118309]

Eregon (Benoit Daloze) wrote in #note-1:

Does this difference matter in realistic usage (e.g. that net-imap one)? How much improvement is there?

I guess the difference doesn't matter much compared to I/O etc., but it's frustrating to write code like $~.byteoffset(0)[1] when only the end offset is needed.
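As an illustration of the string-scanning use case, here is a minimal tokenizer sketch. It keeps its position as a byte offset and advances it with byteend(0) rather than byteoffset(0)[1]. The token table and the tokenize method are hypothetical and not taken from net-imap. Ruby 3.4+ is assumed.

```ruby
# Assumes Ruby 3.4+ for MatchData#byteend.
# Hypothetical token table. Each pattern is anchored with \G so that
# String#byteindex only matches at the current byte position.
TOKEN_PATTERNS = {
  space: /\G\s+/,
  word:  /\G\p{Word}+/,
  punct: /\G[[:punct:]]/
}.freeze

# Splits text into [type, string] pairs, tracking the scan position in
# bytes rather than characters.
def tokenize(text)
  tokens = []
  pos = 0
  while pos < text.bytesize
    type, = TOKEN_PATTERNS.find { |_type, re| text.byteindex(re, pos) }
    raise ArgumentError, "unexpected input at byte #{pos}" unless type

    # find stops at the first successful byteindex call, so $~ holds
    # that call's MatchData.
    tokens << [type, $~[0]]
    pos = $~.byteend(0) # instead of $~.byteoffset(0)[1]
  end
  tokens
end

p tokenize("日本語 is fun!")
```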

Regarding naming, byteend seems hard to read; I think byte_begin/byte_end is much clearer.

I proposed byteend for consistency with existing methods such as byteoffset.
If we choose byte_end, it may be better to introduce new aliases for such existing methods.
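If you prefer the snake_case spellings discussed here, an opt-in refinement is one option. This is a hypothetical sketch: byte_begin, byte_end and byte_offset are not Ruby methods, and the module name is invented. The sketch wraps only MatchData methods. Wrapping String#byteindex the same way would break callers, because $~ is frame-local. A Ruby-level wrapper would set $~ in its own frame rather than in the caller's.

```ruby
# Assumes Ruby 3.4+. Hypothetical aliases, NOT part of Ruby.
module ByteUnderscoreNames
  refine MatchData do
    def byte_begin(n)  = bytebegin(n)
    def byte_end(n)    = byteend(n)
    def byte_offset(n) = byteoffset(n)
  end
end

# Activates the refinement for the rest of this file only.
using ByteUnderscoreNames

m = "あいうabc".match(/abc/)
p m.byte_begin(0) # => 9
p m.byte_end(0)   # => 12
```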

Updated by matz (Yukihiro Matsumoto) almost 2 years ago
#3 [ruby-core:118310]

I understand the use case. I agree with the addition of the feature, but I don't like the name. The names bytebegin and byteend follow the byteindex tradition, but they are very hard to read (especially byteend). Any other name suggestions?

Matz.

Updated by shugo (Shugo Maeda) almost 2 years ago
#4 [ruby-core:118313]

matz (Yukihiro Matsumoto) wrote in #note-3:

I understand the use case. I agree with the addition of the feature, but I don't like the name. The names bytebegin and byteend follow the byteindex tradition, but they are very hard to read (especially byteend). Any other name suggestions?

I came up with the names begin_in_bytes and end_in_bytes, but byte_begin/byte_end, as suggested by Eregon, may be better.

Updated by matz (Yukihiro Matsumoto) almost 2 years ago
#5 [ruby-core:118601]

OK. I didn't like the names (especially byteend), but after looking at them for a while I got used to them and was ready to compromise.

Matz.

Updated by shugo (Shugo Maeda) almost 2 years ago
#6

Status changed from Open to Closed

Applied in changeset git|e048a073a3cba04576b8f6a1673c283e4e20cd90.

Add MatchData#bytebegin and MatchData#byteend

These methods return the byte-based offset of the beginning or end of the specified match.

[Feature #20576]
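Like #begin and #end, the new methods take either a group number or a group name. The sketch below assumes Ruby 3.4+. It also assumes, by analogy with #begin, that a group that did not take part in the match yields nil.

```ruby
# Assumes Ruby 3.4+ and UTF-8 source. Note that \w is ASCII-only in Ruby,
# so the match starts at "key", not at "日".
str = "日 key=値"
m = /(?<key>\w+)=(?<val>\S*)(?<bang>!)?/.match(str)

p m.begin(:key)      # => 2   (character offset)
p m.bytebegin(:key)  # => 4   ("日" is 3 bytes, then 1 byte of space)
p m.byteend(:val)    # => 11  ("値" occupies bytes 8...11)
p m.bytebegin(:bang) # => nil (optional group did not participate)
```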
