Feature #21932: `MatchData#get_int`

Added by nobu (Nobuyoshi Nakada) 24 days ago. Updated 3 days ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:124905]

Description

This was suggested by @akr (Akira Tanaka) today: $~.get_int(1) is equivalent to $1.to_i but does not create the intermediate String object.

https://github.com/nobu/ruby/tree/match-get_int
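
As a rough illustration of the equivalence described above, the return value can be reproduced in plain Ruby. `get_int_sketch` is a hypothetical stand-in, not the implementation in the linked branch; unlike the proposed method it still allocates the intermediate String, so it only shows what value to expect.

```ruby
# Hypothetical stand-in for the proposed method. It mirrors the described
# result ($N.to_i) but still creates the intermediate String.
def get_int_sketch(match, n)
  match[n].to_i
end

m = /(\d+)-(\d+)/.match("2024-07")
p get_int_sketch(m, 1) # => 2024
p get_int_sketch(m, 2) # => 7
```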


Related issues: 1 (1 open, 0 closed)

Related to Ruby - Feature #21943: Add StringScanner#get_int to extract capture group as Integer without intermediate String (Open)

Updated by nobu (Nobuyoshi Nakada) 24 days ago Actions #1

  • Description updated (diff)

Updated by zenspider (Ryan Davis) 23 days ago Actions #2 [ruby-core:124916]

Tried to add a comment to your commit but github is being very sketchy today.

In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?

Updated by nobu (Nobuyoshi Nakada) 23 days ago Actions #3 [ruby-core:124917]

zenspider (Ryan Davis) wrote in #note-2:

In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?

I can't tell where that example comes from.

Do you mean something like this?

/\d+/.match("1/2/10").get_int(0)    # => 1
/\d+/.match("1/2/10").get_int(0, 1) # invalid radix 1 (ArgumentError)
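
For comparison, String#to_i already rejects a radix of 1 in the same way, so the error shown above matches existing behavior. A small sketch (`radix_error_for` is a hypothetical helper used only for this illustration):

```ruby
# Hypothetical helper: returns the error message String#to_i raises
# for the given base, or nil when the base is accepted.
def radix_error_for(base)
  "1".to_i(base)
  nil
rescue ArgumentError => e
  e.message
end

p radix_error_for(10) # => nil
p radix_error_for(1)  # => "invalid radix 1"
```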

Updated by Eregon (Benoit Daloze) 21 days ago Actions #5

  • Related to Feature #21943: Add StringScanner#get_int to extract capture group as Integer without intermediate String added

Updated by matz (Yukihiro Matsumoto) 10 days ago Actions #6 [ruby-core:125047]

I agree with adding integer_at(n) to MatchData, and StringScanner too (#21943).

Matz.

Updated by mame (Yusuke Endoh) 9 days ago Actions #7 [ruby-core:125064]

Here is a supplement to Matz's decision.

This method will basically follow the behavior of String#to_i.

The base can be specified as the second argument:

"2024" =~ /(\d+)/
$~.integer_at(1)     # => 2024 (default: base 10)
$~.integer_at(1, 8)  # => 1044 (interprets "2024" as base 8)
$~.integer_at(1, 16) # => 8228 (interprets "2024" as base 16)
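
These values can be cross-checked with String#to_i, which the method is said to follow. `value_in_base` is a hypothetical helper for this sketch; it does not call integer_at itself.

```ruby
# Hypothetical helper: interpret a digit string in the given base
# using String#to_i.
def value_in_base(digits, base = 10)
  digits.to_i(base)
end

p value_in_base("2024")     # => 2024
p value_in_base("2024", 8)  # => 1044 (2*512 + 0*64 + 2*8 + 4)
p value_in_base("2024", 16) # => 8228 (2*4096 + 0*256 + 2*16 + 4)
p 1044.to_s(8)              # => "2024" (round trip)
```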

When it encounters non-numeric characters or an empty string, it behaves the same as String#to_i:

# integer_at should behave as String#to_i
"foo" =~ /(...)/
$~.integer_at(1) # => 0 (== "foo".to_i)

"0xF" =~ /(...)/
$~.integer_at(1) # => 0 (== "0xF".to_i, not 15)

"" =~ /(\d*)/
$~.integer_at(1) # => 0 (== "".to_i)

"1_0_0" =~ /(\d+(?:_\d+)*)/
$~.integer_at(1) # => 100 (== "1_0_0".to_i)

If the base is set to 0, it respects prefixes like 0x (the same as String#to_i(0)):

"0xF" =~ /(...)/
$~.integer_at(1, 0) # => 15 (== "0xF".to_i(0))
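
The prefix handling of base 0 can be seen with String#to_i(0) directly. This sketch assumes integer_at(n, 0) follows it, as stated above; `to_i_auto` is a hypothetical helper name.

```ruby
# Hypothetical helper: base 0 lets String#to_i choose the radix
# from the prefix of the text.
def to_i_auto(text)
  text.to_i(0)
end

p to_i_auto("0xF")   # => 15
p to_i_auto("0b101") # => 5
p to_i_auto("0o17")  # => 15
p to_i_auto("017")   # => 15 (a leading 0 means octal)
p to_i_auto("17")    # => 17
```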

If there is no match for the group, it returns nil:

"b" =~ /(a)|(b)/
$~.integer_at(1) # => nil
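
Taken together, the behavior described in this note can be approximated with safe navigation. `integer_at_sketch` is a hypothetical stand-in that still allocates the captured String, so it illustrates the semantics only, not the allocation saving.

```ruby
# Hypothetical stand-in: nil for an unmatched group, otherwise
# String#to_i semantics with an optional base.
def integer_at_sketch(match, n, base = 10)
  match[n]&.to_i(base)
end

m = /(a)|(b)/.match("b")
p integer_at_sketch(m, 1) # => nil (group 1 did not match)
p integer_at_sketch(m, 2) # => 0   ("b".to_i)

hex = /(...)/.match("0xF")
p integer_at_sketch(hex, 1)    # => 0
p integer_at_sketch(hex, 1, 0) # => 15
```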

Updated by Eregon (Benoit Daloze) 9 days ago Actions #8 [ruby-core:125067]

I think returning 0 when the group isn't parseable as a number seems like bad behavior.

At least if I were to use this method, I would expect two things of it:

  • It returns the Integer value of that group, without needing Integer($N)
  • It fails if the capture isn't a number, like Kernel#Integer

Does anyone have a use case for returning 0 when the group isn't a number?
It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.
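
The strict behavior asked for here could look roughly like the following. This is only a sketch of the alternative under discussion, not something the ticket adopted; `strict_integer_at_sketch` is a hypothetical name.

```ruby
# Hypothetical strict variant: nil for an unmatched group, ArgumentError
# when the capture is not a valid base-10 integer (Kernel#Integer rules).
def strict_integer_at_sketch(match, n)
  capture = match[n]
  capture && Integer(capture, 10)
end

m = /(\w+)=(\w+)/.match("port=8080")
p strict_integer_at_sketch(m, 2) # => 8080

begin
  strict_integer_at_sketch(m, 1) # wrong group number: captures "port"
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```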

Updated by naruse (Yui NARUSE) 9 days ago Actions #9 [ruby-core:125068]

Eregon (Benoit Daloze) wrote in #note-8:

I think returning 0 when the group isn't parseable as a number seems like bad behavior.

At least if I were to use this method, I would expect two things of it:

  • It returns the Integer value of that group, without needing Integer($N)
  • It fails if the capture isn't a number, like Kernel#Integer

Does anyone have a use case for returning 0 when the group isn't a number?
It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.

There are two reasons:

  1. There are two major ways to parse an integer in Ruby: to_i and Integer().
    • to_i is loose, and the default base is 10
    • Integer is strict, and the default base is 0; it interprets prefixes such as "0o" and "0x"
      In this use case, interpreting the "0x" prefix is not useful. If the behavior is that of to_i, it is easy to explain.
      In other words, match_data.get_int(n) behaves as match_data[n]&.to_i
  2. Distinguishing the case where the group is not matched
    Consider /(a)|(\d+)/ =~ "a"; $~.get_int(2).
    The current proposal says it returns nil. Another option for this case is an exception, but I don't think that is useful.
    With nil, I can distinguish this case from matching "0", because that returns 0.

Other minor reasons are...

  • for an empty string, it will return 0.
  • if you want to reject non-integers, you can write a strict regexp pattern.
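
The last point, combined with the nil-versus-0 distinction, can be sketched as follows: a strict pattern does the validation, so the loose conversion only ever sees digits. `parse_port` and the "port=" input format are hypothetical, chosen only for this illustration.

```ruby
# Hypothetical helper: the anchored pattern rejects non-integers,
# so String#to_i never sees anything but digits.
def parse_port(text)
  m = /\Aport=(\d+)\z/.match(text)
  m && m[1].to_i
end

p parse_port("port=8080") # => 8080
p parse_port("port=0")    # => 0   (a real zero in the input)
p parse_port("port=abc")  # => nil (no match at all)
```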

Updated by Eregon (Benoit Daloze) 9 days ago · Edited Actions #10 [ruby-core:125070]

Thanks for the explanations.

naruse (Yui NARUSE) wrote in #note-9:

In this use case, interpreting the "0x" prefix is not useful

It could be useful, but one could work around that with /0x(\h+)/ instead of /(0x\h+)/.

Leading 0 (octal) is likely more dangerous than 0x though (Integer("011") => 9).
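
Both points can be checked with existing methods. The sketch below uses String#to_i(16) on the capture as a stand-in for a base argument to the new method; `hex_capture` is a hypothetical helper.

```ruby
# Hypothetical helper: capture only the hex digits after "0x"
# and convert them with an explicit base of 16.
def hex_capture(text)
  m = /0x(\h+)/.match(text)
  m && m[1].to_i(16)
end

p hex_capture("addr=0x1F") # => 31

# The leading-zero pitfall with Kernel#Integer and its default base of 0:
p Integer("011")     # => 9  (treated as octal)
p Integer("011", 10) # => 11
p "011".to_i         # => 11
```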

If the behavior is that of to_i, it is easy to explain.

It wouldn't be hard to explain that it's the same as Integer($N, 10).

Distinguishing the case where the group is not matched

Yes, agreed, returning nil for an unmatched group is good.

for an empty string, it will return 0.

Could easily be handled as a special case, but yeah, then it's not as simple as Integer($N, 10).
Still fairly easy to explain/document.

if you want to reject non-integers, you can write a strict regexp pattern.

This reason convinces me. It's not bulletproof, but it should be enough of a guarantee in most cases that 0 is returned only for actual 0s in the input (or an empty string).

BTW, given the method name is MatchData#integer_at(n), people might expect it uses Integer() as that's very similar to the method name.

Updated by nobu (Nobuyoshi Nakada) 3 days ago Actions #11

  • Status changed from Open to Closed

Applied in changeset git|72eb59d0b23522508300896bbbe73716fe82349e.


[Feature #21932] Add MatchData#get_int
