Feature #21932
Closed: `MatchData#get_int`
Added by nobu (Nobuyoshi Nakada) 24 days ago. Updated 3 days ago.
Description
This was suggested by @akr (Akira Tanaka) today: $~.get_int(1) is equivalent to $1.to_i but does not create the intermediate String object.
Updated by nobu (Nobuyoshi Nakada) 24 days ago
#1
- Description updated (diff)
Updated by zenspider (Ryan Davis) 24 days ago
#2
[ruby-core:124916]
I tried to add a comment to your commit, but GitHub is being very sketchy today.
In the method comment on the impl side, you have examples for parsing a date... but I don't get it... 1/2/10 are supposed to be the base arg, right? Base 1?
Updated by nobu (Nobuyoshi Nakada) 23 days ago
#3
[ruby-core:124917]
zenspider (Ryan Davis) wrote in #note-2:
> In the method comment on the impl side, you have examples for parsing a date... but I don't get it... 1/2/10 are supposed to be the base arg, right? Base 1?
I can't tell where that example comes from.
Do you mean something like this?
/\d+/.match("1/2/10").get_int(0) # => 1
/\d+/.match("1/2/10").get_int(0, 1) # invalid radix 1 (ArgumentError)
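The second argument is a radix, mirroring String#to_i, whose radix is documented as 0 (prefix-aware) or 2 through 36; anything else raises ArgumentError. The sketch below shows this with plain String#to_i on the captured text, so it runs whether or not MatchData#get_int exists in the Ruby at hand; the variable names are illustrative only.

```ruby
# Group 0 is the whole match: the first run of digits in "1/2/10".
md = /\d+/.match("1/2/10")
captured = md[0]

# The optional argument of String#to_i is a radix (base), not a group index.
decimal = captured.to_i   # default base 10
binary  = "10".to_i(2)    # "10" read as base 2

# A radix of 1 is rejected with ArgumentError.
radix_error =
  begin
    captured.to_i(1)
    nil
  rescue ArgumentError => e
    e.message
  end

p decimal      # => 1
p binary       # => 2
p radix_error  # => "invalid radix 1"
```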
Updated by kou (Kouhei Sutou) 21 days ago
#4
[ruby-core:124937]
FYI: strscan will use the name integer_at, not get_int: https://github.com/ruby/strscan/pull/192#issuecomment-4002582149
Updated by Eregon (Benoit Daloze) 21 days ago
#5
- Related to Feature #21943: Add StringScanner#get_int to extract capture group as Integer without intermediate String added
Updated by matz (Yukihiro Matsumoto) 10 days ago
#6
[ruby-core:125047]
I agree with adding integer_at(n) to MatchData, and StringScanner too (#21943).
Matz.
Updated by mame (Yusuke Endoh) 9 days ago
#7
[ruby-core:125064]
Here is a supplement to Matz's decision.
This method will basically follow the behavior of String#to_i.
The base can be specified as the second argument:
"2024" =~ /(\d+)/
$~.integer_at(1) # => 2024 (default: base 10)
$~.integer_at(1, 8) # => 1044 (interprets "2024" as base 8)
$~.integer_at(1, 16) # => 8228 (interprets "2024" as base 16)
When it encounters non-numeric characters or an empty string, it behaves the same as String#to_i:
# integer_at should behave as String#to_i
"foo" =~ /(...)/
$~.integer_at(1) # => 0 (== "foo".to_i)
"0xF" =~ /(...)/
$~.integer_at(1) # => 0 (== "0xF".to_i, not 15)
"" =~ /(\d*)/
$~.integer_at(1) # => 0 (== "".to_i)
"1_0_0" =~ /(\d+(?:_\d+)*)/
$~.integer_at(1) # => 100 (== "1_0_0".to_i)
If the base is set to 0, it respects prefixes like 0x (the same as String#to_i(0)):
"0xF" =~ /(...)/
$~.integer_at(1, 0) # => 15 (== "0xF".to_i(0))
If there is no match for the group, it returns nil:
"b" =~ /(a)|(b)/
$~.integer_at(1) # => nil
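To make these rules concrete, here is a minimal pure-Ruby stand-in, assuming (as described above) that the method mirrors String#to_i on the captured text and returns nil for an unmatched group. integer_at_sketch is a hypothetical name, not the real API, and unlike the built-in it does allocate the intermediate String through MatchData#[]. It reproduces each example from this note.

```ruby
# Hypothetical stand-in for MatchData#integer_at; NOT the built-in method.
# It allocates the captured String, which the real method is meant to avoid.
def integer_at_sketch(match, n, base = 10)
  # MatchData#[] returns nil for a group that did not participate,
  # so safe navigation propagates nil instead of converting it to 0.
  match[n]&.to_i(base)
end

results = {
  default_base: integer_at_sketch("2024".match(/(\d+)/), 1),
  base8:        integer_at_sketch("2024".match(/(\d+)/), 1, 8),
  base16:       integer_at_sketch("2024".match(/(\d+)/), 1, 16),
  non_numeric:  integer_at_sketch("foo".match(/(...)/), 1),
  hex_base10:   integer_at_sketch("0xF".match(/(...)/), 1),
  empty:        integer_at_sketch("".match(/(\d*)/), 1),
  underscores:  integer_at_sketch("1_0_0".match(/(\d+(?:_\d+)*)/), 1),
  prefix_base0: integer_at_sketch("0xF".match(/(...)/), 1, 0),
  unmatched:    integer_at_sketch("b".match(/(a)|(b)/), 1)
}

results.each { |name, value| puts "#{name}: #{value.inspect}" }
```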
Updated by Eregon (Benoit Daloze) 9 days ago
#8
[ruby-core:125067]
I think returning 0 when the group isn't parseable as a number seems like bad behavior.
If I were to use this method, I would expect two things of it:
- It returns the Integer value of that group, without needing Integer($N)
- It fails if the capture isn't a number, like Kernel#Integer
Does anyone have a use case for returning 0 when the group isn't a number?
It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.
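The stricter behavior proposed here can be sketched with Kernel#Integer and an explicit base of 10, so radix prefixes are not interpreted. strict_integer_at_sketch is a hypothetical helper for illustration only, not the API that was adopted; it keeps nil for an unmatched group and otherwise raises ArgumentError on non-numeric text, which surfaces a wrong group number instead of silently returning 0. The input string is made up for the example.

```ruby
# Hypothetical helper illustrating the strict alternative; not a built-in API.
def strict_integer_at_sketch(match, n)
  str = match[n]
  return nil if str.nil?   # group did not participate in the match
  Integer(str, 10)         # raises ArgumentError unless str is a base-10 integer
end

md = "id=42 name=foo".match(/id=(\w+) name=(\w+)/)

id = strict_integer_at_sketch(md, 1)

# Using the wrong group number raises instead of quietly yielding 0.
wrong_group_error =
  begin
    strict_integer_at_sketch(md, 2)
    nil
  rescue ArgumentError => e
    e.class
  end

unmatched = strict_integer_at_sketch("b".match(/(a)|(b)/), 1)

p id, wrong_group_error, unmatched
```

Note that an empty capture would also raise in this sketch, because Integer("", 10) rejects the empty string.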
Updated by naruse (Yui NARUSE) 9 days ago
#9
[ruby-core:125068]
Eregon (Benoit Daloze) wrote in #note-8:
> I think returning 0 when the group isn't parseable as a number seems like bad behavior.
> If I were to use this method, I would expect two things of it:
> - It returns the Integer value of that group, without needing Integer($N)
> - It fails if the capture isn't a number, like Kernel#Integer
> Does anyone have a use case for returning 0 when the group isn't a number?
> It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.
There are two reasons:
- There are two major methods to parse an integer in Ruby: to_i and Integer().
  - to_i is loose, and its default base is 10.
  - Integer() is strict, and its default base is 0; it interprets the "0o" and "0x" prefixes.
  In this use case, interpreting the "0x" prefix is not useful. If the behavior follows to_i, it is easy to explain.
  In other words, match_data.get_int(n) behaves as match_data[n]&.to_i.
- Distinguishing the case where the group is not matched
  Consider /(a)|(\d+)/ =~ "a"; $~.get_int(2).
  The current proposal says it returns nil. Another option for this case is raising an exception, but I don't think that is useful.
  This way, an unmatched group can be distinguished from a match of "0", which returns 0.
Other minor reasons are...
- For an empty string, it returns 0.
- If you want to reject non-integers, you can write a strict regexp pattern.
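The contrast drawn here between to_i and Integer(), and the nil-versus-0 and strict-regexp points, can be checked with plain core Ruby. Nothing below calls the new method, and the variable names are illustrative only.

```ruby
# to_i is lenient: base 10 by default, no prefix handling,
# and it stops at the first character it cannot parse.
lenient = ["0x1F", "0o17", "12ab", ""].map(&:to_i)

# Integer() is strict: with no base given, "0x"/"0o" prefixes are honored...
prefixed = [Integer("0x1F"), Integer("0o17")]

# ...and trailing garbage raises ArgumentError.
strict_rejects =
  begin
    Integer("12ab")
    false
  rescue ArgumentError
    true
  end

# nil for an unmatched group stays distinct from 0 for a captured "0".
unmatched = "a".match(/(a)|(\d+)/)[2]&.to_i
zero      = "0".match(/(a)|(\d+)/)[2]&.to_i

# An anchored pattern rejects non-integers before any conversion happens.
anchored = "12ab".match(/\A(\d+)\z/)

p lenient         # => [0, 0, 12, 0]
p prefixed        # => [31, 15]
p strict_rejects  # => true
p unmatched       # => nil
p zero            # => 0
p anchored        # => nil
```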
Updated by Eregon (Benoit Daloze) 9 days ago
(edited)
#10
[ruby-core:125070]
Thanks for the explanations.
naruse (Yui NARUSE) wrote in #note-9:
> In this use case, interpreting the "0x" prefix is not useful
It could be useful, but one could work around that with /0x(\h+)/ instead of /(0x\h+)/.
Leading 0 (octal) is likely more dangerous than 0x though (Integer("011") => 9).
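Both points can be illustrated with core Ruby alone: keeping the "0x" prefix outside the capture and passing base 16 explicitly, and the octal surprise of Integer() when no base is given. The input string "color=0x1F" is made up for the example.

```ruby
# Keep the "0x" prefix outside the capture group and pass an explicit base.
hex = "color=0x1F".match(/0x(\h+)/)[1].to_i(16)

# Without an explicit base, Integer() reads a leading 0 as octal...
octal_surprise = Integer("011")
# ...whereas an explicit base 10, or to_i, reads it as decimal.
explicit_decimal = Integer("011", 10)
lenient_decimal  = "011".to_i

p hex               # => 31
p octal_surprise    # => 9
p explicit_decimal  # => 11
p lenient_decimal   # => 11
```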
> If the behavior follows to_i, it is easy to explain.
It wouldn't be hard to explain that it's the same as Integer($N, 10).
> Distinguishing the case where the group is not matched
Yes, agreed: returning nil when the group is not matched is good.
> For an empty string, it returns 0.
That could easily be handled as a special case, but yeah, then it's not as simple as Integer($N, 10).
Still fairly easy to explain/document.
> If you want to reject non-integers, you can write a strict regexp pattern.
This reason convinces me. It's not bulletproof, but it should be enough of a guarantee in most cases that 0 is returned only for actual 0s in the input (or an empty string).
BTW, given that the method name is MatchData#integer_at(n), people might expect it to use Integer(), as that's very similar to the method name.
Updated by nobu (Nobuyoshi Nakada) 3 days ago
#11
- Status changed from Open to Closed
Applied in changeset git|72eb59d0b23522508300896bbbe73716fe82349e.
[Feature #21932] Add MatchData#get_int