Feature #11999
Closed
MatchData#to_h to get a Hash from named captures
Added by sorah (Sorah Fukumori) almost 9 years ago. Updated about 8 years ago.
Description
class MatchData
  def to_h
    self.names.map { |n| [n, self[n]] }.to_h
  end
end

p '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).to_h #=> {"a"=>"1", "b"=>"2", "c"=>nil}
Sometimes I want to get a Hash from named captures, but currently I have to combine #names and #captures. How about adding MatchData#to_h as a convenience method?
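For reference, the #names + #captures workaround mentioned above might look like the following minimal sketch. It assumes every group name is used only once in the pattern; with repeated names the two arrays no longer have the same length, so the pairs would not line up.

```ruby
# Workaround sketch: pair each group name with its capture by position.
# Assumption: no group name is repeated in the pattern.
m = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/)

workaround = m.names.zip(m.captures).to_h

# The optional group c did not participate in the match, so its value is nil.
puts workaround.inspect
```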
Files
11999.diff (2.77 KB) | sorah (Sorah Fukumori), 01/18/2016 08:55 AM
11999-2.diff (3.41 KB) | Based on specification at [ruby-core:73839] | sorah (Sorah Fukumori), 02/16/2016 12:35 PM
Updated by sorah (Sorah Fukumori) almost 9 years ago
- File 11999.diff added
Updated by sorah (Sorah Fukumori) almost 9 years ago
One point to consider is the behavior for multiple captures with the same group name:
/(?<a>.)(?<a>.)/
MatchData#[] returns the last one and my attached patch follows that behavior.
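A small sketch of the situation described here, using an arbitrary two-character input chosen for illustration: the name is listed once, both groups are captured positionally, and lookup by name yields the last group that matched.

```ruby
# Two groups share the name "a"; the input "xy" is an arbitrary example.
m = /(?<a>.)(?<a>.)/.match("xy")

names    = m.names     # the shared name appears only once
captures = m.captures  # both groups are still captured positionally
by_name  = m[:a]       # lookup by name picks the last matched group

puts names.inspect
puts captures.inspect
puts by_name.inspect
```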
Updated by danielpclark (Daniel P. Clark) almost 9 years ago
I agree. Please add this feature. I have also looked to do the same thing.
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Are there any candidate names?
Matz.
Updated by sorah (Sorah Fukumori) almost 9 years ago
> is not always able to convert to Hash/Map.
Ah -- agreed. How about MatchData#named_captures?
I'm not sure this name is the best; suggestions are welcome.
Updated by phluid61 (Matthew Kerwin) almost 9 years ago
Shota Fukumori wrote:
>> is not always able to convert to Hash/Map.
> Ah -- agreed. How about MatchData#named_captures?
> I'm not sure this name is the best; suggestions are welcome.
I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)
Updated by duerst (Martin Dürst) almost 9 years ago
Matthew Kerwin wrote:
> I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)
Could it make sense to include numbered captures in the hash, too? Just thinking aloud.
Updated by phluid61 (Matthew Kerwin) almost 9 years ago
Martin Dürst wrote:
> Matthew Kerwin wrote:
>> I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)
> Could it make sense to include numbered captures in the hash, too? Just thinking aloud.
I thought so myself, but the regular expression engine currently captures unnamed (numbered) groups only if there are no named captures.
> Note: A regexp can't use named backreferences and numbered backreferences simultaneously.
> -- http://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Capturing
I guess this is by specification.
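As an illustration of the point above (a sketch with an arbitrary pattern and input, not taken from the thread): once a pattern contains a named group, plain parentheses only group and no longer capture, although the named group remains reachable by its number.

```ruby
# The second pair of parentheses is unnamed; because the pattern also
# contains a named group, it does not produce a capture of its own.
m = /(?<a>.)(.)/.match("xy")

whole    = m[0]        # the full match still covers both characters
captures = m.captures  # only the named group is captured
first    = m[1]        # the named group can still be read by number

puts whole.inspect
puts captures.inspect
puts first.inspect
```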
Updated by Hanmac (Hans Mackowiak) almost 9 years ago
It is also interesting when a regexp combines alternatives with | and each of them has a named capture with the same name:
reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").captures #=> ["b", nil]
reg.match("abc")[:a] # => "b"
reg.match("xyz") # => #<MatchData "x" a:nil a:"x">
reg.match("xyz").captures #=> [nil, "x"]
reg.match("xyz")[:a] # => "x"
(Also notice that the capture :a is shown twice in the inspect output of MatchData.)
Such cases need to be kept in mind when adding a new method to MatchData.
Updated by sorah (Sorah Fukumori) almost 9 years ago
Looks like it's the same problem as I noted here https://bugs.ruby-lang.org/issues/11999#note-2
Updated by Hanmac (Hans Mackowiak) almost 9 years ago
@Shota: I still need to test your patch, but my case is a little different from yours.
Because a group can be nil, it seems to pick the first non-nil value in my case (or is it the last non-nil?).
Specifically, with your patch:
reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").to_h #=> {"a" => "b"} or {"a" => nil}
Updated by sorah (Sorah Fukumori) almost 9 years ago
That makes sense to me.
Updated by naruse (Yui NARUSE) almost 9 years ago
Yukihiro Matsumoto wrote:
> I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
> Is there any name candidate?
I feel it can always be converted to a Hash, because even if it doesn't use named captures, the numbering is 1-origin.
irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}
Updated by phluid61 (Matthew Kerwin) almost 9 years ago
Yui NARUSE wrote:
> Yukihiro Matsumoto wrote:
>> I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
>> Is there any name candidate?
> I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.
> irb(main):001:0> /(a)(b)(c)/.match("abc")
> => #<MatchData "abc" 1:"a" 2:"b" 3:"c">
> irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
> => {1=>"a", 2=>"b", 3=>"c"}
I did some experimenting of my own to this end, and came up with this: https://github.com/phluid61/mug/blob/master/lib/mug/matchdata/hash.rb
The only real weirdness arises from the fact that positional captures don't happen at all if there's a named capture group in the Regexp; but given the resulting mutual exclusivity, the code itself becomes pretty straightforward.
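The idea of converting either kind of match into a Hash could be sketched roughly as below. The helper name captures_to_hash is hypothetical; this is neither Ruby core API nor the code from the linked repository, only an illustration of how the mutual exclusivity keeps the logic to two branches.

```ruby
# Hypothetical helper (not part of Ruby, not the linked gem's code).
def captures_to_hash(match)
  if match.names.empty?
    # No named groups: key each positional capture by its 1-based index.
    match.captures.each_with_index.map { |value, i| [i + 1, value] }.to_h
  else
    # Named groups: key by name, resolving values the way MatchData#[] does.
    match.names.map { |name| [name, match[name]] }.to_h
  end
end

numbered = captures_to_hash(/(a)(b)(c)/.match("abc"))
named    = captures_to_hash(/(?<a>.)(?<b>.)/.match("12"))

puts numbered.inspect
puts named.inspect
```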
Updated by sorah (Sorah Fukumori) almost 9 years ago
- Assignee set to sorah (Sorah Fukumori)
Discussed at https://bugs.ruby-lang.org/projects/ruby/wiki/DevelopersMeeting20160216Japan with Matz and several committers:
Log: https://docs.google.com/document/d/1rj7ODOCSfcsQeBd6-p-NiVwqxDUg05G66LwDOkKOGTw/pub
- #to_h is inappropriate name while non-named capture exists:
  - Return Hash with integer keys? ({0 => "a", 1 => "b"})
    - there's no use case for this behavior.
    - rejected
  - #named_captures (accepted)
    - matz said acceptable
- Behavior when there are multiple named captures with same name
  - Return last matched value
    /(?<a>b)|(?<a>x)/.match("abc").to_h #=> {"a" => "b"}
- Behavior when named captures didn’t match anything
  - Return nil as value
- Behavior when no named captures
  - #captures returns [] when a regexp has no capture, so #named_captures returns {} when a regexp has no named capture
@matz (Yukihiro Matsumoto) Could you confirm this ↑ and say accept here please?
Updated by sorah (Sorah Fukumori) almost 9 years ago
- File 11999-2.diff added
Updated patch (11999-2.diff).
Updated by matz (Yukihiro Matsumoto) almost 9 years ago
Accepted.
Matz.
Updated by sorah (Sorah Fukumori) almost 9 years ago
- Status changed from Open to Closed
Applied in changeset r53863.
- re.c: Add MatchData#named_captures [Feature #11999] [ruby-core:72897]
- test/ruby/test_regexp.rb(test_match_data_named_captures): Test for above.
- NEWS: News about MatchData#named_captures.
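On a Ruby version that includes this changeset, the behaviors agreed at the developers' meeting can be checked with a short sketch like the one below. The variable names and some of the input strings are illustrative choices, not taken from the patch or its tests.

```ruby
# Same name used twice in sequence: the last matched value is returned.
repeated = /(?<a>.)(?<a>.)/.match("01").named_captures

# Same name in two alternatives: only one branch matches, and it wins.
alternated = /(?<a>b)|(?<a>x)/.match("abc").named_captures

# A named group that did not participate in the match maps to nil.
unmatched = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).named_captures

# No named groups at all: the result is an empty Hash.
unnamed = /(a)(b)/.match("ab").named_captures

[repeated, alternated, unmatched, unnamed].each { |h| puts h.inspect }
```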
Updated by headius (Charles Nutter) about 8 years ago
Shouldn't this produce Symbol keys?
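As accepted, the method returns String keys. One possible way to get Symbol keys on the caller's side, assuming a Ruby version that provides Hash#transform_keys (2.5 or later), is sketched here:

```ruby
m = /(?<a>.)(?<b>.)/.match("12")

# named_captures returns String keys; convert them to Symbols afterwards.
string_keyed = m.named_captures
symbol_keyed = string_keyed.transform_keys(&:to_sym)

puts string_keyed.inspect
puts symbol_keyed.inspect
```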