Feature #11999

MatchData#to_h to get a Hash from named captures

Added by sorah (Sorah Fukumori) over 1 year ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:72897]

Description

class MatchData
  def to_h
    self.names.map { |n| [n, self[n]] }.to_h
  end
end

p '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).to_h #=> {"a"=>"1", "b"=>"2", "c"=>nil}

Sometimes I want to get a Hash from named captures, but currently I have to combine #names and #captures. How about adding MatchData#to_h as a convenience?
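For reference, the #names + #captures workaround mentioned above can be sketched roughly as follows. The helper name named_captures_via_zip is hypothetical (it is not part of Ruby), and the sketch assumes every group name is unique, since #names lists each name once while #captures has one entry per group.

```ruby
# Hypothetical helper, not part of Ruby: pairs each group name with its capture.
# Assumes group names are unique, so #names and #captures line up one to one.
def named_captures_via_zip(match)
  match.names.zip(match.captures).to_h
end

m = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/)
named_captures_via_zip(m) #=> {"a"=>"1", "b"=>"2", "c"=>nil}
```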

11999.diff (2.77 KB) 11999.diff sorah (Sorah Fukumori), 01/18/2016 08:55 AM
11999-2.diff (3.41 KB) 11999-2.diff Based on specification at [ruby-core:73839] sorah (Sorah Fukumori), 02/16/2016 12:35 PM

Associated revisions

Revision 53863
Added by sorah (Sorah Fukumori) over 1 year ago

* re.c: Add MatchData#named_captures
[Feature #11999]

* test/ruby/test_regexp.rb(test_match_data_named_captures): Test for above.

* NEWS: News about MatchData#named_captures.

History

#2 [ruby-core:72899] Updated by sorah (Sorah Fukumori) over 1 year ago

One thing to consider is the behavior for multiple captures with the same group name:

/(?<a>.)(?<a>.)/

MatchData#[] returns the last one and my attached patch follows that behavior.
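A small illustration of that existing behavior, using an arbitrary input string "12" chosen only for this example:

```ruby
# Two groups share the name "a"; both capture, but the name appears once.
m = /(?<a>.)(?<a>.)/.match("12")

m.names     #=> ["a"]
m.captures  #=> ["1", "2"]
m[:a]       #=> "2"  (lookup by name returns the last group that matched)
```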

#3 [ruby-core:72900] Updated by danielpclark (Daniel P. Clark) over 1 year ago

I agree. Please add this feature. I have also looked to do the same thing.

#4 [ruby-core:72901] Updated by matz (Yukihiro Matsumoto) over 1 year ago

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

Matz.

#5 [ruby-core:72905] Updated by sorah (Sorah Fukumori) over 1 year ago

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

#6 [ruby-core:72910] Updated by phluid61 (Matthew Kerwin) over 1 year ago

Shota Fukumori wrote:

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

#7 [ruby-core:72917] Updated by duerst (Martin Dürst) over 1 year ago

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

#8 [ruby-core:72920] Updated by phluid61 (Matthew Kerwin) over 1 year ago

Martin Dürst wrote:

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

I thought so myself, but the regular expression engine currently does numbered captures only if there are no named captures.

Note: A regexp can't use named backreferences and numbered backreferences simultaneously.

-- http://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Capturing

I guess this is spec.
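A quick way to see this for yourself (the pattern and input here are made up for illustration): when a regexp mixes a named group with plain parentheses, the plain parentheses do not capture.

```ruby
# One named group followed by an unnamed group.
m = /(?<a>.)(.)/.match("12")

m[0]        #=> "12"   (the whole match still covers both characters)
m.captures  #=> ["1"]  (only the named group is captured)
m.size      #=> 2      (whole match + one capture)
```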

#9 [ruby-core:72929] Updated by Hanmac (Hans Mackowiak) over 1 year ago

It's also interesting when you have a regexp combined with | where both alternatives have a named capture with the same name:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").captures #=> ["b", nil]
reg.match("abc")[:a] # => "b" 
reg.match("xyz") # => #<MatchData "x" a:nil a:"x"> 
reg.match("xyz").captures #=> [nil, "x"]
reg.match("xyz")[:a] # => "x"

(Also notice that the inspect output of MatchData shows the capture :a twice.)
Such things need to be kept in mind when creating a new method for MatchData.

#10 [ruby-core:72933] Updated by sorah (Sorah Fukumori) over 1 year ago

Looks like it's the same problem as I noted here https://bugs.ruby-lang.org/issues/11999#note-2

#11 [ruby-core:72963] Updated by Hanmac (Hans Mackowiak) over 1 year ago

@Shota: I still need to test your patch, but my case is a little different from yours.
Because a capture can be nil, it seems to pick the first non-nil value in my case (or is it the last non-nil?).

Specifically, with your patch:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").to_h #=> {"a" => "b"} or {"a" => nil}

#12 [ruby-core:72977] Updated by sorah (Sorah Fukumori) over 1 year ago

That makes sense to me.

#13 [ruby-core:72979] Updated by naruse (Yui NARUSE) over 1 year ago

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

#14 [ruby-core:72991] Updated by phluid61 (Matthew Kerwin) over 1 year ago

Yui NARUSE wrote:

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

I did some experimenting of my own to this end, and came up with this: https://github.com/phluid61/mug/blob/master/lib/mug/matchdata/hash.rb

The only real weirdness arises from the fact that positional captures don't happen at all if there's a named capture group in the Regexp; but given the resulting mutual exclusivity, the code itself becomes pretty straight-forward.
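To make the mutual exclusivity concrete, here is a rough sketch of the idea. It is not the code from the linked repository; the method name captures_hash is hypothetical, and it simply branches on whether the regexp has any named groups.

```ruby
# Hypothetical helper, not part of Ruby and not the linked implementation.
# Named groups present: String keys. Otherwise: 1-origin Integer keys.
def captures_hash(match)
  if match.names.empty?
    (1...match.size).map { |i| [i, match[i]] }.to_h
  else
    match.names.map { |n| [n, match[n]] }.to_h
  end
end

captures_hash(/(a)(b)(c)/.match("abc"))       #=> {1=>"a", 2=>"b", 3=>"c"}
captures_hash(/(?<x>a)(?<y>b)/.match("abc"))  #=> {"x"=>"a", "y"=>"b"}
```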

#15 [ruby-core:73839] Updated by sorah (Sorah Fukumori) over 1 year ago

  • Assignee set to sorah (Sorah Fukumori)

Discussed at https://bugs.ruby-lang.org/projects/ruby/wiki/DevelopersMeeting20160216Japan with Matz and several committers:

Log: https://docs.google.com/document/d/1rj7ODOCSfcsQeBd6-p-NiVwqxDUg05G66LwDOkKOGTw/pub

  • #to_h is an inappropriate name when non-named captures exist:
    1. Return Hash with integer keys? ( {0 => "a", 1 => "b"} )
      • there's no use case for this behavior.
      • rejected
    2. #named_captures (accepted)
      • matz said acceptable
  • Behavior when there are multiple named captures with the same name
    • Return last matched value
    /(?<a>b)|(?<a>x)/.match("abc").to_h #=> {"a" => "b"}
  • Behavior when named captures didn’t match anything
    • Return nil as value
  • Behavior when no named captures
    • #captures returns [] when a regexp has no capture, so #named_captures returns {} when a regexp has no named capture

matz (Yukihiro Matsumoto) Could you confirm this ↑ and say accept here please?
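The specification above can be checked with a few one-liners. This assumes a Ruby version that already ships MatchData#named_captures (2.4 or later); the input strings are arbitrary.

```ruby
# Multiple named captures with the same name: the last matched value is returned.
/(?<a>.)(?<a>.)/.match("01").named_captures    #=> {"a"=>"1"}
/(?<a>b)|(?<a>x)/.match("abc").named_captures  #=> {"a"=>"b"}

# A named capture that did not match anything: nil as the value.
/(?<a>.)(?<b>.)?/.match("0").named_captures    #=> {"a"=>"0", "b"=>nil}

# No named captures at all: an empty Hash, mirroring #captures returning [].
/(.)(.)/.match("01").named_captures            #=> {}
/x/.match("x").captures                        #=> []
```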

#16 [ruby-core:73844] Updated by sorah (Sorah Fukumori) over 1 year ago

Updated patch (11999-2.diff).

#18 Updated by sorah (Sorah Fukumori) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset r53863.


  • re.c: Add MatchData#named_captures
    [Feature #11999]

  • test/ruby/test_regexp.rb(test_match_data_named_captures): Test for above.

  • NEWS: News about MatchData#named_captures.

#19 [ruby-core:78263] Updated by headius (Charles Nutter) 8 months ago

Shouldn't this produce Symbol keys?
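As committed, the method returns String keys. A caller who wants Symbol keys could convert the result, for example like this (a sketch assuming Ruby 2.5 or later for Hash#transform_keys):

```ruby
m = /(?<a>.)(?<b>.)/.match("12")

m.named_captures                          #=> {"a"=>"1", "b"=>"2"}
m.named_captures.transform_keys(&:to_sym) #=> {:a=>"1", :b=>"2"}
```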
