Feature #11999

closed

MatchData#to_h to get a Hash from named captures

Added by sorah (Sorah Fukumori) almost 9 years ago. Updated almost 8 years ago.

Status:
Closed
Target version:
-
[ruby-core:72897]

Description

class MatchData
  def to_h
    self.names.map { |n| [n, self[n]] }.to_h
  end
end

p '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).to_h #=> {"a"=>"1", "b"=>"2", "c"=>nil}

Sometimes I want to get a Hash from named captures, but currently I have to combine #names and #captures. How about adding MatchData#to_h as a convenience?


Files

11999.diff (2.77 KB) sorah (Sorah Fukumori), 01/18/2016 08:55 AM
11999-2.diff (3.41 KB) Based on specification at [ruby-core:73839] sorah (Sorah Fukumori), 02/16/2016 12:35 PM

Updated by sorah (Sorah Fukumori) almost 9 years ago

One consideration is the behavior for multiple captures with the same group name:

/(?<a>.)(?<a>.)/

MatchData#[] returns the last one and my attached patch follows that behavior.
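
To make the duplicate-name case concrete, here is a minimal sketch of how lookup by name behaves when two groups share a name. The helper name `lookup_duplicate_name` and the input "12" are made up for illustration and are not part of the attached patch.

```ruby
# Hypothetical helper: matches two groups that are both named "a" and
# reports what the MatchData exposes for them.
def lookup_duplicate_name(input)
  m = /(?<a>.)(?<a>.)/.match(input)
  {
    names: m.names,       # the shared name is listed only once
    captures: m.captures, # both groups are still captured positionally
    by_name: m[:a]        # lookup by name yields the last group
  }
end

result = lookup_duplicate_name("12")
p result[:captures] # both characters
p result[:by_name]  # the second character, "2"
```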

Updated by danielpclark (Daniel P. Clark) almost 9 years ago

I agree. Please add this feature. I have also wanted to do the same thing.

Updated by matz (Yukihiro Matsumoto) almost 9 years ago

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

Matz.

Updated by sorah (Sorah Fukumori) almost 9 years ago

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

Updated by phluid61 (Matthew Kerwin) almost 9 years ago

Shota Fukumori wrote:

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Updated by duerst (Martin Dürst) almost 9 years ago

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

Updated by phluid61 (Matthew Kerwin) almost 9 years ago

Martin Dürst wrote:

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

I thought so myself, but the regular expression engine currently does numbered captures only if there are no named captures.

Note: A regexp can't use named backreferences and numbered backreferences simultaneously.

-- http://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Capturing

I guess this is spec.
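
A small sketch of that mutual exclusivity, assuming default regexp options: once a pattern contains a named group, plain parentheses stop capturing. The helper `positional_captures` is a hypothetical name used only for this illustration.

```ruby
# Hypothetical helper: returns the positional captures of a successful match.
def positional_captures(regexp, input)
  regexp.match(input).captures
end

# Only numbered groups: both are captured.
p positional_captures(/(.)(.)/, "12")

# Named and plain groups mixed: the plain parentheses behave like a
# non-capturing group, so only the named group "a" is captured.
p positional_captures(/(?<a>.)(.)/, "12")
```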

Updated by Hanmac (Hans Mackowiak) almost 9 years ago

Also interesting is the case where two alternatives joined with | both contain a named capture with the same name:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").captures #=> ["b", nil]
reg.match("abc")[:a] # => "b" 
reg.match("xyz") # => #<MatchData "x" a:nil a:"x"> 
reg.match("xyz").captures #=> [nil, "x"]
reg.match("xyz")[:a] # => "x"

(Also notice that the inspect output of MatchData shows the capture a twice.)
Such cases need to be kept in mind when adding a new method to MatchData.

Updated by sorah (Sorah Fukumori) almost 9 years ago

Looks like it's the same problem as I noted here https://bugs.ruby-lang.org/issues/11999#note-2

Updated by Hanmac (Hans Mackowiak) almost 9 years ago

@Shota: I still need to test your patch, but my case is a little different from yours.
Because a capture can be nil, it seems to pick the first non-nil value in my case (or is it the last non-nil one?).

Specifically, with your patch:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").to_h #=> {"a" => "b"} or {"a" => nil}

Updated by sorah (Sorah Fukumori) almost 9 years ago

That makes sense to me.

Updated by naruse (Yui NARUSE) almost 9 years ago

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

Updated by phluid61 (Matthew Kerwin) almost 9 years ago

Yui NARUSE wrote:

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

I did some experimenting of my own to this end, and came up with this: https://github.com/phluid61/mug/blob/master/lib/mug/matchdata/hash.rb

The only real weirdness arises from the fact that positional captures don't happen at all if there's a named capture group in the Regexp; but given the resulting mutual exclusivity, the code itself becomes pretty straightforward.

Updated by sorah (Sorah Fukumori) almost 9 years ago

  • Assignee set to sorah (Sorah Fukumori)

Discussed at https://bugs.ruby-lang.org/projects/ruby/wiki/DevelopersMeeting20160216Japan with Matz and several committers:

Log: https://docs.google.com/document/d/1rj7ODOCSfcsQeBd6-p-NiVwqxDUg05G66LwDOkKOGTw/pub

  • #to_h is an inappropriate name when non-named captures exist:

    1. Return Hash with integer keys? ( {0 => "a", 1 => "b"} )
      • there's no use case for this behavior.
      • rejected
    2. #named_captures (accepted)
      • matz said acceptable
  • Behavior when there are multiple named captures with same name

    • Return last matched value

      /(?<a>b)|(?<a>x)/.match("abc").to_h #=> {"a" => "b"}
      
  • Behavior when named captures didn’t match anything

    • Return nil as value
  • Behavior when there are no named captures

    • #captures returns [] when a regexp has no capture, so #named_captures returns {} when a regexp has no named capture
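
As an illustration of the three rules above (not part of the meeting log), the following sketch exercises them with the accepted method name. It assumes a Ruby version that ships MatchData#named_captures; the helper `captures_by_name` is a hypothetical name.

```ruby
# Hypothetical helper: returns the named captures of a match as a Hash,
# or nil when the regexp does not match the input at all.
def captures_by_name(regexp, input)
  m = regexp.match(input)
  m && m.named_captures
end

# A named group that did not participate in the match maps to nil.
p captures_by_name(/(?<a>.)(?<b>.)(?<c>.)?/, "12")

# Duplicate names: the value comes from the last group that matched,
# which here is the first alternative.
p captures_by_name(/(?<a>b)|(?<a>x)/, "abc")

# No named groups at all: the result is an empty Hash.
p captures_by_name(/(a)(b)/, "ab")
```

The keys are Strings, matching what MatchData#names returns.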

@matz (Yukihiro Matsumoto) Could you confirm the above and mark it as accepted here, please?

Updated by sorah (Sorah Fukumori) almost 9 years ago

Updated patch (11999-2.diff).

Updated by sorah (Sorah Fukumori) almost 9 years ago

  • Status changed from Open to Closed

Applied in changeset r53863.


  • re.c: Add MatchData#named_captures
    [Feature #11999] [ruby-core:72897]

  • test/ruby/test_regexp.rb(test_match_data_named_captures): Test for above.

  • NEWS: News about MatchData#named_captures.

Updated by headius (Charles Nutter) almost 8 years ago

Shouldn't this produce Symbol keys?
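
For callers who do want Symbol keys, one possible workaround is to convert the keys after the fact. This is only a sketch: the helper `symbol_named_captures` is hypothetical and not part of the committed change, which returns String keys.

```ruby
# Hypothetical helper, not part of the committed patch: converts the String
# keys returned by MatchData#named_captures into Symbols.
def symbol_named_captures(match)
  match.named_captures.map { |name, value| [name.to_sym, value] }.to_h
end

m = /(?<year>\d+)-(?<month>\d+)/.match("2016-02")
p symbol_named_captures(m)
```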
