Feature #11999

MatchData#to_h to get a Hash from named captures

Added by sorah Shota Fukumori about 1 year ago. Updated 2 months ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:72897]

Description

class MatchData
  def to_h
    self.names.map { |n| [n, self[n]] }.to_h
  end
end

p '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).to_h #=> {"a"=>"1", "b"=>"2", "c"=>nil}

Sometimes I want to get a Hash from named captures, but currently I have to combine #names and #captures. How about adding MatchData#to_h as a convenience method?
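
For comparison, the #names + #captures workaround mentioned above might look like the following. This is only a minimal sketch: the variable names `m` and `workaround` are illustrative, and it assumes every group name is unique, because the two arrays no longer line up once a name is repeated.

```ruby
m = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/)

# Pair each group name with the capture at the same position.
# Only valid when no group name is repeated in the pattern.
workaround = m.names.zip(m.captures).to_h

p workaround
```

The optional group `c` does not participate in the match, so its value is nil, the same result as the proposed #to_h above.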

11999.diff (2.77 KB) sorah Shota Fukumori, 01/18/2016 08:55 AM

11999-2.diff - Based on specification at [ruby-core:73839] (3.41 KB) sorah Shota Fukumori, 02/16/2016 12:35 PM

Associated revisions

Revision 53863
Added by sorah Shota Fukumori 11 months ago

  • re.c: Add MatchData#named_captures
    [Feature #11999]

  • test/ruby/test_regexp.rb(test_match_data_named_captures): Test for above.

  • NEWS: News about MatchData#named_captures.

History

#2 [ruby-core:72899] Updated by sorah Shota Fukumori about 1 year ago

One consideration is the behavior for multiple captures with the same group name:

/(?<a>.)(?<a>.)/

MatchData#[] returns the last one and my attached patch follows that behavior.
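
A small sketch of that case, using an arbitrary two-character input (the variable names are illustrative only):

```ruby
m = /(?<a>.)(?<a>.)/.match('12')

names    = m.names     # the repeated name is listed only once
captures = m.captures  # both groups are still captured positionally
by_name  = m[:a]       # lookup by name yields the last group that matched

p names, captures, by_name
```

Here `by_name` is "2", so a Hash built through MatchData#[] keeps the second group's value and drops the first.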

#3 [ruby-core:72900] Updated by Daniel P. Clark about 1 year ago

I agree. Please add this feature. I have also looked to do the same thing.

#4 [ruby-core:72901] Updated by Yukihiro Matsumoto about 1 year ago

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

Matz.

#5 [ruby-core:72905] Updated by sorah Shota Fukumori about 1 year ago

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

#6 [ruby-core:72910] Updated by Matthew Kerwin about 1 year ago

Shota Fukumori wrote:

is not always able to convert to Hash/Map.

Ah -- agreed. How about MatchData#named_captures?

I'm not sure this name is the best; suggestions welcome.

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

#7 [ruby-core:72917] Updated by Martin Dürst about 1 year ago

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

#8 [ruby-core:72920] Updated by Matthew Kerwin about 1 year ago

Martin Dürst wrote:

Matthew Kerwin wrote:

I think #named_captures is the best name, since that is precisely what it returns (i.e. it never includes numbered captures.)

Could it make sense to include numbered captures in the hash, too? Just thinking aloud.

I thought so myself, but the regular expression engine currently does numbered captures only if there are no named captures.

Note: A regexp can't use named backreferences and numbered backreferences simultaneously.

-- http://ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Capturing

I guess this is spec.
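
The effect can be seen with a pattern that mixes a named group and plain parentheses. This is a minimal sketch; `mixed` and `plain` are illustrative names.

```ruby
mixed = /(?<a>.)(.)/.match('12')
plain = /(.)(.)/.match('12')

# With a named group present, the plain parentheses only group; they do not capture.
mixed_captures = mixed.captures
mixed_whole    = mixed[0]

# Without named groups, both sets of parentheses capture by position.
plain_captures = plain.captures
plain_names    = plain.names

p mixed_captures, mixed_whole, plain_captures, plain_names
```

`mixed` still matches both characters, but only the named group shows up in #captures.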

#9 [ruby-core:72929] Updated by Hans Mackowiak about 1 year ago

It is also interesting when a regexp combines alternatives with | and both of them have a named capture with the same name:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").captures #=> ["b", nil]
reg.match("abc")[:a] # => "b" 
reg.match("xyz") # => #<MatchData "x" a:nil a:"x"> 
reg.match("xyz").captures #=> [nil, "x"]
reg.match("xyz")[:a] # => "x"

(Also notice that in the inspect output of MatchData the capture :a is shown twice.)
Such things need to be kept in mind when creating a new method for MatchData.

#10 [ruby-core:72933] Updated by sorah Shota Fukumori about 1 year ago

Looks like it's the same problem as I noted here https://bugs.ruby-lang.org/issues/11999#note-2

#11 [ruby-core:72963] Updated by Hans Mackowiak about 1 year ago

@Shota: I still need to test your patch, but my case is a little different from yours.
Because a capture can be nil, it seems to pick the first non-nil value in my case. (Or is it the last non-nil?)

Specifically, with your patch:

reg = /(?<a>b)|(?<a>x)/ # => /(?<a>b)|(?<a>x)/ 
reg.match("abc") # => #<MatchData "b" a:"b" a:nil>
reg.match("abc").to_h #=> {"a" => "b"} or {"a" => nil}

#12 [ruby-core:72977] Updated by sorah Shota Fukumori about 1 year ago

That makes sense to me.

#13 [ruby-core:72979] Updated by Yui NARUSE about 1 year ago

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

#14 [ruby-core:72991] Updated by Matthew Kerwin about 1 year ago

Yui NARUSE wrote:

Yukihiro Matsumoto wrote:

I don't think to_h is appropriate, because MatchData is not always able to convert to Hash/Map.
Is there any name candidate?

I feel it can always convert to Hash because even if it doesn't use named captures, the numbering is 1-origin.

irb(main):001:0> /(a)(b)(c)/.match("abc")
=> #<MatchData "abc" 1:"a" 2:"b" 3:"c">
irb(main):002:0> /(a)(b)(c)/.match("abc").to_h
=> {1=>"a", 2=>"b", 3=>"c"}

I did some experimenting of my own to this end, and came up with this: https://github.com/phluid61/mug/blob/master/lib/mug/matchdata/hash.rb

The only real weirdness arises from the fact that positional captures don't happen at all if there's a named capture group in the Regexp; but given the resulting mutual exclusivity, the code itself becomes pretty straight-forward.
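
To illustrate why the mutual exclusivity keeps such a conversion simple, here is a rough sketch. `captures_to_hash` is a hypothetical helper written for this note; it is not the code from the linked repository.

```ruby
# Hypothetical helper: returns named captures when the pattern has names,
# otherwise numbered captures keyed from 1.
def captures_to_hash(match)
  if match.names.empty?
    # No named groups: number the captures from 1, matching MatchData#[] indexes.
    match.captures.each_with_index.map { |value, i| [i + 1, value] }.to_h
  else
    # Named groups: plain parentheses do not capture, so the names cover everything.
    match.names.map { |name| [name, match[name]] }.to_h
  end
end

numbered = captures_to_hash(/(a)(b)(c)/.match('abc'))
named    = captures_to_hash(/(?<x>a)(?<y>b)/.match('ab'))

p numbered, named
```

The two branches never need to be merged, because a single match holds either named or numbered captures, not both.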

#15 [ruby-core:73839] Updated by sorah Shota Fukumori 11 months ago

  • Assignee set to sorah Shota Fukumori

Discussed at https://bugs.ruby-lang.org/projects/ruby/wiki/DevelopersMeeting20160216Japan with Matz and several committers:

Log: https://docs.google.com/document/d/1rj7ODOCSfcsQeBd6-p-NiVwqxDUg05G66LwDOkKOGTw/pub

  • #to_h is an inappropriate name when non-named captures exist:
    1. Return Hash with integer keys? ( {0 => "a", 1 => "b"} )
      • there's no use case for this behavior.
      • rejected
    2. #named_captures (accepted)
      • matz said acceptable
  • Behavior when there are multiple named captures with same name

    • Return last matched value
    /(?<a>b)|(?<a>x)/.match("abc").to_h #=> {"a" => "b"}
    
  • Behavior when named captures didn’t match anything

    • Return nil as value
  • Behavior when no named captures

    • #captures returns [] when a regexp has no capture, so #named_captures returns {} when a regexp has no named capture
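
The points above can be checked against MatchData#named_captures as it was eventually committed. This sketch assumes Ruby 2.4 or later; the variable names are illustrative.

```ruby
# Repeated name: the value comes from the last group that matched.
repeated = /(?<a>b)|(?<a>x)/.match('abc').named_captures
sequence = /(?<a>.)(?<a>.)/.match('12').named_captures

# A named group that did not participate in the match maps to nil.
optional = /(?<a>.)(?<b>.)?/.match('1').named_captures

# No named groups at all: an empty Hash, mirroring #captures returning [].
unnamed = /../.match('12').named_captures

p repeated, sequence, optional, unnamed
```

In the alternation case only one branch can match, so "last matched" and "the one that matched" are the same value; the `sequence` case is the one that shows the later group winning.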

@matz Could you confirm this ↑ and say accept here please?

#16 [ruby-core:73844] Updated by sorah Shota Fukumori 11 months ago

Updated patch (11999-2.diff).

#17 [ruby-core:73860] Updated by Yukihiro Matsumoto 11 months ago

Accepted.

Matz.

#18 Updated by sorah Shota Fukumori 11 months ago

  • Status changed from Open to Closed

Applied in changeset r53863.


#19 [ruby-core:78263] Updated by Charles Nutter 2 months ago

Shouldn't this produce Symbol keys?
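
For callers who prefer Symbol keys, the String-keyed result can be converted after the fact. This is a minimal sketch, assuming Ruby 2.4 or later for #named_captures; the pattern and variable names are made up for illustration.

```ruby
m = /(?<year>\d+)-(?<month>\d+)/.match('2016-02')

string_keys = m.named_captures

# Rebuild the Hash with each group name converted to a Symbol.
symbol_keys = string_keys.map { |name, value| [name.to_sym, value] }.to_h

p string_keys, symbol_keys
```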
