Feature #18583

Pattern-matching: API for custom unpacking strategies?

Added by zverok (Victor Shepelev) almost 3 years ago. Updated 8 months ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:107564]

Description

I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, using StringScanner in many complicated parsers involves some kind of branching.
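
For illustration, the kind of branching meant here might look like the sketch below in today's Ruby. The method name next_token and the token kinds are made up for the example; only StringScanner#scan is real API.

```ruby
require "strscan"

# Hypothetical tokenizer step: try each regexp in turn at the current position.
# StringScanner#scan returns the matched text and advances, or returns nil.
def next_token(scanner)
  if (value = scanner.scan(/\d+/))
    [:number, value]
  elsif (value = scanner.scan(/[a-z]+/))
    [:word, value]
  elsif (value = scanner.scan(/\s+/))
    [:space, value]
  end
end

scanner = StringScanner.new("12 ab")
tokens = []
while (token = next_token(scanner))
  tokens << token
end
```

Every branch repeats the "scan, assign, test" dance, which is what the pattern-matching form would hide.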

In pseudocode, the "ideal API" would allow writing something like this:

case <what next matches>
in /regexp1/ => value_that_matched
  # use value_that_matched
in /regexp2/ => value_that_matched
  # use value_that_matched
# ...

"Intuitively", it seems there should be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object that defines its own #=== and use it with pinning:

case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...

But there is no API to tell how the match result should be unpacked: the whole StringScanner is simply bound to value_that_matched.
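
A runnable sketch of that limitation, assuming Ruby 3.1 or later (for the ^(...) pin expression). The Matcher class here is only a stand-in written for this example, using StringScanner#check so that the test does not advance the scanner.

```ruby
require "strscan"

# Stand-in matcher: === answers "does the regexp match at the current position?"
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(scanner)
    scanner.is_a?(StringScanner) && !scanner.check(@regexp).nil?
  end
end

scanner = StringScanner.new("foo bar")

bound =
  case scanner
  in ^(Matcher.new(/\w+/)) => value_that_matched
    value_that_matched
  end

# bound is the scanner itself, not the matched text.
same_object = bound.equal?(scanner)
```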

So, I thought that maybe it would be possible to define some kind of API for pattern-like objects: a method with a signature like try_match_pattern(value), which by default is implemented as return value if self === value, but can be redefined to return something different, like a part of the object, or the object transformed somehow.
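
A rough emulation of how such a hook could behave. Nothing here exists in Ruby: try_match_pattern is the name from the proposal, while ScanMatcher and first_match are hypothetical helpers standing in for the pattern object and for case ... in. As a simplification, nil means "no match".

```ruby
require "strscan"

# Hypothetical pattern-like object overriding the proposed hook so that
# only the matched text is handed back, not the whole scanner.
class ScanMatcher
  def initialize(regexp)
    @regexp = regexp
  end

  def try_match_pattern(scanner)
    scanner.scan(@regexp) # advances the scanner; nil when nothing matches here
  end
end

# Hypothetical stand-in for `case ... in`:
# returns [index of the first matching pattern, bound value], or nil.
def first_match(value, *patterns)
  patterns.each_with_index do |pattern, index|
    bound = pattern.try_match_pattern(value)
    return [index, bound] unless bound.nil?
  end
  nil
end

scanner = StringScanner.new("42 apples")
index, matched = first_match(scanner, ScanMatcher.new(/[a-z]+/), ScanMatcher.new(/\d+/))
```

Here the second pattern wins and matched holds just the digits, which is the unpacking the proposal asks for.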

This will open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like

value => ^(type_caster(Integer)) => int_value

So... Just a discussion topic!

Updated by zverok (Victor Shepelev) almost 3 years ago

One simpler example: matching against regexps with capture groups is still quite annoying!

case string
when /{{(.+?)}}/
  content = Regexp.last_match[1] # looking into global value isn't exactly elegant, right?

We could've probably bent it towards

case string
in /{{(.+?)}}/ => content # the matched group

This, though, raises a question of several match groups, at which point one starts to want more:

case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named

...so... IDK.

Updated by hmdne (hmdne -) almost 3 years ago

# looking into global value isn't exactly elegant, right?

It's not global, it's Fiber-local, so are $1 and friends. This may not be messaged well enough in the documentation though...

[1] pry(main)> z = Fiber.new { /(.)/ =~ 'test' }
=> #<Fiber:0x00007f698a2897e0 (pry):1 (created)>
[2] pry(main)> z.resume
=> 0
[3] pry(main)> Regexp.last_match
=> nil
[4] pry(main)>

Updated by palkan (Vladimir Dementyev) almost 3 years ago

This, though, raises a question of several match groups, at which point one starts to want more:

case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named

...so... IDK.

This one could be achieved via guards:

case val
  in /(foo|bar)/ if $~ in [val]
    puts val
  in /(?<named>\d+)/ if $~ in {named: }
    puts named
end

That would require adding MatchData#{deconstruct,deconstruct_keys}, though:

refine MatchData do
  alias deconstruct captures

  def deconstruct_keys(*)
    named_captures.transform_keys(&:to_sym)
  end
end
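
A standalone variant of the same idea that avoids both the guard and the refinement by matching on the MatchData directly. As far as I know, Ruby 3.2 and later already ship MatchData#deconstruct and #deconstruct_keys; the patch below is an assumption-laden fallback that only defines them when they are missing. The method name describe is made up for the example.

```ruby
# Fallback for Rubies without the built-in methods (assumed to be pre-3.2).
class MatchData
  alias_method :deconstruct, :captures unless method_defined?(:deconstruct)

  unless method_defined?(:deconstruct_keys)
    def deconstruct_keys(_keys)
      named_captures.transform_keys(&:to_sym)
    end
  end
end

# Hypothetical helper: destructure a MatchData (or nil) with plain patterns.
def describe(match_data)
  case match_data
  in {named:}
    "named=#{named}"
  in [key, value]
    "#{key}=#{value}"
  else
    "no match"
  end
end

by_name = describe(/\{\{=(?<named>.+?)\}\}/.match("{{=title}}"))
by_position = describe(/\{\{(.+?): (.+?)\}\}/.match("{{a: b}}"))
nothing = describe(/\{\{(.+?)\}\}/.match("plain text"))
```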

Regarding the original proposal (the unpacking API), I think, it could bring more confusion than value. Adding one more implicit layer (in addition to #deconstruct and #deconstruct_keys, which could also be overridden) would make pattern matching even more magical in a bad sense.

Updated by ntl (Nathan Ladd) 8 months ago · Edited

Could the match operator, =~, be used as a general complement to ===?

Example (following original sketch from @zverok (Victor Shepelev)):

class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end

  def =~(obj)
    match_data = @regexp.match(obj)
    match_data
  end
end

case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  some_named_capture = match_data[:some_named_capture]
  puts "Match: #{some_named_capture}"
end

The implementation of =~ would be optional in my view; not implementing it on whatever implements === would just cause Ruby to behave as it does now:

class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end
end

case "some string"
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_variable
  # match_variable is just "some string"
  puts match_variable.inspect
end
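
The two cases can be emulated in current Ruby with a small helper, to make the proposed rule concrete. This is only a sketch: bind_for, PlainMatcher and CapturingMatcher are hypothetical names, and the helper looks at methods defined directly on the pattern's class because older Rubies still define a default Object#=~.

```ruby
class PlainMatcher
  def initialize(regexp)
    @regexp = regexp
  end

  def ===(obj)
    @regexp.match?(obj)
  end
end

class CapturingMatcher < PlainMatcher
  def =~(obj)
    @regexp.match(obj)
  end
end

# Hypothetical binding rule: when === succeeds, bind the result of =~ if the
# pattern's own class defines it, otherwise bind the original value.
def bind_for(pattern, value)
  return nil unless pattern === value

  if pattern.class.instance_methods(false).include?(:=~)
    pattern =~ value
  else
    value
  end
end

regexp = /(?<some_named_capture>some) string/
with_match_data = bind_for(CapturingMatcher.new(regexp), "some string")
without_match_data = bind_for(PlainMatcher.new(regexp), "some string")
```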

This would add =~ to the pattern matching protocol that's currently comprised of ===, deconstruct and deconstruct_keys. It would make === significantly more useful, and regular expressions provide a compelling example for why: when matching a string to a regular expression pattern, the string is already in lexical scope, but the match data provides new useful information that only comes into existence upon a successful match:

subject = "some string"

case subject
in ^(Matcher.new(/(?<some_named_capture>some) string/)) => match_data
  # Capturing the match data variable instead of the original string doesn't make the original string inaccessible: 
  puts "Match subject: #{subject.inspect}"
  # match_data provides additional useful information:
  some_named_capture = match_data[:some_named_capture]
  puts "Match data: :#{some_named_capture}"
end

I also suspect this could be embedded into the pattern matching syntax itself, which could allow for some highly useful possibilities. One example that leaps to mind is reifying primitive data parsed from JSON into a data structure:

SomeStruct = Struct.new(:some_attr, :some_other_attr) do
  def self.===(data)
    data.is_a?(Hash) && data.key?(:some_attr) && data.key?(:some_other_attr)
  end

  def self.=~(data)
    new(**data)
  end
end

some_json = <<JSON
{
  "some_attr": "some value",
  "some_other_attr": "some other value"
}
JSON

require "json"

# Parse JSON into raw (primitive) data
some_data = JSON.parse(some_json, symbolize_names: true)

case some_data
in SomeStruct => some_struct
  # some_struct is a reified data structure (SomeStruct) built from some_data
  puts some_struct.inspect
end
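
For comparison, something close to this can be written today with an ordinary hash pattern, at the cost of building the struct by hand inside the branch. SomeRecord and reify are names made up for this sketch.

```ruby
require "json"

SomeRecord = Struct.new(:some_attr, :some_other_attr)

# Hypothetical helper: validate the shape with a hash pattern, then build the struct.
def reify(data)
  case data
  in {some_attr: String => some_attr, some_other_attr: String => some_other_attr}
    SomeRecord.new(some_attr, some_other_attr)
  else
    nil
  end
end

raw = JSON.parse('{"some_attr": "some value", "some_other_attr": "some other value"}', symbolize_names: true)
record = reify(raw)
```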