Feature #18583

Pattern-matching: API for custom unpacking strategies?

Added by zverok (Victor Shepelev) 4 months ago. Updated 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:107564]

Description

I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, using StringScanner in many complicated parsers involves some kind of branching.

In pseudocode, the "ideal API" would allow writing something like this:

case <what next matches>
in /regexp1/ => value_that_matched
  # use value_that_matched
in /regexp2/ => value_that_matched
  # use value_that_matched
# ...

Intuitively, it seems there should be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object that defines its own #=== and use it with pinning:

case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...

But there is no API to tell how the match result should be unpacked: the whole StringScanner is simply put into value_that_matched.
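
A minimal sketch of that limitation. Matcher here is a hypothetical helper (it is not part of strscan), and the pin uses a local variable (^digits) instead of ^(...) so that it also parses on Rubies older than 3.1:

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): matches when the scanner
# can consume the given regexp at its current position.
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  # Invoked by case/in for a pinned value; advances the scanner on success.
  def ===(scanner)
    !scanner.scan(@regexp).nil?
  end
end

scanner = StringScanner.new("42 apples")
digits = Matcher.new(/\d+/)

case scanner
in ^digits => value_that_matched
  # The binding receives the whole scanner, not the matched text.
  bound_to_scanner = value_that_matched.equal?(scanner)
  # The matched text has to be fetched from the scanner separately.
  matched_text = scanner.matched
end
```

The match itself works, but the interesting part (the scanned text) still has to be pulled out of the scanner by hand.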

So I thought it might be possible to define an API for pattern-like objects: a method with a signature like try_match_pattern(value), implemented by default as return value if self === value, but redefinable to return something different, such as a part of the object or a transformed object.
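
To make the idea concrete, here is a rough emulation outside of case/in. Everything in it is an assumption for illustration: try_match stands in for what the pattern-matching machinery would do, and ScanMatcher is a hypothetical pattern-like object; none of this is an existing Ruby API.

```ruby
require "strscan"

# Emulates the proposed protocol. Patterns that do not define
# try_match_pattern fall back to the default from the proposal:
# "return value if self === value".
def try_match(pattern, value)
  if pattern.respond_to?(:try_match_pattern)
    pattern.try_match_pattern(value)
  elsif pattern === value
    value
  end
end

# Hypothetical pattern-like object that unpacks the scanned text
# instead of the scanner itself.
class ScanMatcher
  def initialize(regexp)
    @regexp = regexp
  end

  def try_match_pattern(scanner)
    scanner.scan(@regexp) # the matched String, or nil when nothing matches
  end
end

scanner = StringScanner.new("42 apples")
number  = try_match(ScanMatcher.new(/\d+/), scanner)    # unpacked text
word    = try_match(ScanMatcher.new(/[a-z]+/), scanner) # nil: a space comes next
default = try_match(Integer, 5)                         # plain === fallback
miss    = try_match(Integer, "5")                       # nil: no match
```

One thing the sketch makes visible: with a plain return value, nil cannot distinguish "no match" from "matched nil", so a real API would probably need a different failure signal.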

This will open some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like

value => ^(type_caster(Integer)) => int_value
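
A sketch of what such a type_caster could look like under the hypothetical try_match_pattern protocol. Since pattern matching does not call this method today, the example invokes it directly; TypeCaster and type_caster are illustrative names only.

```ruby
# Hypothetical pattern-like object: it "matches" when the value can be
# converted, and unpacks to the converted value.
class TypeCaster
  def initialize(type)
    @type = type
  end

  def try_match_pattern(value)
    # Kernel.Integer and Kernel.Float accept exception: false (Ruby 2.6+)
    # and return nil instead of raising on invalid input.
    Kernel.public_send(@type.name, value, exception: false)
  end
end

def type_caster(type)
  TypeCaster.new(type)
end

int_value = type_caster(Integer).try_match_pattern("42")
no_value  = type_caster(Integer).try_match_pattern("forty-two")
```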

So... Just a discussion topic!

Updated by zverok (Victor Shepelev) 4 months ago

One simpler example: matching against regexps with capture groups is still quite annoying!

case string
when /{{(.+?)}}/
  content = Regexp.last_match[1] # looking into global value isn't exactly elegant, right?

We could probably bend it towards

case string
in /{{(.+?)}}/ => content # the matched group
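
For comparison, a small sketch of what this syntax does today (the sample string is made up): the binding receives the whole matched string, and the group still has to be read from the last match.

```ruby
string = "Hello, {{name}}!"

case string
in /{{(.+?)}}/ => content
  # Regexp#=== has just run, so the last match is still available here.
  group = Regexp.last_match(1)
end

# content is the entire input string, not the captured group.
whole_string_bound = content.equal?(string)
```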

This, though, raises a question of several match groups, at which point one starts to want more:

case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named

...so... IDK.

Updated by hmdne (hmdne -) 4 months ago

# looking into global value isn't exactly elegant, right?

It's not global, it's Fiber-local, and so are $1 and friends. This may not be communicated well enough in the documentation, though...

[1] pry(main)> z = Fiber.new { /(.)/ =~ 'test' }
=> #<Fiber:0x00007f698a2897e0 (pry):1 (created)>
[2] pry(main)> z.resume
=> 0
[3] pry(main)> Regexp.last_match
=> nil
[4] pry(main)>

Updated by palkan (Vladimir Dementyev) 3 months ago

This, though, raises a question of several match groups, at which point one starts to want more:

case string
in /{{(.+?): (.+?)}}/ => [key, value]
  # use key and value
in /{{=(?<named>.+?)}}/ => {named:}
  # use named

...so... IDK.

This one could be achieved via guards:

case val
  in /(foo|bar)/ if $~ in [val]
    puts val
  in /(?<named>\d+)/ if $~ in {named: }
    puts named
end

That would require adding MatchData#{deconstruct,deconstruct_keys}, though:

refine MatchData do
  alias deconstruct captures

  def deconstruct_keys(*)
    named_captures.transform_keys(&:to_sym)
  end
end
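
A self-contained sketch of the guard approach. It makes two assumptions that differ from the snippet above: it reopens MatchData instead of using a refinement (to sidestep the question of whether refined methods are visible to pattern matching), and it defines the methods only when they are missing, since newer Rubies (3.2 and later, as far as I know) ship MatchData#deconstruct and #deconstruct_keys themselves. The classify method is a made-up example, and the one-line `expr in pattern` form assumes Ruby 3.0+.

```ruby
class MatchData
  # Define only when the running Ruby does not provide them already.
  alias_method :deconstruct, :captures unless method_defined?(:deconstruct)

  unless method_defined?(:deconstruct_keys)
    def deconstruct_keys(_keys)
      named_captures.transform_keys(&:to_sym)
    end
  end
end

# Hypothetical example method: the guard destructures the last match ($~)
# that the regexp pattern on the same line has just produced.
def classify(val)
  case val
  in /(foo|bar)/ if $~ in [word]
    "word: #{word}"
  in /(?<named>\d+)/ if $~ in {named:}
    "number: #{named}"
  else
    "no match"
  end
end

word_result   = classify("foo")
number_result = classify("id 123")
other_result  = classify("zzz")
```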

Regarding the original proposal (the unpacking API), I think it could bring more confusion than value. Adding one more implicit layer (in addition to #deconstruct and #deconstruct_keys, which can also be overridden) would make pattern matching even more magical, in a bad sense.
