Feature #18583
Open
Pattern-matching: API for custom unpacking strategies?
Description
I started to think about it when discussing https://github.com/ruby/strscan/pull/30.
The thing is, using StringScanner in many complicated parsers involves some kind of branching.
In pseudocode, the "ideal API" would allow writing something like this:
case <what next matches>
in /regexp1/ => value_that_matched
# use value_that_matched
in /regexp2/ => value_that_matched
# use value_that_matched
# ...
Intuitively, it seems there should be some way of implementing this, but we fall short. We can write a StringScanner-specific matcher object which defines its own #=== and use it with pinning:
case scanner
in ^(Matcher.new(/regexp1/)) => value_that_matched
# ...
But there is no API to tell how the match result will be unpacked: the whole StringScanner will simply be put into value_that_matched.
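To make the limitation concrete, here is a minimal runnable sketch of the Matcher idea above. The Matcher class is hypothetical (it is not part of strscan), and having #=== advance the scanner via StringScanner#scan is just one possible choice.

```ruby
require "strscan"

# Hypothetical helper (not part of strscan): matches when the text at the
# scanner's current position matches the wrapped regexp.
class Matcher
  def initialize(regexp)
    @regexp = regexp
  end

  # Pattern matching calls pattern === value, so `scanner` is the case subject.
  # StringScanner#scan returns the matched string or nil and, on success,
  # advances the scan position.
  def ===(scanner)
    !scanner.scan(@regexp).nil?
  end
end

scanner = StringScanner.new("42 apples")

bound =
  case scanner
  in ^(Matcher.new(/[a-z]+/)) => value_that_matched
    value_that_matched
  in ^(Matcher.new(/\d+/)) => value_that_matched
    value_that_matched
  end

puts bound.equal?(scanner) # the binding is the scanner itself
puts scanner.matched       # the matched text has to be fetched separately
```

The branch selection works, but the name after => is bound to the case subject, so the matched text still has to be pulled out of the scanner by hand.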
So, I thought that maybe it would be possible to define some kind of API for pattern-like objects: a method with a signature like try_match_pattern(value), which by default is implemented like return value if self === value, but can be redefined to return something different, like a part of the object, or the object transformed somehow.
This would open up some interesting (if maybe uncanny) possibilities: not just slicing out the necessary part, but something like
value => ^(type_caster(Integer)) => int_value
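Since Ruby does not call any such method, the protocol can only be simulated. The sketch below is an emulation in plain Ruby under stated assumptions: try_match is a hypothetical dispatcher standing in for what the language would do, TypeCaster and type_caster are illustrative names, and the NO_MATCH sentinel is an addition of this sketch (the description above simply returns nil on failure).

```ruby
# Sentinel for "no match", so that nil and false can be legitimate results.
# (An assumption of this sketch; the proposal itself just returns nil.)
NO_MATCH = Object.new.freeze

# Hypothetical dispatcher standing in for what `in pattern => name` would do
# under the proposal: ask the pattern how to unpack, else fall back to ===.
def try_match(pattern, value)
  return pattern.try_match_pattern(value) if pattern.respond_to?(:try_match_pattern)

  pattern === value ? value : NO_MATCH
end

# Hypothetical pattern-like object behind type_caster(Integer).
class TypeCaster
  def initialize(type)
    @type = type
  end

  def try_match_pattern(value)
    # Kernel.Integer and Kernel.Float return nil instead of raising
    # when called with exception: false.
    converted = Kernel.public_send(@type.name, value, exception: false)
    converted.nil? ? NO_MATCH : converted
  end
end

def type_caster(type)
  TypeCaster.new(type)
end

int_value = try_match(type_caster(Integer), "42")  # the Integer, not the String
same      = try_match(Integer, 7)                  # default: the value itself
missed    = try_match(type_caster(Integer), "abc") # NO_MATCH

p int_value
p same
p missed.equal?(NO_MATCH)
```

The point of the sketch is only the shape of the contract: the pattern decides what gets bound, and ordinary patterns keep today's behavior through the === fallback.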
So... Just a discussion topic!
Updated by zverok (Victor Shepelev) about 2 years ago
A simpler example: matching something against regexps with capture groups is still quite annoying!
case string
when /{{(.+?)}}/
content = Regexp.last_match[1] # looking into global value isn't exactly elegant, right?
We could probably bend it towards
case string
in /{{(.+?)}}/ => content # the matched group
This, though, raises a question of several match groups, at which point one starts to want more:
case string
in /{{(.+?): (.+?)}}/ => [key, value]
# use key and value
in /{{=(?<named>.+?)}}/ => {named:}
# use named
...so... IDK.
Updated by hmdne (hmdne -) about 2 years ago
# looking into global value isn't exactly elegant, right?
It's not global, it's Fiber-local, and so are $1 and friends. This may not be communicated well enough in the documentation, though...
[1] pry(main)> z = Fiber.new { /(.)/ =~ 'test' }
=> #<Fiber:0x00007f698a2897e0 (pry):1 (created)>
[2] pry(main)> z.resume
=> 0
[3] pry(main)> Regexp.last_match
=> nil
[4] pry(main)>
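The same check as a standalone script, in case it is easier to reproduce than a pry session. This is only a sketch of the experiment above; the strings and variable names are made up for illustration.

```ruby
# Match in the main fiber first, so there is something to clobber.
"outer" =~ /(o)/

# A match performed inside another fiber...
inner = Fiber.new do
  "test" =~ /(t)/
  Regexp.last_match(1)
end.resume

# ...is visible inside that fiber, but the main fiber still sees its own match.
outer = Regexp.last_match(1)

puts inner # capture seen inside the fiber
puts outer # capture from the main fiber's own match
```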
Updated by palkan (Vladimir Dementyev) about 2 years ago
This, though, raises a question of several match groups, at which point one starts to want more:
case string
in /{{(.+?): (.+?)}}/ => [key, value]
# use key and value
in /{{=(?<named>.+?)}}/ => {named:}
# use named
...so... IDK.
This one could be achieved via guards:
case val
in /(foo|bar)/ if $~ in [val]
  puts val
in /(?<named>\d+)/ if $~ in {named:}
  puts named
end
That would require adding MatchData#{deconstruct,deconstruct_keys}, though:
refine MatchData do
  alias deconstruct captures
  def deconstruct_keys(*)
    named_captures.transform_keys(&:to_sym)
  end
end
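For comparison, a sketch of the same #deconstruct / #deconstruct_keys idea that touches neither MatchData nor $~, by wrapping the match result instead. MatchView, TEMPLATES, first_match and describe are all hypothetical names made up for this example.

```ruby
# Hypothetical wrapper (not a core class) exposing a MatchData to both
# array patterns and hash patterns without modifying MatchData itself.
class MatchView
  def initialize(match_data)
    @match_data = match_data
  end

  # Array patterns: positional capture groups.
  def deconstruct
    @match_data.captures
  end

  # Hash patterns: named capture groups, with symbol keys.
  def deconstruct_keys(_keys)
    @match_data.named_captures.transform_keys(&:to_sym)
  end
end

TEMPLATES = [/\{\{=(?<named>.+?)\}\}/, /\{\{(.+?): (.+?)\}\}/].freeze

# Returns a MatchView for the first regexp that matches, or nil.
def first_match(string)
  TEMPLATES.each do |regexp|
    match_data = regexp.match(string)
    return MatchView.new(match_data) if match_data
  end
  nil
end

def describe(string)
  case first_match(string)
  in {named:}     then "named=#{named}"
  in [key, value] then "#{key}=#{value}"
  in nil          then "no match"
  end
end

puts describe("{{color: red}}")
puts describe("{{=title}}")
puts describe("plain text")
```

The trade-off is that the regexps move out of the in clauses and into a list, so this is closer to "match first, destructure second" than to the syntax sketched earlier in the thread.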
Regarding the original proposal (the unpacking API), I think it could bring more confusion than value. Adding one more implicit layer (in addition to #deconstruct and #deconstruct_keys, which could also be overridden) would make pattern matching even more magical, in a bad sense.