Project

General

Profile

Actions

Feature #18408

open

Allow pattern match to set instance variables

Added by Dan0042 (Daniel DeLorme) 5 months ago. Updated 4 months ago.

Status:
Assigned
Priority:
Normal
Target version:
-
[ruby-core:106644]

Description

I expected this to work:

42 => @v

But instead it raises "syntax error, unexpected instance variable"

Is this intentional?


Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #16372: Allow pattern matching to bind instance variablesRejectedActions

Updated by palkan (Vladimir Dementyev) 5 months ago

This worked originally (when this feature was called "rightward assignment"). But since it has been transformed into a pattern match (in the final 3.0 release), we lost the ability to assign to anything but local vars—that's what pattern matching variable binding supports.

I think, it worth to consider adding support for other vars binding in pattern matching in general. Another example:

case {name: "John", age: 42}
in name: /jo/ => @name, age: @age
end

puts [@name, @age] #=> ["John", 42]

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

  • Backport deleted (2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN)
  • Subject changed from Rightward assignment into instance variable to Allow pattern match to set instance variables
  • Tracker changed from Bug to Feature

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

Updated by mame (Yusuke Endoh) 4 months ago

  • Assignee set to ktsj (Kazuki Tsujimoto)

Updated by baweaver (Brandon Weaver) 4 months ago

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

While we're in that domain where would we want to draw the line? Globals, constants, class variables?

Constants, for instance, may sound absurd on the surface level but may present a very interesting potential for JS import-like syntax:

require "either" # Made up
Either => Left:, Right:

require "ramda"
Ramda => map:, filter:, compose:

# Real, but contrived example
compose(map(-> { _1 * 2 }), filter(-> { _1.even? })

...but that's potentially a very edge case, though I am interested in what that could be used for.

Anyways, point being we should probably decide where the line is on what can and cannot be assigned into. I would posit the answer should be that if it can be assigned to in normal Ruby it should be assignable in any pattern matching syntax for consistency lest someone open another issue on a type we skipped.

Updated by palkan (Vladimir Dementyev) 4 months ago

baweaver (Brandon Weaver) wrote in #note-4:

While we're in that domain where would we want to draw the line? Globals, constants, class variables?
...
Anyways, point being we should probably decide where the line is on what can and cannot be assigned into. I would posit the answer should be that if it can be assigned to in normal Ruby it should be assignable in any pattern matching syntax for consistency lest someone open another issue on a type we skipped.

I think, the phrase "variable assignment" draws a line here: any variable (local, instance, class, global) could be assigned / pinned (already implemented in 3.1). (Both parse.y and Parser operates the terms "identifier" (lvar) and "nonlocal_var" (instance, class, global)).

Updated by palkan (Vladimir Dementyev) 4 months ago

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

I've already implemented this feature for Ruby Next (parser changes) and would like to work on a MRI "backport" ("forwardport" 🤔) if no one minds. /cc @ktsj (Kazuki Tsujimoto)

Updated by palkan (Vladimir Dementyev) 4 months ago

palkan (Vladimir Dementyev) wrote in #note-6:

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

I've already implemented this feature for Ruby Next (parser changes) and would like to work on a MRI "backport" ("forwardport" 🤔) if no one minds. /cc @ktsj (Kazuki Tsujimoto)

Here is a PR: https://github.com/ruby/ruby/pull/5426

Updated by ktsj (Kazuki Tsujimoto) 4 months ago

  • Status changed from Open to Assigned

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

That is why it raises syntax error now.

Conversely, it is acceptable only for simple patterns that do not cause the above problems.
However, in this case, the patterns available for normal pattern matching and right assignment will be different so I am negative about this change.

# acceptable
42 => @v 
# not acceptable
case 42
in @v if expr
end
Actions #9

Updated by ktsj (Kazuki Tsujimoto) 4 months ago

  • Related to Feature #16372: Allow pattern matching to bind instance variables added

Updated by palkan (Vladimir Dementyev) 4 months ago

ktsj (Kazuki Tsujimoto) wrote in #note-8:

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

In this context, what is the difference between the current approach and the new syntax:

case 1
in a if false
in Integer => b
end

# User can do this and result in UB
@a = a
@b = b

# vs.

case 1
in @a if false
in Integer => @b
end

?

I understand, that these are too synthetic examples, and new syntax could make it a bit more likely to hit UB; however, we already have such edge cases, even documented, and I think the advantages of adding such syntax overcome the shortcomings.

However, in this case, the patterns available for normal pattern matching and right assignment will be different so I am negative about this change.

Totally agree.

Updated by Dan0042 (Daniel DeLorme) 4 months ago

ktsj (Kazuki Tsujimoto) wrote in #note-8:

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

There are a few possible ways to solve these problems right?

  1. The value during pattern matching should be stored in a temporary location on the stack
  2. Once the pattern match is done, copy the temporary value to ivar
    • Copy to ivar even if the pattern match fails; this mimicks the same undefined behavior as lvars
    • Copy to ivar only if the pattern match succeeds; most intuitive and trouble-free behavior
      • In this case I think it's acceptable to have undefined behavior only for lvars
      • But it would be better to have the same predictable behavior as ivars for lvars; IMHO it's better to incur the 1 extra MOV than this undefined behavior. Or maybe I don't understand how this is supposed to "leave room for optimization in the future".

Updated by palkan (Vladimir Dementyev) 4 months ago

Dan0042 (Daniel DeLorme) wrote in #note-11:

There are a few possible ways to solve these problems right?

  1. The value during pattern matching should be stored in a temporary location on the stack
  2. Once the pattern match is done, copy the temporary value to ivar

So, we need to scan the whole pattern, reserver stack locations for all mentioned variables and resolve them when match succeeds. Sounds like a plan, but...
I can come up with a yet another interesting case:

@a = 2
case {a: 1, b: 1}
in a: @a, b: ^@a
end

I think, the pattern should match; hence we need to track all vars in the pattern and point them to the corresponding temp stack location. Okay, that seems to be feasible.
What if we have the following code:

@a = 2
case {a: 1, b: 2}
in a: @a, b: ^(@a + 1)
end

Oops. The pin expression is not controlled by the pattern matching iseq construction. So, it will refer to the original value.

Another option could be to implement a rollback: again, store original values of all vars before evaluate the whole pattern, and every time a match failed restore the values. That would result in additional N stack allocations (N is the total number of all affected vars in patterns) and (N+2)K additional instructions (K is the number of patterns; put once in the beginning, restore after every pattern unmatched, pop in the end).

However, that would lead to the potential thread-safety issues; and unlike the current proposed version, which is pretty straightforward, the "rollback" approach is less predictable (end users have no idea about it).

  • Copy to ivar even if the pattern match fails; this mimicks the same undefined behavior as lvars

At least, this option is consistent 🙂

Updated by Dan0042 (Daniel DeLorme) 4 months ago

@palkan (Vladimir Dementyev) Wow, thank you for blowing my mind.

However, is that behavior documented? I can't find it in the docs. I mean, it seems to me that it's relying on the undefined behavior discussed above, so by definition it's not guaranteed to work.

Updated by palkan (Vladimir Dementyev) 4 months ago

Dan0042 (Daniel DeLorme) wrote in #note-13:

@palkan (Vladimir Dementyev) Wow, thank you for blowing my mind.

However, is that behavior documented?

The examples demonstrate a hypothetical situations if we allow binding ivars.
However, we can just drop @ and get the same behaviour for lvars. And that would lead to the undefined behavior already documented.

Updated by Eregon (Benoit Daloze) 4 months ago

IMHO there is no such thing as "undefined behavior" in Ruby, the behavior is whatever CRuby currently does, and with enough time people depend on it.

In this case I think it's best to "keep it stupid and simple" and just assign while doing the match, which is the current behavior.
Anything else is far more complicated, incurs additional assignments (so slower for CRuby), and would use more stack space for pattern matching.

Updated by Dan0042 (Daniel DeLorme) 4 months ago

I don't think an additional assignment and a few bytes on the stack can be considered a performance concern. I hope we worry about having meaningful semantics before such premature micro-optimization. Because I do agree that "the behavior is whatever CRuby currently does, and with enough time people depend on it."

Having the value of the variable change even though the pattern doesn't match is unlike anything in ruby. It's as if "ab" =~ /(a)(c)/ resulted in $1 == "a" even though the regexp doesn't match. It makes sense only if you understand the implementation details of the pattern match. From a conceptual viewpoint, I find this behavior very weird.

Updated by Eregon (Benoit Daloze) 4 months ago

Dan0042 (Daniel DeLorme) wrote in #note-16:

I don't think an additional assignment and a few bytes on the stack can be considered a performance concern.

It can, that cost would be for every variable in the pattern, not just @ivar because if we do such a change it should be consistent for all variables in pattern matching.
Variables in different clauses but with the same name might not even be able to reuse the same temporary variable (e.g., if they assign/read conditionally).

case 1
in a if a.even?
in Integer => b
end

# is like

a = nil
case 1
when -> v { a = v; a.even? }; expr1
when -> v { b = v and Integer === v }; expr2
end

# or

v = 1
if a = v and a.even?
  expr1
elsif b = v and Integer === v
  expr2
end

In practice it should rarely matter because if a clause isn't matched the code should obviously not look at variables not set by that clause.

Updated by Dan0042 (Daniel DeLorme) 4 months ago

Eregon (Benoit Daloze) wrote in #note-17:

In practice it should rarely matter because if a clause isn't matched the code should obviously not look at variables not set by that clause.

I'm not sure I agree with that. When I have if condition; x=42; end I expect x to be nil if the branch was not taken. Not some undefined semi-random value. And sometimes I depend on that behavior in my code. I understand pattern matching doesn't work that way, and I understand why and how pattern matching works the way it does, but only because I read this thread. Otherwise I would assume the variable is only set when the pattern matches successfully.

So I had expected this to work and was fairly surprised that it doesn't:

case [1,2,3,4]
in [*,a,1,*]
  puts "before one: #{a}"
in [*,b,2,*]
  puts "before two: #{b}"
in [*,c,3,*]
  puts "before three: #{c}"
else
  puts "no match"
end
#I expected a==nil since the first pattern didn't match, but instead it's a==3
if !a
  puts "nothing before one"
end

I don't know if the current behavior fits Matz' POLS, but it doesn't fit mine. (Not saying it necessarily should, but it's at least a data point for one person who finds this behavior surprising).

Actions

Also available in: Atom PDF