Project

General

Profile

Actions

Feature #18408

closed

Allow pattern match to set instance variables

Added by Dan0042 (Daniel DeLorme) about 1 year ago. Updated 5 months ago.

Status:
Rejected
Priority:
Normal
Target version:
-
[ruby-core:106644]

Description

I expected this to work:

42 => @v

But instead it raises "syntax error, unexpected instance variable"

Is this intentional?


Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #16372: Allow pattern matching to bind instance variablesRejectedActions

Updated by palkan (Vladimir Dementyev) about 1 year ago

This worked originally (when this feature was called "rightward assignment"). But since it has been transformed into a pattern match (in the final 3.0 release), we lost the ability to assign to anything but local vars—that's what pattern matching variable binding supports.

I think, it worth to consider adding support for other vars binding in pattern matching in general. Another example:

case {name: "John", age: 42}
in name: /jo/ => @name, age: @age
end

puts [@name, @age] #=> ["John", 42]

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Tracker changed from Bug to Feature
  • Subject changed from Rightward assignment into instance variable to Allow pattern match to set instance variables
  • Backport deleted (2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN)

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

Updated by mame (Yusuke Endoh) about 1 year ago

  • Assignee set to ktsj (Kazuki Tsujimoto)

Updated by baweaver (Brandon Weaver) about 1 year ago

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

While we're in that domain where would we want to draw the line? Globals, constants, class variables?

Constants, for instance, may sound absurd on the surface level but may present a very interesting potential for JS import-like syntax:

require "either" # Made up
Either => Left:, Right:

require "ramda"
Ramda => map:, filter:, compose:

# Real, but contrived example
compose(map(-> { _1 * 2 }), filter(-> { _1.even? })

...but that's potentially a very edge case, though I am interested in what that could be used for.

Anyways, point being we should probably decide where the line is on what can and cannot be assigned into. I would posit the answer should be that if it can be assigned to in normal Ruby it should be assignable in any pattern matching syntax for consistency lest someone open another issue on a type we skipped.

Updated by palkan (Vladimir Dementyev) about 1 year ago

baweaver (Brandon Weaver) wrote in #note-4:

While we're in that domain where would we want to draw the line? Globals, constants, class variables?
...
Anyways, point being we should probably decide where the line is on what can and cannot be assigned into. I would posit the answer should be that if it can be assigned to in normal Ruby it should be assignable in any pattern matching syntax for consistency lest someone open another issue on a type we skipped.

I think, the phrase "variable assignment" draws a line here: any variable (local, instance, class, global) could be assigned / pinned (already implemented in 3.1). (Both parse.y and Parser operates the terms "identifier" (lvar) and "nonlocal_var" (instance, class, global)).

Updated by palkan (Vladimir Dementyev) about 1 year ago

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

I've already implemented this feature for Ruby Next (parser changes) and would like to work on a MRI "backport" ("forwardport" 🤔) if no one minds. /cc @ktsj (Kazuki Tsujimoto)

Updated by palkan (Vladimir Dementyev) about 1 year ago

palkan (Vladimir Dementyev) wrote in #note-6:

jeremyevans0 (Jeremy Evans) wrote in #note-2:

I don't think it's a bug that pattern match only supports setting local variables. However, I agree that it would be a useful feature for pattern match to support setting instance variables, especially now that you can pin instance variables in pattern match in Ruby 3.1.

I've already implemented this feature for Ruby Next (parser changes) and would like to work on a MRI "backport" ("forwardport" 🤔) if no one minds. /cc @ktsj (Kazuki Tsujimoto)

Here is a PR: https://github.com/ruby/ruby/pull/5426

Updated by ktsj (Kazuki Tsujimoto) about 1 year ago

  • Status changed from Open to Assigned

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

That is why it raises syntax error now.

Conversely, it is acceptable only for simple patterns that do not cause the above problems.
However, in this case, the patterns available for normal pattern matching and right assignment will be different so I am negative about this change.

# acceptable
42 => @v 
# not acceptable
case 42
in @v if expr
end
Actions #9

Updated by ktsj (Kazuki Tsujimoto) about 1 year ago

  • Related to Feature #16372: Allow pattern matching to bind instance variables added

Updated by palkan (Vladimir Dementyev) about 1 year ago

ktsj (Kazuki Tsujimoto) wrote in #note-8:

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

In this context, what is the difference between the current approach and the new syntax:

case 1
in a if false
in Integer => b
end

# User can do this and result in UB
@a = a
@b = b

# vs.

case 1
in @a if false
in Integer => @b
end

?

I understand, that these are too synthetic examples, and new syntax could make it a bit more likely to hit UB; however, we already have such edge cases, even documented, and I think the advantages of adding such syntax overcome the shortcomings.

However, in this case, the patterns available for normal pattern matching and right assignment will be different so I am negative about this change.

Totally agree.

Updated by Dan0042 (Daniel DeLorme) about 1 year ago

ktsj (Kazuki Tsujimoto) wrote in #note-8:

If assignment to instance variables (or variables other than local variables) is allowed in pattern matching, there are problems such as:

  1. The value during pattern matching can be observed by other threads
  2. The value of the instance variable will be undefined when pattern matching fails (ref: https://github.com/ruby/ruby/blob/v3_1_0/doc/syntax/pattern_matching.rdoc#label-Appendix+B.+Some+undefined+behavior+examples)

There are a few possible ways to solve these problems right?

  1. The value during pattern matching should be stored in a temporary location on the stack
  2. Once the pattern match is done, copy the temporary value to ivar
    • Copy to ivar even if the pattern match fails; this mimicks the same undefined behavior as lvars
    • Copy to ivar only if the pattern match succeeds; most intuitive and trouble-free behavior
      • In this case I think it's acceptable to have undefined behavior only for lvars
      • But it would be better to have the same predictable behavior as ivars for lvars; IMHO it's better to incur the 1 extra MOV than this undefined behavior. Or maybe I don't understand how this is supposed to "leave room for optimization in the future".

Updated by palkan (Vladimir Dementyev) about 1 year ago

Dan0042 (Daniel DeLorme) wrote in #note-11:

There are a few possible ways to solve these problems right?

  1. The value during pattern matching should be stored in a temporary location on the stack
  2. Once the pattern match is done, copy the temporary value to ivar

So, we need to scan the whole pattern, reserver stack locations for all mentioned variables and resolve them when match succeeds. Sounds like a plan, but...
I can come up with a yet another interesting case:

@a = 2
case {a: 1, b: 1}
in a: @a, b: ^@a
end

I think, the pattern should match; hence we need to track all vars in the pattern and point them to the corresponding temp stack location. Okay, that seems to be feasible.
What if we have the following code:

@a = 2
case {a: 1, b: 2}
in a: @a, b: ^(@a + 1)
end

Oops. The pin expression is not controlled by the pattern matching iseq construction. So, it will refer to the original value.

Another option could be to implement a rollback: again, store original values of all vars before evaluate the whole pattern, and every time a match failed restore the values. That would result in additional N stack allocations (N is the total number of all affected vars in patterns) and (N+2)K additional instructions (K is the number of patterns; put once in the beginning, restore after every pattern unmatched, pop in the end).

However, that would lead to the potential thread-safety issues; and unlike the current proposed version, which is pretty straightforward, the "rollback" approach is less predictable (end users have no idea about it).

  • Copy to ivar even if the pattern match fails; this mimicks the same undefined behavior as lvars

At least, this option is consistent 🙂

Updated by Dan0042 (Daniel DeLorme) about 1 year ago

@palkan (Vladimir Dementyev) Wow, thank you for blowing my mind.

However, is that behavior documented? I can't find it in the docs. I mean, it seems to me that it's relying on the undefined behavior discussed above, so by definition it's not guaranteed to work.

Updated by palkan (Vladimir Dementyev) about 1 year ago

Dan0042 (Daniel DeLorme) wrote in #note-13:

@palkan (Vladimir Dementyev) Wow, thank you for blowing my mind.

However, is that behavior documented?

The examples demonstrate a hypothetical situations if we allow binding ivars.
However, we can just drop @ and get the same behaviour for lvars. And that would lead to the undefined behavior already documented.

Updated by Eregon (Benoit Daloze) about 1 year ago

IMHO there is no such thing as "undefined behavior" in Ruby, the behavior is whatever CRuby currently does, and with enough time people depend on it.

In this case I think it's best to "keep it stupid and simple" and just assign while doing the match, which is the current behavior.
Anything else is far more complicated, incurs additional assignments (so slower for CRuby), and would use more stack space for pattern matching.

Updated by Dan0042 (Daniel DeLorme) about 1 year ago

I don't think an additional assignment and a few bytes on the stack can be considered a performance concern. I hope we worry about having meaningful semantics before such premature micro-optimization. Because I do agree that "the behavior is whatever CRuby currently does, and with enough time people depend on it."

Having the value of the variable change even though the pattern doesn't match is unlike anything in ruby. It's as if "ab" =~ /(a)(c)/ resulted in $1 == "a" even though the regexp doesn't match. It makes sense only if you understand the implementation details of the pattern match. From a conceptual viewpoint, I find this behavior very weird.

Updated by Eregon (Benoit Daloze) about 1 year ago

Dan0042 (Daniel DeLorme) wrote in #note-16:

I don't think an additional assignment and a few bytes on the stack can be considered a performance concern.

It can, that cost would be for every variable in the pattern, not just @ivar because if we do such a change it should be consistent for all variables in pattern matching.
Variables in different clauses but with the same name might not even be able to reuse the same temporary variable (e.g., if they assign/read conditionally).

case 1
in a if a.even?
in Integer => b
end

# is like

a = nil
case 1
when -> v { a = v; a.even? }; expr1
when -> v { b = v and Integer === v }; expr2
end

# or

v = 1
if a = v and a.even?
  expr1
elsif b = v and Integer === v
  expr2
end

In practice it should rarely matter because if a clause isn't matched the code should obviously not look at variables not set by that clause.

Updated by Dan0042 (Daniel DeLorme) about 1 year ago

Eregon (Benoit Daloze) wrote in #note-17:

In practice it should rarely matter because if a clause isn't matched the code should obviously not look at variables not set by that clause.

I'm not sure I agree with that. When I have if condition; x=42; end I expect x to be nil if the branch was not taken. Not some undefined semi-random value. And sometimes I depend on that behavior in my code. I understand pattern matching doesn't work that way, and I understand why and how pattern matching works the way it does, but only because I read this thread. Otherwise I would assume the variable is only set when the pattern matches successfully.

So I had expected this to work and was fairly surprised that it doesn't:

case [1,2,3,4]
in [*,a,1,*]
  puts "before one: #{a}"
in [*,b,2,*]
  puts "before two: #{b}"
in [*,c,3,*]
  puts "before three: #{c}"
else
  puts "no match"
end
#I expected a==nil since the first pattern didn't match, but instead it's a==3
if !a
  puts "nothing before one"
end

I don't know if the current behavior fits Matz' POLS, but it doesn't fit mine. (Not saying it necessarily should, but it's at least a data point for one person who finds this behavior surprising).

Updated by matz (Yukihiro Matsumoto) 5 months ago

  • Status changed from Assigned to Rejected

We have discussed with @ktsj (Kazuki Tsujimoto) about the issue, and concluded it is intentional. Unsuccessful matches remain the assignment result from the internal matches, e.g.,

case [1,2,3]
in [a,2,3,4]
  # unmatch
else
  p a #=> 1 (from partial match)
end

Since instance variables (and global variables) can be accessed from outside, pattern matches can break the object status.

You might think it should be OK to allow instance variables in assignee of the single line pattern match, but the same issue can happen (before raising exceptions). In addition, once we allow instance variables, the request for array reference (a[1]) or attribute acess (obj.attr) would come and things would get more and more complex.

So at the moment, we will decide not to support anything but local variables.

Matz.

Updated by Dan0042 (Daniel DeLorme) 5 months ago

it is intentional. Unsuccessful matches remain the assignment result from the internal matches

I think everyone here understood the current behavior is intentional, but the more important question would be "is it beneficial?"

Since instance variables (and global variables) can be accessed from outside, pattern matches can break the object status.

If we want the object state to stay consistent across threads... well, for one it would require a mutex around every ivar access so IMHO this is not such a relevant concern. But if you don't want ivars to change during match, this can be fixed, for example by using temporary local vars like I suggested in #note-11. I don't think this is a fundamental problem with this syntax, although it increases the cost/benefit ratio.

In addition, once we allow instance variables, the request for array reference (a[1]) or attribute acess (obj.attr) would come and things would get more and more complex.

In other words it's an inevitable slippery slope? I'm sure there would be a request for array reference or attribute access, but that doesn't mean it has to be accepted. There is no reason why one inevitably implies the other, just as 42 => v doesn't imply the acceptance of 42 => @v.

So at the moment, we will decide not to support anything but local variables.

If the proposal is rejected then that's how it is, but I would honestly appreciate if that rejection came from the suitability of the 42 => @v syntax itself. It's disappointing to hear a rejection simply based on nitpicks of the implemention. It's the difference between "42=>@v is a bad syntax" and "42=>@v would be acceptable if someone is able to address issues A,B,C"

Updated by Eregon (Benoit Daloze) 5 months ago

Dan0042 (Daniel DeLorme) wrote in #note-20:

But if you don't want ivars to change during match, this can be fixed, for example by using temporary local vars like I suggested in #note-11. I don't think this is a fundamental problem with this syntax, although it increases the cost/benefit ratio.

No, it cannot be fixed in the presence of guards and arbitrary calls (notably almost anything in pattern matching calls ===), which might access the ivar indirectly.
It was already mentioned in https://bugs.ruby-lang.org/issues/18408#note-8 but it's probably clearer this way:

attr_reader :v

def foo
  case expr
  in [@v] if @v.even? # this could be supported with additional implementation complexity
  in [@v] if v.even? # this cannot be supported
  in [@v] if bar % 3 == 0 # this cannot be supported
  in [@v, self] # this cannot be supported
end

def bar
  @v * 2
end

Also since this would mutate the object it could mean some pattern only matches due to some previous pattern mutating the object, definitely don't want that.

For local variables we have a lot more control since we know exactly what can access the variable value. For ivar that's not true, anything which can refer to self can see the ivar value.

Updated by Dan0042 (Daniel DeLorme) 5 months ago

@Eregon (Benoit Daloze) I understand what you're saying, but my statement was based on the premise "if you don't want ivars to change during match". In the PR prepared by @palkan (Vladimir Dementyev), ivars are modified as the match is evaluated, just like local vars, and the examples you provided would work. But ktsj and Matz said that it's not good for ivars to be modified as the match is evaluated, because that can "break the object status". Logically, it follows that the examples you provided would not work, or rather they would have to work with the value of @v before the matching starts. Because that's the entire point of "if you don't want ivars to change during match" (which btw I do not agree with).

I have to admit I didn't think about guards. This opens a third possibility:
a) ivars are assigned during match; examples above work as you assume; ktsj and Matz are opposed because of threads
b) ivars are assigned after match and guard; examples above use value of @v prior to match
c) ivars are assigned after match but before guard; examples with [@v] behave as you assume, example with [@v,self] behaves otherwise

Updated by Eregon (Benoit Daloze) 5 months ago

Right, and I agree with matz that modifying ivars as the match is evaluated is problematic, I would even say it's unsound for pattern matching to have such side effects.
Testing a pattern should as much as feasible not have side effects.
(I think on local variables it's OK because of the very small scope they can affect but it can also be fixed there if we want)
For example:

case [1, 2]
in [@v, 42] if v.even? then 42
in [a, b] then a + b
end

would have the side effect of assigning @v even though that pattern does not match.
That is very wrong and a sure way to break the object state, consistency and invariants in no time.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0