Project

General

Profile

Actions

Bug #19392

closed

Endless method and parsing priorities

Added by zverok (Victor Shepelev) almost 2 years ago. Updated about 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:112119]

Description

Initial description

Discovered by Lucian Ghinda:

def test = puts("foo") and puts("bar")
# prints "bar" immediately
test
# prints "foo"

It seems that it is a parser error, right?..

RubyVM::AbstractSyntaxTree.parse('def test = puts("foo") and puts("bar")')
#  => 
# (SCOPE@1:0-1:38                                                         
#  tbl: []                                                                
#  args: nil                                                              
#  body:                                                                  
#    (AND@1:0-1:38                                                        
#       (DEFN@1:0-1:22                                                    
#        mid: :test                                                       
#        body:                                                            
#          (SCOPE@1:0-1:22                                                
#           tbl: []                                                       
#           args:                                                         
#             (ARGS@1:0-1:8 pre_num: 0 pre_init: nil opt: nil first_post: nil post_num: 0 post_init: nil rest: nil kw: nil kwrest: nil block: nil)
#           body: (FCALL@1:11-1:22 :puts (LIST@1:16-1:21 (STR@1:16-1:21 "foo") nil))))
#       (FCALL@1:27-1:38 :puts (LIST@1:32-1:37 (STR@1:32-1:37 "bar") nil)))) 

E.g. it is parsed as

(def test = puts("foo")) and (puts("bar"))

...which is hardly intentional or have any practical use. The rightly parsed code in this case can have practical use, like

def write(data) = File.write(@filename, data) == data.size or raise "Something went wrong"

Additional cases of what seems to be the same problem

def save = File.write(name, self.to_yaml) unless invalid?
# Parsed as:
(def save = File.write(name, self.to_yaml)) unless invalid?

...which makes it very hard for the users to diagnose the real reason, see #19731

def initialize(a, b) = @a, b = a, b
# syntax error, unexpected ',', expecting end-of-input (SyntaxError)                                                 
# def initialize(a, b) = @a, b = a, b                            
#                          ^         

# Again, parsed as
(def initialize(a, b) = @a), b = a, b

While this one is at least diagnosed early, in pathological cases, it might lead to very subtle bugs:

private def start = @operation, @conversion = :print, :to_s

This code doesn't throw a syntax error, but its effect is very far from expected. Again, it is parsed as

private( (def start = @operation), @conversion = :print, :to_s )

...and ends up in:

  • defining a private method start
  • making private methods :print and :to_s

Related issues 1 (0 open1 closed)

Related to Ruby master - Bug #19731: Can’t call an instance method inside an Endless method definitionClosedActions

Updated by austin (Austin Ziegler) almost 2 years ago

Note that this does not happen with && / ||, so it’s a binding precedence issue.

Updated by ufuk (Ufuk Kayserilioglu) almost 2 years ago

Given that the precendence of and and or is already low, I think this is expected behaviour and fully replicates the operation of local variable assignment as well. That is:

foo = some_method and puts("bar")

would also be parsed as:

(foo = some_method) and puts("bar")

I would expect that do be the same when "assigning" a method.

The "fix" for the use-case you are describing is to be explicit with grouping:

def write(data) = (File.write(@filename, data) == data.size or raise "Something went wrong")

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

  • Status changed from Open to Rejected

This precedence is needed to allow:

public def test = puts("foo")

Updated by zverok (Victor Shepelev) almost 2 years ago

  • Status changed from Rejected to Open

@nobu (Nobuyoshi Nakada) Sorry, can we talk about it a bit more?..

I am not an expert in parser's internals, so I can believe that it is 100% impossible to solve, but is it?..

From developer's experience point of view:

  • There is a high possibility of using def foo = statement or statement, it is quite regularly used idiom for shortcutting and validations;
  • Depending on the second part of such expression, the wrong behavior might be noticed very late;
  • There are no useful consequences (that I can think of) for breaking the one-line method body on and/or, so it can't be justified by "it doesn't work this way, because it actually does this interesting thing!"

Therefore it definitely perceives like a bug and a harmful one, not just "useless trivia" kind.

Moreover:

  • One-line method definitions aren't any expressions. They introduce a new scope, for example, and having a scope suddenly ending mid-line on operator (and not at least on ;) is behavior that screams "bug", even if it can be explained in "it is internal precedence" terms:
def test = x = 5 and puts x 
# in `<main>': undefined local variable or method `x' for main:Object (NameError)
  • Again, I am not a Ruby internal expert (though if there is no other choice, I can take on studying this problem), but I have a hunch that it isn't impossible to introduce a simple rule "endless method's definition ends on ; or \n, and nothing else". That's the user expectation, anyway.

Updated by nobu (Nobuyoshi Nakada) almost 2 years ago

zverok (Victor Shepelev) wrote in #note-4:

Moreover:

  • One-line method definitions aren't any expressions.

It depends on how you define "expressions".
At least in Ruby's terms, any method definition can be an argument of other method calls.

public def test1(x) = x = 5
public def test2(x)
  x = 5
end

In the above examples, both methods return the method ID and the result will be passed to the method public.
To utilize this, def returns the ID since 2.3.

Updated by zverok (Victor Shepelev) almost 2 years ago

@nobu (Nobuyoshi Nakada)

It depends on how you define "expressions".
At least in Ruby's terms, any method definition can be an argument of other method calls.

Thank you, I am well aware of this. (I maintain Ruby Changelog, including detailed explanations on how and why methods as expressions can be used.)

To utilize this, def returns the ID since 2.3.

Since 2.1, actually :)

That's why I didn't say, "method definitions aren't expressions", I said, "method definitions aren't any expressions", meaning that while being expressions indeed, they still behave differently in several important aspects from most the other expressions.

One of those aspects is the creation of its own scope.
Endless methods do it without explicit ending sign (like, } or end). This makes "where the expression ends" an extremely sensitive question, which, I believe, shouldn't be brushed off easily.

Updated by zverok (Victor Shepelev) almost 2 years ago

and is actually not alone.

The code like this is also parsed not the way one would intuitively expect:

def save = File.write(name, self.to_yaml) unless invalid?

Actual parsing is this:

(def save = File.write(name, self.to_yaml)) unless invalid?

I believe that combining postfix if/unless with one-line methods is extremely possible, while intention to conditionally define methods with postfix condition is much less so.

In general, I believe that

def method
  expression
end

and

def method = expression

should be equivalent as long as it is humanly possible, because the simplification of one-expression method to endless one is what many codebases would try to do, especially for simple utility classes.

And if they can't be made equivalent (due to some parser complexities), the only other acceptable behavior is an error or at least warning (though the latter is much less desirable), not a silent change of semantics.

Updated by mame (Yusuke Endoh) almost 2 years ago

  • Status changed from Open to Feedback

@nobu (Nobuyoshi Nakada) says it would be difficult to implement. If someone creates a patch successfully, we may be able to consider it.

Updated by matz (Yukihiro Matsumoto) over 1 year ago

  • Status changed from Feedback to Closed

Analogous to a = b and c parsed as (a = b) and c, def a = b and c should be parsed as (def a = b) and c.
Considering the fact no one really want to write (def a = b) and c, it may be better to cause a warning in the future.
I understand there's some confusion regarding the precedence of and, or operators, but if we change the precedence it may break a huge amount of existing code.

Matz.

Updated by zverok (Victor Shepelev) over 1 year ago

@matz (Yukihiro Matsumoto) With all due respect, I don't believe the current behaviour is acceptable. Considering the behaviour affects not only control flow and/or (relatively rarely used), but also simple if (https://bugs.ruby-lang.org/issues/19392#note-7), I believe the behaviour should be considered a bug.

I don't believe the fix lies in changing priorities of some operators, rather in adjustment of endless methods parsing so everything till the end of the line (or expression, if some way of linebreaking used) would be considered the body of such method, always.

I don't believe it to be impossible, but due to obvious reasons can't experiment with implementation myself currently.

I should say though that this behaviour would always be perceived as a bug, and I believe can have an effect on Ruby's reputation, when one of the "cool new features" behaves in an obviously wrong way.

Actions #11

Updated by mame (Yusuke Endoh) over 1 year ago

  • Related to Bug #19731: Can’t call an instance method inside an Endless method definition added

Updated by yui-knk (Kaneko Yuichiro) over 1 year ago

JFYI: https://github.com/ruby/ruby/pull/8054 doesn't change the precedence globally, however def test = puts("foo") or puts("bar") is interrupted as def test = (puts("foo") or puts("bar")).

Updated by Eregon (Benoit Daloze) over 1 year ago

I agree with @zverok (Victor Shepelev), the current behavior is highly confusing and undesirable. No Ruby user wants the current behavior.

I think if the body of an endless method doesn't cover the full line, we should get a warning (enabled by default) or an error, or better: accept the full line as the endless method's body.

Actions #14

Updated by zverok (Victor Shepelev) about 1 year ago

  • Status changed from Closed to Open

Updated by zverok (Victor Shepelev) about 1 year ago

  • Subject changed from Endless method vs and/or to Endless method and parsing priorities
  • Description updated (diff)

Updated the description to underline the nature of the problem better.

Updated by mame (Yusuke Endoh) about 1 year ago

  • Status changed from Open to Closed

Discussed at today's dev meeting. Because a = b and c is interpreted as (a = b) and c, it is natural that def a = b and c is interpreted as (def a = b) and c, as Matz said in his comment #note-9. This decision didn't change even with the implementation.

Just FYI: @yui-knk (Kaneko Yuichiro) said he just wanted to show that the implementation was therotically possible with parse.y, but that he did never like it.

If you really need it, it would be good to write a Rubocop cop or something to warn it.

Updated by zverok (Victor Shepelev) about 1 year ago

@mame (Yusuke Endoh) I understand I might sound annoying, but I believe there is absolutely nothing natural about this behavior for the users (as is shown by the confusion of everybody who is met with this behavior in real codebases: it is frequently impossible even to guess where the problem is coming from).

While we reused = sign to designate one-line methods, it is not "like any other assignment":

  1. It doesn't assign a value, actually, that can be then operated by value rules;
  2. It can't be used in method bodies (or, it can, but the result wouldn't follow the logic of other assignments, but follow the logic of method definitions, affecting the outer scope);
  3. It doesn't evaluate the expression on the right but "stores" it as a method body.

The expectation of every sane usage of endless methods is that "I have a short and expressive method body, I want to put it in one line, it should just work," i.e., that whatever it would be:

def foo(args)
  expression
end

and

def foo(args) = expression

would be equivalent in as many situations as possible.

While some "this can't be parsed at all" situations might be accepted, "it is parsed successfully, but the scope changes in the middle of what looks like a single expression" would never ever be "natural."

The most annoying and confusing thing is "trailing if" because it is an incredibly widespread statement for short-and-expressive code.

I understand the complexity of the parser development (and I can even assume that fixing the problem is downright impossible, though I hope it is not so).

Still, I believe that with current behavior being preserved, the feature would be forever considered "buggy and undercooked" and will never see wide adoption, indirectly affecting the reputation of the language.

Can we please have one more round of discussion, even if after 3.3 release?

I am ready to try to contribute as much as possible, from preparing a presentation to give at the developer meeting with more detailed argumentation to trying to look at the parser myself. Unfortunately, I am not proficient in this part of Ruby code, but I can try; though, to invest a significant amount of time into this attempt, I would like to at least have an agreement on the fact that if the fix is possible, it would be considered.

Updated by Dan0042 (Daniel DeLorme) about 1 year ago

Because a = b and c is interpreted as (a = b) and c, it is natural that def a = b and c is interpreted as (def a = b) and c

I guess I'll just add my voice to the others who think the current behavior is totally unnatural and unintuitive. I do not agree that (a = b) and c automatically implies (def a = b) and c. It feels more like def should introduce a different parsing rule, similar to how a if b rescue c is NOT parsed as a if (b rescue c), because there's more than just precedence rules in play.

Updated by Eregon (Benoit Daloze) about 1 year ago

def = is clearly more about def than =.
It is not an assignment, as clearly said by other just before this message.

Because a = b and c is interpreted as (a = b) and c, it is natural that def a = b and c is interpreted as (def a = b) and c, as Matz said in his comment #note-9.

By that reasoning, def m; a = b and c; end should be (def m; a = b) and (c; end) then? (if the most important thing is that and is lower precedence than everything else no matter the context)
def is stronger than = and and, that is very intuitive and simple. The same for module, class, they are scope-changing keywords.
It makes no sense that = and and would be stronger than a scope-changing keyword, from a usage point of view.

I think either this should be fixed or endless methods should be removed because hopelessly broken and uninintuitve.
Of course we cannot remove them for compatibility so let's fix this to be like the overwhelming majority of Rubyists expect?

Updated by Eregon (Benoit Daloze) about 1 year ago

indirectly affecting the reputation of the language.

I can share the feeling.
The current parsing of endless methods feels like the old PHP parser, i.e., very buggy and counterintuitive (e.g. wouldn't accept an arbitrary exception inside [] for indexing IIRC).

I think it is a reasonable expectation that if one has a single-line method (without ;) they can transform it into an endless method.
That's not true due to this issue and there are many confusing cases as shown in the description.

Maybe we should reconsider how we parse/handle endless methods, i.e., maybe "desugar" them at lexer time like def m(args) = ...\n into def m(args); ...; end\n?
It does feel hacky, but I think at least that expresses the desired and intuitive semantics clearly.

Updated by duerst (Martin Dürst) about 1 year ago

Because a = b and c is interpreted as (a = b) and c, it is natural that def a = b and c is interpreted as (def a = b) and c

I agree with others that there's not much (if anything) natural about this. In def a = b and c, the most important thing that virtually every Ruby programmer will see is def.

To formalize this, we can think about a special def= operator. This def= operator has lower precedence than and or if or unless,... As a result, def a = b and c is interpreted as def a = (b and c), which is easier to understand and more useful (because it is the more frequent usecase).

Parsing this so that def= has lower precedence may not exactly be easy. But the Ruby parser goes through many contortions to make Ruby a language useful for human programmers, with very little concern for 'parsing difficulty'.[1] It would be weird if we gave up in this case because of 'parser difficulty'. def= has the huge advantage that def is prefix, which is much easier to handle in parsers that an infix or postfix operator.

[1] The opposite of this is Pascal and other languages created by Niklaus Wirth. Wirths main goal when designing a language was to make the compiler easy to implement.

Actions

Also available in: Atom PDF

Like1
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like1Like0Like0Like0