Project

General

Profile

Actions

Feature #21998

open

Add {Method,UnboundMethod,Proc}#source_range

Feature #21998: Add {Method,UnboundMethod,Proc}#source_range
1

Added by Eregon (Benoit Daloze) 23 days ago. Updated 5 days ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:125265]

Description

I'm using matz's suggestion almost as-is from https://bugs.ruby-lang.org/issues/6012#note-53.
The only change is the proposed class name.

Use Cases

Use cases have been discussed extensively and matz said:

The use cases are real and I want to support them

So I think we don't need to discuss that anymore :)

Background

Adding column and last line information to #source_location is deemed too incompatible given the usages of obj.source_location.last which expect the start line (they would get the end column instead).

Proposal

So instead we add a new method, {Method,UnboundMethod,Proc}#source_range,
which returns a Ruby::SourceRange and has these methods:

  • start_line: 1-indexed (same as source_location.first)
  • end_line: 1-indexed
  • start_column: in bytes, 0-indexed, I think start_byte_column could be good for extra clarity
  • end_column: in bytes, 0-indexed, I think end_byte_column could be good for extra clarity
  • inspect which shows something like #<Ruby::SourceRange (1,0)-(2,10)>.

For the edge case of a heredoc spanning further than the end of a method/block, we would not include it in the end_line & end_column methods, but instead document it and provide a small code snippet in the docs to compute that if desired with Prism.

Alternative

An alternative could be a Ruby::SourceLocation class and obj.source_location(object: true)/obj.source_location(extended: true) but it feels less nice. Since we are designing something new I think it's best to go for the cleanest design.

Consistency

The above methods match the name of methods on Prism::Node and have the same semantics, which is good for consistency and to avoid confusion.

Scope

The scope is intentionally minimal to keep the discussion focused.

Implementation

I'm happy to implement this, it should be pretty trivial and very similar to the extended source_location.
I would prefer to get an approval for this feature before implementing.


Related issues 5 (2 open3 closed)

Related to Ruby - Feature #6012: Proc#source_location also return the columnRejectednobu (Nobuyoshi Nakada)Actions
Related to Ruby - Feature #8751: Add offsets to method#source_locationClosedActions
Related to Ruby - Feature #17930: Add column information into error backtraceClosedmame (Yusuke Endoh)Actions
Related to Ruby - Feature #21005: Update the source location method to include line start/stop and column start/stop detailsOpenEregon (Benoit Daloze)Actions
Related to Ruby - Feature #21795: Methods for retrieving ASTsOpenActions

Updated by Eregon (Benoit Daloze) 23 days ago Actions #1

  • Related to Feature #6012: Proc#source_location also return the column added

Updated by Eregon (Benoit Daloze) 23 days ago Actions #2

  • Related to Feature #8751: Add offsets to method#source_location added

Updated by Eregon (Benoit Daloze) 23 days ago Actions #3

  • Related to Feature #17930: Add column information into error backtrace added

Updated by Eregon (Benoit Daloze) 23 days ago Actions #4

  • Related to Feature #21005: Update the source location method to include line start/stop and column start/stop details added

Updated by matz (Yukihiro Matsumoto) 20 days ago Actions #5 [ruby-core:125304]

Thank you for the proposal. Introducing a new method rather than extending source_location is the right direction, given the compatibility issues we hit.

I approve the shape of this feature, including:

  • The class name Ruby::SourceRange (matches the method name).
  • The four accessors start_line, end_line, start_column, end_column.
  • Byte-based columns, as long as this is clearly documented.
  • Excluding heredocs from end_line/end_column with documentation of the workaround.
  • Not adding Binding#source_range.

Matz.

Updated by mame (Yusuke Endoh) 20 days ago Actions #6 [ruby-core:125305]

@matz (Yukihiro Matsumoto) What should the following return?

f = proc { <<END }
xxx
END

f.source_range #=> Start position: from the `p` of `proc`? Or from the `{`?
               #=> End position: up to the `}`? Or up to the `D` of `END`?

I don't understand the use cases for this feature, so I'm not sure what it should return.

A related feature that Eregon also mentioned in #6012, show_source in irb, behaves as follows, so perhaps the heredoc should be included?

$ irb -rmethod_source
irb(main):001" def foo = (<<END)
irb(main):002" xxx
irb(main):003> END
=> :foo
irb(main):004> show_source foo
From: (irb):1
def foo = (<<END)
xxx
END

Updated by Eregon (Benoit Daloze) 20 days ago Actions #7 [ruby-core:125310]

For the start position I think either is fine.
I think from the p of proc is more useful because it gives extra context to what the block is given (a block can't exist on its own, it's always part of something bigger syntax-wise).

For the end position, as the description says:

For the edge case of a heredoc spanning further than the end of a method/block, we would not include it in the end_line & end_column methods, but instead document it and provide a small code snippet in the docs to compute that if desired with Prism.

There are use cases for both. We can easily compute the end of heredoc if the API returns }, but it's impossible (or at least very complicated & inefficient) to do the reverse.
So the API returns what makes it possible to have both.

Updated by headius (Charles Nutter) 20 days ago Actions #8 [ruby-core:125320]

Eregon (Benoit Daloze) wrote in #note-7:

For the start position I think either is fine.
I think from the p of proc is more useful because it gives extra context to what the block is given (a block can't exist on its own, it's always part of something bigger syntax-wise).

The proc part is irrelevant to the source of the block, and is only a method call receiver for a block argument. We wouldn't include any other calls that receive a block as part of the source of the block, so proc should be no different.

Updated by mame (Yusuke Endoh) 20 days ago Actions #9 [ruby-core:125321]

Eregon (Benoit Daloze) wrote in #note-7:

We can easily compute the end of heredoc if the API returns }

How do you determine whether a block contains a heredoc? Consider:

  • { <<X }: heredoc
  • { x<<X }: shift operator
  • { x <<X }: heredoc

Of course, you also have to handle <<"X", and "<<X" is a string literal, and you have to consider cases like { <<"X" + <<"Y" }.

Even once you have identified a heredoc, finding its end delimiter is not trivial either:

<<"X"
#{
X
}
X

Here the end delimiter is the X on line 5, because #{ opens an interpolation.

<<'X'
#{
X
}
X

Here the end delimiter is the X on line 3, because single-quoted heredocs do not interpolate, so #{ is literal text.

I cannot see how any of this is "easily computed".

Updated by Eregon (Benoit Daloze) 16 days ago · Edited Actions #10 [ruby-core:125335]

I discussed this in detail with @mame (Yusuke Endoh) and @matz (Yukihiro Matsumoto).

@mame (Yusuke Endoh) (to reply here too) what I meant is using Prism with the Method|UnboundMethod|Proc and the source_range to find the Prism node, and then compute the max end offset by walking the child nodes (recursively).

Matz said for the example in https://bugs.ruby-lang.org/issues/21998#note-6 : End position: up to the }.

I'm not sure if @matz (Yukihiro Matsumoto) made his mind on: Start position: from the p of proc? Or from the {?

The example of define_method(:foo) { ... } was given, for that case at least it seems very useful context to include the call which receives the block.

Also foo { break } the break there breaks out of foo, so IMO a block isn't complete without the method call to which receives it.
For the cases where one wants to access that CallNode it's much faster to return it here, and it's easy to get "just the block node" with call_node.block.
But, we can implement this handling in Prism if it's decided that source_range should have a start position of {.

Updated by kddnewton (Kevin Newton) 11 days ago Actions #11 [ruby-core:125358]

We should make sure start line respects the line passed into eval (not sure if that was implicit, but wanted to be sure to call it out).

For the heredocs, I agree that they should be omitted. We made that decision early with Prism because of the following code just being super confusing:

define_method(:foo) { <<~FOO }; define_method(:bar) { <<~BAR }
  foo content
FOO
  bar content
BAR

The reality is in order to represent this in a way that even comes close we would need source range to not necessarily be contiguous. I think instead we should document the limitation, and explicitly call out that eval-ing or trusting that the source returned by the source range is not and never expected to be 1:1 with the original.

Updated by headius (Charles Nutter) 11 days ago · Edited Actions #12 [ruby-core:125362]

The example of define_method(:foo) { ... } was given

define_method is just a method call like any other. It should not be included in the range for the syntactic construct that is the block.

What about do..end with a huge piece of code attached?

foo(
  # large amount of code
) do
  ...
end

It makes no sense for the range of the syntactic block here to include the entire expression of the call and its arguments. None of that is syntactically part of the block and it would parse if the block were removed.

Also foo { break } the break there breaks out of foo, so IMO a block isn't complete without the method call to which receives it.

The target of the break is a semantic detail, not a syntactic one, and it is not determined at parse time.

For that matter, arguments to a method would not be arguments without the call, but you don't consider their syntactic range to include the entire call. The block is just an argument of another form.

I'll turn this question around: how can I get just the range of the block syntactic element if it includes all this other stuff unrelated to the block?

I think what you really want is for the call's range to include the block, not the other way around. A nested AST element's range should not include source ranges of parent nodes.

Updated by kddnewton (Kevin Newton) 11 days ago Actions #13 [ruby-core:125364]

It makes no sense for the range of the syntactic block here to include the entire expression of the call and its arguments.

If I'm understanding you correctly, then what you would like for source_range for the example for foo and bar methods is then:

define_method(:foo) { <<~FOO }; define_method(:bar) { <<~BAR }
  foo content
FOO
  bar content
BAR

method(:foo).source_range # => 
# { <<~FOO }; define_method(:bar) { <<~BAR }
#   foo content
# FOO

method(:bar).source_range # =>
# { <<~BAR }
#   foo content
# FOO
#   bar content
# BAR

as opposed to not including the heredoc, where it would instead be:

define_method(:foo) { <<~FOO }; define_method(:bar) { <<~BAR }
  foo content
FOO
  bar content
BAR

method(:foo).source_range # => 
# { <<~FOO }

method(:bar).source_range # =>
# { <<~BAR }

is that correct?

Updated by Eregon (Benoit Daloze) 10 days ago Actions #14 [ruby-core:125366]

headius (Charles Nutter) wrote in #note-12:

The target of the break is a semantic detail, not a syntactic one, and it is not determined at parse time.

It does, it is determined at parse time. Break is a syntactic construct.

I'll turn this question around: how can I get just the range of the block syntactic element if it includes all this other stuff unrelated to the block?

We can easily use Prism.find(Method|UnboundMethod|Proc).block for this.
Similar if one wants to get the end position including the heredoc, Prism.find is the way too.

BTW another interesting case is:

for i in enum
  p i
  abc
end

There we must include at least from the i just after for as that's the block's parameters.
What both Prism and RubyVM::AST do in that case is include from the beginning of for, which makes sense.
If one just wants the body without parameters, they can use Prism.find(...).statements.
The only choice for Prism.find there is to return the ForNode, and so it is consistent to return a CallNode for proc/lambda/foo { ... } and a LambdaNode for -> { ... }.

Another reason is returning the CallNode there is useful for many cases, and having to walk the AST again to find it seems needlessly inefficient.
It's the same philosophy as the heredoc end: make it easy to locate the node with Prism, and access sub-parts as wanted from there.

Updated by Eregon (Benoit Daloze) 10 days ago Actions #15

Updated by headius (Charles Nutter) 10 days ago Actions #16 [ruby-core:125371]

The only choice for Prism.find there is to return the ForNode

for syntax does not parse as a ForNode without the block, so the two are not separable. Calls do parse without the block and remain CallNode. A block is an argument to a call. Therefore the block and the call are separate constructs that should have separate ranges.

Why include the CallNode range in the block's range but not in the range of other arguments? It is even less bound to the syntax of the call than the arguments are. It seems arbitrary to include the call's range in the block when it is not included in the range of any other argument or in the range of the call's receiver expression.

Updated by headius (Charles Nutter) 10 days ago Actions #17 [ruby-core:125372]

If I'm understanding you correctly, then what you would like for source_range for the example for foo and bar methods is then:

I don't claim to have any suggestion for HERE docs because of the disjoint problem you mentioned. But clearly the range should not include the first character of the call to which the HERE doc is being passed or the block of code in which it is contained.

My point is that including the call range in the source range of a block passed to it doesn't make any more sense syntactically than including the call range in the source range of its other arguments. In both cases, the argument or block could be removed from the call and it would still parse as a call.

Updated by tompng (tomoya ishida) 9 days ago Actions #18 [ruby-core:125374]

Proc's source in the code below consists of 3 parts. do, inside\n<<B; end, and exclave\nB.

<<A; tap do
outside injected
A
inside
<<B; end; outside
exclave
B

Even if heredocs are used, I think do is the start position and end is the end position.
Source ranges may have several segments and may have exclave part that exceeds end position. Some code outside of proc may be injected into start_pos...end_pos with heredocs.
Injected part may contain any characters(parentheses, keywords, operators, etc) that prevents analyzing source string as a code.

By the way, IRB's show-source is not working well with the code above because IRB just simply finds syntax-valid lines[start_line..start_line+i] lines.
It's impossible to correctly implement show-source only from start_line for code like this:

def f; end; def f; def f
end
end

Updated by Eregon (Benoit Daloze) 8 days ago Actions #19 [ruby-core:125381]

I was curious what's the reported location for ArgumentErrors caused by lambda parameters and how ErrorHighlight highlights them.

For

l = lambda { |x, y|
  x + y
}

l.call(1)

With parse.y, the node (RubyVM::AbstractSyntaxTree.of(e.backtrace_locations[0])) is actually:

(ITER@1:4-3:1 (FCALL@1:4-1:10 :lambda nil)
   (SCOPE@1:11-3:1
    tbl: [:x, :y]
    args:
      (ARGS@1:14-1:18
       ...

which covers from lambda { |x, y| to }.

With Prism, the node (Prism node with node_id == RubyVM::AbstractSyntaxTree.node_id_for_backtrace_location(e.backtrace_locations[0])) is actually:

BlockNode (location: (1,11)-(3,1))
├── locals: [:x, :y]
├── parameters:
│   @ BlockParametersNode (location: (1,13)-(1,19))
│   ├── parameters:
│   │   @ ParametersNode (location: (1,14)-(1,18))
│   │   ├── requireds: (length: 2)
│   │   │   ├── @ RequiredParameterNode (location: (1,14)-(1,15))
│   │   │   │   └── name: :x
│   │   │   └── @ RequiredParameterNode (location: (1,17)-(1,18))
│   │   │       └── name: :y
│   │   ├── ...
│   ├── opening_loc: (1,13)-(1,14) = "|"
│   └── closing_loc: (1,18)-(1,19) = "|"
...

which covers from { |x, y| to }.

ErrorHighlight shows:

test.rb:1:in 'block in <main>': wrong number of arguments (given 1, expected 2) (ArgumentError)

    caller: test.rb:5
    | l.call(1)
       ^^^^^
    callee: test.rb:1
    | l = lambda { |x, y|
                 ^

ErrorHighlight makes the choice (uses node.children[1] with RubyVM::AST) to highlight the { and not lambda { and I agree that makes sense since it's the block (parameters) causing the exception, not the call to lambda.

Maybe it could even highlight |x, y| or x, y to pinpoint that's it about the wrong arguments vs parameters by using the ARGS/BlockParametersNode/ParametersNode.


Similarly, for define_method with a block:

define_method :define_method_test do |x, y|
  x + y
end

define_method_test(1)

The parse.y node is:

#<RubyVM::AbstractSyntaxTree::Node:ITER@1:0-3:3>

which covers from define_method :define_method_test do |x, y| to end.

The Prism node is:

BlockNode (location: (1,34)-(3,3))

which covers from do |x, y| to end.

ErrorHighlight shows:

test.rb:1:in 'block in <main>': wrong number of arguments (given 1, expected 2) (ArgumentError)

    caller: test.rb:5
    | define_method_test(1)
      ^^^^^^^^^^^^^^^^^^
    callee: test.rb:1
    | define_method :define_method_test do |x, y|
                                        ^^

In conclusion if we look at the returned node start/end line/column we see RubyVM::AST returns the full call and Prism returns just the block. Both work.

From the experiment in https://github.com/eregon/error_highlight/pull/1 it would be best if both match.

Updated by Eregon (Benoit Daloze) 8 days ago Actions #20 [ruby-core:125382]

For

l = -> x, y {
  x + y
}

l.call(1)

ErrorHighlight shows:

test.rb:1:in 'block in <main>': wrong number of arguments (given 1, expected 2) (ArgumentError)

    caller: test.rb:5
    | l.call(1)
       ^^^^^
    callee: test.rb:1
    | l = -> x, y {
          ^^
	from test.rb:5:in '<main>'

So in this case it highlights the ->, so maybe the strategy is to highlight the token before block parameters.
For consistency with the lambda { |x, y| case maybe it'd make sense to highlight lambda {, but I think either way it's clear enough.

I think both {/do or the start of the call which receives the block (proc {) are valid start position and each have pros & cons.
We just need to make a choice and we should be consistent for the start position of a node from an arity ArgumentError with both --parser=prism and --parser=parse.y.

Updated by Eregon (Benoit Daloze) 6 days ago Actions #21 [ruby-core:125390]

Eregon (Benoit Daloze) wrote in #note-10:

I'm not sure if @matz (Yukihiro Matsumoto) made his mind on: Start position: from the p of proc? Or from the {?

I confirmed with both @mame (Yusuke Endoh) and @matz (Yukihiro Matsumoto), and matz said it should be from p of proc.

Updated by headius (Charles Nutter) 6 days ago Actions #22 [ruby-core:125391]

Eregon (Benoit Daloze) wrote in #note-21:

I confirmed with both @mame (Yusuke Endoh) and @matz (Yukihiro Matsumoto), and matz said it should be from p of proc.

So that means it will include the original call and all arguments in all cases, including multi-line calls with large expressions, correct?

Actions

Also available in: PDF Atom