Project

General

Profile

Actions

Feature #21005

open

Update the source location method to include line start/stop and column start/stop details

Added by bkuhlmann (Brooke Kuhlmann) 2 days ago. Updated about 2 hours ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:120488]

Description

Why

πŸ‘‹ Hello. After discussing with Kevin Newton and Benoit Daloze in Feature 20999, I'd like to propose adding line start/stop and column start/stop information to the #source_location method for the following objects:

At the moment, when using #source_location, you only get the following information:

def demo = "A demonstration."

# From disk.
method(:demo).source_location  # ["/Users/bkuhlmann/Engineering/Misc/demo", 15]

# From memory.
method(:demo).source_location  # ["(irb)", 3]

Notice, when asking for the source location, we only get the path/location as the first element and the line number as the second element but I'd like to obtain a much richer set of data which includes line start/stop and column start/stop so I can avoid leaning on the RubyVM for this information. Example:

def demo = "A demonstration."

# From disk.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts [instructions.absolute_path, *instructions.to_a.dig(4, :code_location)]

[
  "/Users/bkuhlmann/Engineering/Misc/demo",  # Source path.
  15,                                        # Line start.
  0,                                         # Column start.
  15,                                        # Line stop.
  29                                         # Column stop.
]

# From memory.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts instructions.script_lines

[
  "def demo = \"A demonstration.\"\n",
  ""
]

By having access to the path (or lack thereof in case of IRB), line start/stop, and column start/stop, this means we could avoid using the RubyVM to obtain raw source code for any of these objects. This would not only enhance debugging situations but also improve Domain Specific Languages that wish to leverage this information for introducing new features and/or new debugging capabilities to the language.

How

Building upon the examples provided above, I'd like to see Binding, Proc, Method, and UnboundMethod respond to #source_location as follows:

[
  "/Users/bkuhlmann/Engineering/Misc/demo",  # Source path.
  15,                                        # Line start.
  15,                                        # Line stop.
  0,                                         # Column start.
  29                                         # Column stop.
]

Notice, for data grouping purposes, I changed the array structure to always start with the path as the first element, followed by line information, and ending with column information. Alternatively, it could might be nice to improve upon the above by answering a hash each time, instead, for a more self-describing data structure. Example:

{
  path: "/Users/bkuhlmann/Engineering/Misc/demo",
  line_start: 15,
  line_stop: 15,
  column_start: 0,
  column_stop: 29
}

For in-memory, situations like IRB, it would be nice to answer the equivalent of RubyVM::InstructionSequence#script_lines which would always be an Array with no line or column information since only the source code is necessary. Example:

[
  "def demo = \"A demonstration.\"\n",
  ""
]

From a pattern matching perspective, this could provide the best of both worlds especially if information is answered as either a Hash or and Array. Example:

def demo = "A demonstration."

case method(:demo).source_location
  in Hash then puts "Source information obtained from disk."
  in Array then puts "Source obtained from memory."
  else fail TypeError, "Unrecognized source location type."
end

This above is only a simple example but there's a lot we could do with this information if the above pattern match was enhanced to deal with the extraction and formatting of the actual source code!

Notes

This feature request is related to the following discussions in case more context is of help:


Related issues 2 (1 open1 closed)

Related to Ruby master - Feature #20999: Add RubyVM object source supportRejectedActions
Related to Ruby master - Feature #6012: Proc#source_location also return the columnAssignednobu (Nobuyoshi Nakada)Actions
Actions #1

Updated by Eregon (Benoit Daloze) 2 days ago

Actions #2

Updated by Eregon (Benoit Daloze) 2 days ago

  • Related to Feature #6012: Proc#source_location also return the column added

Updated by Eregon (Benoit Daloze) 1 day ago Β· Edited

I think adding end line and start/end columns is a straightforward and compatible extension of source_location.
I will add that to the dev meeting agenda: https://bugs.ruby-lang.org/issues/20949

Changing to return a Hash (or an Array of 2 Strings) instead would be too incompatible, that would need to be a new method then.
If it's a new method, I think we should return a "code location" object (could be Ruby::CodeLocation or Ruby::Location or Ruby::SourceLocation or so) and have the following methods (inspired from https://bugs.ruby-lang.org/issues/6012#note-19):

  • start_line
  • start_column
  • start_offset
  • end_line
  • end_column
  • end_offset
  • code or maybe source_code: gets the source of the Proc/Method/UnboundMethod if available, nil otherwise

That last one seems particularly nice because it abstracts how it gets that source code, and allows for multiple implementations.
For instance if the Ruby implementation keeps the source code in memory (or a compressed version of it) it could just use that, if it doesn't it would re-read from disk, etc.

Updated by bkuhlmann (Brooke Kuhlmann) 1 day ago Β· Edited

Yeah, eager to see what the consensus on this becomes from the next dev meeting.

I like the idea of a Ruby::SourceLocation or even a Ruby::Source object? The latter would be nice because you could message the instance as source.code for situations in which the source could be obtained from memory, re-read from disk, or answer nil as you point out. Super helpful for situations where you're using IRB or Binding.

Updated by mame (Yusuke Endoh) about 22 hours ago

I read the use case for #20999.

In short, by the following DSL,

class Demo
  include Initable[[:key, :default, proc { Object.new }]]
end

you want to generate the following code string, right?

def initialize(default = Object.new)
  @default = default
end

I believe that the last position of Proc is not sufficient for your case, because of a here document. Consider the following proc.

proc { <<END }
  heredoc
END

Where do you expect the end position of the Proc to be? The end brace? Or the end of β€œEND” delimiter? Either way, your DSL will generate a broken Ruby code string.

What you really need is not the end position information, but an AST node of Method, Proc, etc.

f = proc { <<END }
  heredoc
END

# What you really need
node = Prism.node_for(f)

# You can get all the locations including the here document
pp node.block.body.body
# =>
# [@ StringNode (location: (1,7)-(1,12))
#  β”œβ”€β”€ flags: newline
#  β”œβ”€β”€ opening_loc: (1,7)-(1,12) = "<<END"
#  β”œβ”€β”€ content_loc: (2,0)-(3,0) = "  heredoc\n"
#  β”œβ”€β”€ closing_loc: (3,0)-(3,3) = "END"
#  └── unescaped: "  heredoc\n"]

# You can even get the end position of the proc, if it is really sufficient for you
p [node.location.end_line, node.location.end_column] #=> [1, 14]

Updated by byroot (Jean Boussier) about 19 hours ago

I think adding end line and start/end columns is a straightforward and compatible extension of source_location.

I fear it may break code that splats the array, e.g.:

some_method(*proc.source_location)

So perhaps it should use an argument, e.g. proc.source_location(true) ?

I like the idea of a Ruby::SourceLocation

Me too, because otherwise it would be an array with many positional numbers, and that would be quite cryptic. Returning an actual object with named methods make it easier to understand what's being used and easier to extend later on.

Updated by bkuhlmann (Brooke Kuhlmann) about 15 hours ago Β· Edited

Yusuke: I went hunting for the Prism.node_for method/documentation but couldn't find it. If I understand you correctly, I think you are proposing adding the .node_for implementation to Prism? If so, I like the detailed information in your example especially when dealing with heredocs and being able to access content location. That would yield even greater flexibility for anyone implementing atop this data. To be more exact, I'm guessing the following could be amended to what Benoit proposed above then?:

  • content_line_start
  • content_line_end
  • content_column_start
  • content_column_end

By the way, one slight correction to what you were asking above. This is what my DSL generates:

# DSL
class Demo
  include Initable[[:key, :default, proc { Object.new }]]
end

# End result.
class Demo
  def initialize(default: Object.new)
    @default = default
  end

  private
  
  attr_reader :default
end

The emphasis is on key because key is an optional keyword parameter as defined in the Method#parameters implementation. Essentially, my DSL reconstitutes the raw parameters answered back into a method signature where the third element in the 3-tuple is additional sugar (specific to my DSL) for providing a default value since Method#parameters only answers an array of tuples.

Jean

I fear it may break code that splats the array

Ugh, yes. What about this instead?

proc.source_location(as: :verbose) 

This would give us room to expand the as keyword with different values to support different object shapes in the future (assuming this design expands and grows in future Ruby versions) without backing us into a corner.

Returning an actual object with named methods make it easier to understand what's being used and easier to extend later on.

Yeah, this would be most welcome. I know I gave an example of using a Hash above but icing on the cake would be to answer a Data object instead so you could simply ask for whatever attribute you need.

Updated by byroot (Jean Boussier) about 11 hours ago

Ugh, yes. What about [keyword argument] instead?

I think it may be a bit much.

But if we decide with returning a new class, instead of just retunring a longer array, I think the better API would be another method entirely.

Perhaps #source, #source_info, or something along these lines.

Updated by Earlopain (Earlopain _) about 10 hours ago

But if we decide with returning a new class, instead of just retunring a longer array, I think the better API would be another method entirely.

Yes, I agree with that. caller and caller_location are also different methods. I don't think source would be a good name, it could just as easily return the source string. source_info definitly sounds better to me.

Updated by tenderlovemaking (Aaron Patterson) about 6 hours ago

It seems like a very similar solution to what @mame (Yusuke Endoh) is proposing was merged at one point as Feature #14836. It seems RubyVM::AST doesn't exist anymore, but it should be possible to do something similar.

For example, this code should work:

f = proc { <<END }
  heredoc
END

iseq = RubyVM::InstructionSequence.of(f)

require "prism"

node_id = iseq.to_a[4][:node_id]

ast = Prism.parse(File.binread(__FILE__))
p ast.value.breadth_first_search { |node| node.node_id == node_id }

It does not currently work due to a bug in Prism, but I'm trying to address that in #21014

Updated by tenderlovemaking (Aaron Patterson) about 6 hours ago

Also just to be clear, I don't think the code I pasted above is nice code, but it does work after the Prism bug is addressed.

In particular this line is pretty heavy:

node_id = iseq.to_a[4][:node_id]

I think we should add a reader method on RubyVM::InstructionSequence to get the node id. It already has some methods for accessing location information, so I think adding the node id could be acceptable. However, I think such an addition is outside the scope of this ticket, so I will open a new ticket.

Updated by mame (Yusuke Endoh) about 5 hours ago

I think you are proposing adding the .node_for implementation to Prism?

Yes.

To be more exact, I'm guessing the following could be amended to what Benoit proposed above then?

Prism.node_for is my proposal, but the others are not my proposal but what Prism already provides.

require "prism"
node = Prism.parse("proc { <<END }
  heredoc
END")

pp node
$ ruby test.rb
#<Prism::ParseResult:0x00007f17ceafc278
 @comments=[],
 @data_loc=nil,
 @errors=[],
 @magic_comments=[],
 @source=#<Prism::ASCIISource:0x00007f17cdd4a2d8 @offsets=[0, 15, 25], @source="proc { <<END }\n  heredoc\nEND", @start_line=1>,
 @value=
  @ ProgramNode (location: (1,0)-(1,14))
  β”œβ”€β”€ flags: βˆ…
  β”œβ”€β”€ locals: []
  └── statements:
      @ StatementsNode (location: (1,0)-(1,14))
      β”œβ”€β”€ flags: βˆ…
      └── body: (length: 1)
          └── @ CallNode (location: (1,0)-(1,14))
              β”œβ”€β”€ flags: newline, ignore_visibility
              β”œβ”€β”€ receiver: βˆ…
              β”œβ”€β”€ call_operator_loc: βˆ…
              β”œβ”€β”€ name: :proc
              β”œβ”€β”€ message_loc: (1,0)-(1,4) = "proc"
              β”œβ”€β”€ opening_loc: βˆ…
              β”œβ”€β”€ arguments: βˆ…
              β”œβ”€β”€ closing_loc: βˆ…
              └── block:
                  @ BlockNode (location: (1,5)-(1,14))
                  β”œβ”€β”€ flags: βˆ…
                  β”œβ”€β”€ locals: []
                  β”œβ”€β”€ parameters: βˆ…
                  β”œβ”€β”€ body:
                  β”‚   @ StatementsNode (location: (1,7)-(1,12))
                  β”‚   β”œβ”€β”€ flags: βˆ…
                  β”‚   └── body: (length: 1)
                  β”‚       └── @ StringNode (location: (1,7)-(1,12))
                  β”‚           β”œβ”€β”€ flags: newline
                  β”‚           β”œβ”€β”€ opening_loc: (1,7)-(1,12) = "<<END"
                  β”‚           β”œβ”€β”€ content_loc: (2,0)-(3,0) = "  heredoc\n"
                  β”‚           β”œβ”€β”€ closing_loc: (3,0)-(3,3) = "END"
                  β”‚           └── unescaped: "  heredoc\n"
                  β”œβ”€β”€ opening_loc: (1,5)-(1,6) = "{"
                  └── closing_loc: (1,13)-(1,14) = "}",
 @warnings=[]>

tenderlovemaking (Aaron Patterson) wrote in #note-10:

It seems like a very similar solution to what @mame (Yusuke Endoh) is proposing was merged at one point as Feature #14836. It seems RubyVM::AST doesn't exist anymore, but it should be possible to do something similar.

Yeah, we need to reimplement Prism version of RubyVM::AbstractSyntaxTree.of. This is what I tried to mean by Prism.node_for.

tenderlovemaking (Aaron Patterson) wrote in #note-11:

I think we should add a reader method on RubyVM::InstructionSequence to get the node id.

I don't think it is good for a user to handle node_id explicitly.
Rather, I think Prism should provide a simple method to directly retrieve a node subtree of a Method/Proc object given, like RubyVM::AST.of.

Updated by tenderlovemaking (Aaron Patterson) about 2 hours ago

mame (Yusuke Endoh) wrote in #note-12:

tenderlovemaking (Aaron Patterson) wrote in #note-11:

I think we should add a reader method on RubyVM::InstructionSequence to get the node id.

I don't think it is good for a user to handle node_id explicitly.
Rather, I think Prism should provide a simple method to directly retrieve a node subtree of a Method/Proc object given, like RubyVM::AST.of.

I agree. The problem is that RubyVM::AST is built in to Ruby, so it has access to rb_iseq_t members. Prism cannot access fields on rb_iseq_t.

My idea is to apply a patch like this to iseq.c: https://github.com/tenderlove/ruby/commit/9a54230012d8837a981e0ddec88384ab6ef4db89

Then add code to Prism like this:

require "prism"

# Put the following code in Prism
module Prism
  def self.ast_for iseq
    ast = Prism.parse(File.read(iseq.absolute_path))
    node_id = iseq.node_id
    ast.value.breadth_first_search { |node| node.node_id == node_id }
  end
end

# User code is below
f = proc { <<END }
  heredoc
END

iseq = RubyVM::InstructionSequence.of(f)
tree = Prism.ast_for(iseq)
p tree

I implemented the Prism method as a monkey patch just as demonstration. But the idea is the same as RubyVM::AbstractSyntaxTree.of, where Prism.ast_for takes an iseq object and returns the AST for that iseq.

Actions

Also available in: Atom PDF

Like1
Like0Like0Like2Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0