Feature #21005
openUpdate the source location method to include line start/stop and column start/stop details
Description
Why¶
π Hello. After discussing with Kevin Newton and Benoit Daloze in Feature 20999, I'd like to propose adding line start/stop and column start/stop information to the #source_location
method for the following objects:
At the moment, when using #source_location
, you only get the following information:
def demo = "A demonstration."
# From disk.
method(:demo).source_location # ["/Users/bkuhlmann/Engineering/Misc/demo", 15]
# From memory.
method(:demo).source_location # ["(irb)", 3]
Notice, when asking for the source location, we only get the path/location as the first element and the line number as the second element but I'd like to obtain a much richer set of data which includes line start/stop and column start/stop so I can avoid leaning on the RubyVM
for this information. Example:
def demo = "A demonstration."
# From disk.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts [instructions.absolute_path, *instructions.to_a.dig(4, :code_location)]
[
"/Users/bkuhlmann/Engineering/Misc/demo", # Source path.
15, # Line start.
0, # Column start.
15, # Line stop.
29 # Column stop.
]
# From memory.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts instructions.script_lines
[
"def demo = \"A demonstration.\"\n",
""
]
By having access to the path (or lack thereof in case of IRB), line start/stop, and column start/stop, this means we could avoid using the RubyVM to obtain raw source code for any of these objects. This would not only enhance debugging situations but also improve Domain Specific Languages that wish to leverage this information for introducing new features and/or new debugging capabilities to the language.
How¶
Building upon the examples provided above, I'd like to see Binding
, Proc
, Method
, and UnboundMethod
respond to #source_location
as follows:
[
"/Users/bkuhlmann/Engineering/Misc/demo", # Source path.
15, # Line start.
15, # Line stop.
0, # Column start.
29 # Column stop.
]
Notice, for data grouping purposes, I changed the array structure to always start with the path as the first element, followed by line information, and ending with column information. Alternatively, it could might be nice to improve upon the above by answering a hash each time, instead, for a more self-describing data structure. Example:
{
path: "/Users/bkuhlmann/Engineering/Misc/demo",
line_start: 15,
line_stop: 15,
column_start: 0,
column_stop: 29
}
For in-memory, situations like IRB, it would be nice to answer the equivalent of RubyVM::InstructionSequence#script_lines
which would always be an Array
with no line or column information since only the source code is necessary. Example:
[
"def demo = \"A demonstration.\"\n",
""
]
From a pattern matching perspective, this could provide the best of both worlds especially if information is answered as either a Hash
or and Array
. Example:
def demo = "A demonstration."
case method(:demo).source_location
in Hash then puts "Source information obtained from disk."
in Array then puts "Source obtained from memory."
else fail TypeError, "Unrecognized source location type."
end
This above is only a simple example but there's a lot we could do with this information if the above pattern match was enhanced to deal with the extraction and formatting of the actual source code!
Notes¶
This feature request is related to the following discussions in case more context is of help:
Updated by Eregon (Benoit Daloze) 2 days ago
- Related to Feature #20999: Add RubyVM object source support added
Updated by Eregon (Benoit Daloze) 2 days ago
- Related to Feature #6012: Proc#source_location also return the column added
Updated by Eregon (Benoit Daloze) 1 day ago Β· Edited
I think adding end line and start/end columns is a straightforward and compatible extension of source_location
.
I will add that to the dev meeting agenda: https://bugs.ruby-lang.org/issues/20949
Changing to return a Hash (or an Array of 2 Strings) instead would be too incompatible, that would need to be a new method then.
If it's a new method, I think we should return a "code location" object (could be Ruby::CodeLocation
or Ruby::Location
or Ruby::SourceLocation
or so) and have the following methods (inspired from https://bugs.ruby-lang.org/issues/6012#note-19):
- start_line
- start_column
- start_offset
- end_line
- end_column
- end_offset
- code or maybe source_code: gets the source of the Proc/Method/UnboundMethod if available,
nil
otherwise
That last one seems particularly nice because it abstracts how it gets that source code, and allows for multiple implementations.
For instance if the Ruby implementation keeps the source code in memory (or a compressed version of it) it could just use that, if it doesn't it would re-read from disk, etc.
Updated by bkuhlmann (Brooke Kuhlmann) 1 day ago Β· Edited
Yeah, eager to see what the consensus on this becomes from the next dev meeting.
I like the idea of a Ruby::SourceLocation
or even a Ruby::Source
object? The latter would be nice because you could message the instance as source.code
for situations in which the source could be obtained from memory, re-read from disk, or answer nil
as you point out. Super helpful for situations where you're using IRB or Binding
.
Updated by mame (Yusuke Endoh) about 22 hours ago
I read the use case for #20999.
In short, by the following DSL,
class Demo
include Initable[[:key, :default, proc { Object.new }]]
end
you want to generate the following code string, right?
def initialize(default = Object.new)
@default = default
end
I believe that the last position of Proc is not sufficient for your case, because of a here document. Consider the following proc.
proc { <<END }
heredoc
END
Where do you expect the end position of the Proc to be? The end brace? Or the end of βENDβ delimiter? Either way, your DSL will generate a broken Ruby code string.
What you really need is not the end position information, but an AST node of Method, Proc, etc.
f = proc { <<END }
heredoc
END
# What you really need
node = Prism.node_for(f)
# You can get all the locations including the here document
pp node.block.body.body
# =>
# [@ StringNode (location: (1,7)-(1,12))
# βββ flags: newline
# βββ opening_loc: (1,7)-(1,12) = "<<END"
# βββ content_loc: (2,0)-(3,0) = " heredoc\n"
# βββ closing_loc: (3,0)-(3,3) = "END"
# βββ unescaped: " heredoc\n"]
# You can even get the end position of the proc, if it is really sufficient for you
p [node.location.end_line, node.location.end_column] #=> [1, 14]
Updated by byroot (Jean Boussier) about 19 hours ago
I think adding end line and start/end columns is a straightforward and compatible extension of source_location.
I fear it may break code that splats the array, e.g.:
some_method(*proc.source_location)
So perhaps it should use an argument, e.g. proc.source_location(true)
?
I like the idea of a Ruby::SourceLocation
Me too, because otherwise it would be an array with many positional numbers, and that would be quite cryptic. Returning an actual object with named methods make it easier to understand what's being used and easier to extend later on.
Updated by bkuhlmann (Brooke Kuhlmann) about 15 hours ago Β· Edited
Yusuke: I went hunting for the Prism.node_for
method/documentation but couldn't find it. If I understand you correctly, I think you are proposing adding the .node_for
implementation to Prism? If so, I like the detailed information in your example especially when dealing with heredocs and being able to access content location. That would yield even greater flexibility for anyone implementing atop this data. To be more exact, I'm guessing the following could be amended to what Benoit proposed above then?:
- content_line_start
- content_line_end
- content_column_start
- content_column_end
By the way, one slight correction to what you were asking above. This is what my DSL generates:
# DSL
class Demo
include Initable[[:key, :default, proc { Object.new }]]
end
# End result.
class Demo
def initialize(default: Object.new)
@default = default
end
private
attr_reader :default
end
The emphasis is on key
because key
is an optional keyword parameter as defined in the Method#parameters implementation. Essentially, my DSL reconstitutes the raw parameters answered back into a method signature where the third element in the 3-tuple is additional sugar (specific to my DSL) for providing a default value since Method#parameters
only answers an array of tuples.
Jean
I fear it may break code that splats the array
Ugh, yes. What about this instead?
proc.source_location(as: :verbose)
This would give us room to expand the as
keyword with different values to support different object shapes in the future (assuming this design expands and grows in future Ruby versions) without backing us into a corner.
Returning an actual object with named methods make it easier to understand what's being used and easier to extend later on.
Yeah, this would be most welcome. I know I gave an example of using a Hash
above but icing on the cake would be to answer a Data
object instead so you could simply ask for whatever attribute you need.
Updated by byroot (Jean Boussier) about 11 hours ago
Ugh, yes. What about [keyword argument] instead?
I think it may be a bit much.
But if we decide with returning a new class, instead of just retunring a longer array, I think the better API would be another method entirely.
Perhaps #source
, #source_info
, or something along these lines.
Updated by Earlopain (Earlopain _) about 10 hours ago
But if we decide with returning a new class, instead of just retunring a longer array, I think the better API would be another method entirely.
Yes, I agree with that. caller
and caller_location
are also different methods. I don't think source
would be a good name, it could just as easily return the source string. source_info
definitly sounds better to me.
Updated by tenderlovemaking (Aaron Patterson) about 6 hours ago
It seems like a very similar solution to what @mame (Yusuke Endoh) is proposing was merged at one point as Feature #14836. It seems RubyVM::AST
doesn't exist anymore, but it should be possible to do something similar.
For example, this code should work:
f = proc { <<END }
heredoc
END
iseq = RubyVM::InstructionSequence.of(f)
require "prism"
node_id = iseq.to_a[4][:node_id]
ast = Prism.parse(File.binread(__FILE__))
p ast.value.breadth_first_search { |node| node.node_id == node_id }
It does not currently work due to a bug in Prism, but I'm trying to address that in #21014
Updated by tenderlovemaking (Aaron Patterson) about 6 hours ago
Also just to be clear, I don't think the code I pasted above is nice code, but it does work after the Prism bug is addressed.
In particular this line is pretty heavy:
node_id = iseq.to_a[4][:node_id]
I think we should add a reader method on RubyVM::InstructionSequence to get the node id. It already has some methods for accessing location information, so I think adding the node id could be acceptable. However, I think such an addition is outside the scope of this ticket, so I will open a new ticket.
Updated by mame (Yusuke Endoh) about 5 hours ago
I think you are proposing adding the .node_for implementation to Prism?
Yes.
To be more exact, I'm guessing the following could be amended to what Benoit proposed above then?
Prism.node_for
is my proposal, but the others are not my proposal but what Prism already provides.
require "prism"
node = Prism.parse("proc { <<END }
heredoc
END")
pp node
$ ruby test.rb
#<Prism::ParseResult:0x00007f17ceafc278
@comments=[],
@data_loc=nil,
@errors=[],
@magic_comments=[],
@source=#<Prism::ASCIISource:0x00007f17cdd4a2d8 @offsets=[0, 15, 25], @source="proc { <<END }\n heredoc\nEND", @start_line=1>,
@value=
@ ProgramNode (location: (1,0)-(1,14))
βββ flags: β
βββ locals: []
βββ statements:
@ StatementsNode (location: (1,0)-(1,14))
βββ flags: β
βββ body: (length: 1)
βββ @ CallNode (location: (1,0)-(1,14))
βββ flags: newline, ignore_visibility
βββ receiver: β
βββ call_operator_loc: β
βββ name: :proc
βββ message_loc: (1,0)-(1,4) = "proc"
βββ opening_loc: β
βββ arguments: β
βββ closing_loc: β
βββ block:
@ BlockNode (location: (1,5)-(1,14))
βββ flags: β
βββ locals: []
βββ parameters: β
βββ body:
β @ StatementsNode (location: (1,7)-(1,12))
β βββ flags: β
β βββ body: (length: 1)
β βββ @ StringNode (location: (1,7)-(1,12))
β βββ flags: newline
β βββ opening_loc: (1,7)-(1,12) = "<<END"
β βββ content_loc: (2,0)-(3,0) = " heredoc\n"
β βββ closing_loc: (3,0)-(3,3) = "END"
β βββ unescaped: " heredoc\n"
βββ opening_loc: (1,5)-(1,6) = "{"
βββ closing_loc: (1,13)-(1,14) = "}",
@warnings=[]>
tenderlovemaking (Aaron Patterson) wrote in #note-10:
It seems like a very similar solution to what @mame (Yusuke Endoh) is proposing was merged at one point as Feature #14836. It seems
RubyVM::AST
doesn't exist anymore, but it should be possible to do something similar.
Yeah, we need to reimplement Prism version of RubyVM::AbstractSyntaxTree.of
. This is what I tried to mean by Prism.node_for
.
tenderlovemaking (Aaron Patterson) wrote in #note-11:
I think we should add a reader method on RubyVM::InstructionSequence to get the node id.
I don't think it is good for a user to handle node_id
explicitly.
Rather, I think Prism should provide a simple method to directly retrieve a node subtree of a Method/Proc object given, like RubyVM::AST.of
.
Updated by tenderlovemaking (Aaron Patterson) about 2 hours ago
mame (Yusuke Endoh) wrote in #note-12:
tenderlovemaking (Aaron Patterson) wrote in #note-11:
I think we should add a reader method on RubyVM::InstructionSequence to get the node id.
I don't think it is good for a user to handle
node_id
explicitly.
Rather, I think Prism should provide a simple method to directly retrieve a node subtree of a Method/Proc object given, likeRubyVM::AST.of
.
I agree. The problem is that RubyVM::AST
is built in to Ruby, so it has access to rb_iseq_t
members. Prism cannot access fields on rb_iseq_t
.
My idea is to apply a patch like this to iseq.c: https://github.com/tenderlove/ruby/commit/9a54230012d8837a981e0ddec88384ab6ef4db89
Then add code to Prism like this:
require "prism"
# Put the following code in Prism
module Prism
def self.ast_for iseq
ast = Prism.parse(File.read(iseq.absolute_path))
node_id = iseq.node_id
ast.value.breadth_first_search { |node| node.node_id == node_id }
end
end
# User code is below
f = proc { <<END }
heredoc
END
iseq = RubyVM::InstructionSequence.of(f)
tree = Prism.ast_for(iseq)
p tree
I implemented the Prism method as a monkey patch just as demonstration. But the idea is the same as RubyVM::AbstractSyntaxTree.of
, where Prism.ast_for
takes an iseq object and returns the AST for that iseq.