Feature #21795
Status: Open
Methods for retrieving ASTs
Description
I would like to propose a handful of methods for retrieving ASTs from various objects that correspond to locations in code. This includes:
- Proc#ast
- Method#ast
- UnboundMethod#ast
- Thread::Backtrace::Location#ast
- TracePoint#ast (on call/return events)
The purpose of this is to make tooling easier to write and maintain. Specifically, this could be used in irb, power_assert, error_highlight, and various other tools, both in core and outside it, that make use of source code.
There have been many previous discussions of retrieving node_id, source_location, source, etc. All of these use cases are covered by returning the AST for some entity. In this case node_id becomes an implementation detail, invisible to the user. Source location can be derived from the information on the AST itself. Similarly, source can be derived from the AST.
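As a rough illustration of the claim that location and source both fall out of an AST node, here is a minimal sketch using the prism gem's existing public API. It assumes the prism gem can be required (it is a default gem since Ruby 3.3) and parses a string directly, because the proposed #ast methods do not exist yet. The sample source and variable names are made up for the example.

```ruby
require "prism"

# Sample source, invented for this sketch.
source = <<~RUBY
  def greet(name)
    "Hello, " + name
  end
RUBY

# Parse the string and take the first top-level statement (a Prism::DefNode).
result = Prism.parse(source)
def_node = result.value.statements.body.first

# Source location is carried by the node itself.
loc = def_node.location
location_info = [loc.start_line, loc.start_column, loc.end_line, loc.end_column]

# The source text of the node can be sliced back out of the parsed input.
node_source = def_node.slice
```

With something like Method#ast, a tool would presumably start from the returned node instead of calling Prism.parse itself; what node class that would be is still open.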
Internally, I do not think we have to store any more information than we already do (since we have node_id for the first four of these, it becomes rather trivial). For TracePoint we can have a larger discussion about it, but I think it should not be too much work. In terms of implementation, the only caveat I would add is that if the ISEQ was compiled through the old parser/compiler, this should return nil, as the node ids do not match up and we do not want to further propagate the RubyVM::AST API.
The reason I am opening up this ticket with 5 different methods requested in it is to get approval first for the direction, then I can open individual tickets or just PRs for each method. I believe this feature would ease the maintenance burden of many core libraries, and unify otherwise disparate efforts to achieve the same thing.
Updated by mame (Yusuke Endoh) 3 minutes ago
I anticipated that we would consider this eventually, but incorporating it into the core presents significant challenges.
Here are two major issues regarding feasibility.
(Based on chats with @ko1 (Koichi Sasada), @tompng (tomoya ishida), and @yui-knk (Kaneko Yuichiro), though these are my personal views.)
The Implementation Approach
CRuby currently discards source code and ASTs after ISeq generation. The proposed #ast method would have to re-read and re-parse the source, which causes two problems:
- If the file is modified after loading, #ast may return the wrong node.
- It does not work for eval strings.
error_highlight accepts this fragility because it only displays "hints", but I don't think that is acceptable for a built-in method. At the very least, we must avoid returning an incorrect node and make clear when failures occur.
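Both failure modes can be reproduced with plain source_location today. The sketch below is only an illustration: reread_line is a hypothetical helper that stands in for any "re-read the file and look up the position" strategy, and the file name and contents are made up.

```ruby
require "tmpdir"

# Hypothetical helper: re-read the line that source_location points at,
# the way a re-parsing implementation would have to go back to the file.
def reread_line(callable)
  path, line = callable.source_location
  File.readlines(path)[line - 1]
end

before_edit, after_edit = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.rb")
  File.write(path, "def sample_answer\n  42\nend\n")
  load path

  before = reread_line(method(:sample_answer))

  # Modify the file after loading: every line shifts down by one.
  File.write(path, "# a new first line\n" + File.read(path))
  after = reread_line(method(:sample_answer))

  [before, after]
end

# For eval, the reported path does not name a file that can be re-read.
eval_proc = eval("proc { 1 }")
eval_path = eval_proc.source_location.first
eval_file_exists = File.exist?(eval_path)
```

After the edit, the same source_location points at the comment line instead of the def, and the path reported for the eval'd proc is not a readable file.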
I propose two approaches:
- Keep loaded source in memory (e.g., RubyVM.keep_script_lines = true by default). This supports eval but increases memory usage.
- Validate source hash. Store a hash in the ISeq and check it to ensure the file hasn't changed.
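The second approach might look roughly like the following in plain Ruby. This is a sketch of the idea only, not how CRuby would implement it: remember_source and validated_source are hypothetical names, a Hash stands in for whatever the ISeq would store, and SHA-256 is an arbitrary choice of hash.

```ruby
require "digest"
require "tmpdir"

# Hypothetical stand-in for what an ISeq could record at load time.
def remember_source(path)
  { path: path, digest: Digest::SHA256.file(path).hexdigest }
end

# Return the file contents only if they still match the recorded digest;
# return nil when the file has changed, so a caller never parses stale text.
def validated_source(record)
  current = File.binread(record[:path])
  return nil unless Digest::SHA256.hexdigest(current) == record[:digest]
  current
end

unchanged_result, changed_result = Dir.mktmpdir do |dir|
  path = File.join(dir, "lib.rb")
  File.write(path, "def f = 1\n")
  record = remember_source(path)

  unchanged = validated_source(record)

  # Simulate the file being edited after it was loaded.
  File.write(path, "def f = 2\n")
  changed = validated_source(record)

  [unchanged, changed]
end
```

The helper hashes the same bytes it returns, so there is no window in which the file could change between the check and the read. It still does nothing for eval strings, which is where keeping the source in memory would be needed.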
The Parser Switching Problem
What is the node definition returned by #ast?
As noted in #21618, built-in Prism is not exposed as a Ruby API. If Gemfile.lock specifies an older version of the prism gem, even require "prism" won't provide the expected definition.
IMO, it would be good to have a node definition that does not depend on the prism gem (maybe Ruby::Node?). I am not sure how much effort is needed for this. We would also need to consider what to place where in the ruby/prism and ruby/ruby repositories for development.
We also need to decide if #ast should return RubyVM::AST::Node when --parser=parse.y is specified.