I think this would be great to have, and abstract over implementation details like node_id.
It's also very powerful as e.g. Thread::Backtrace::Location#ast would be able to return a Prism::CallNode with all the relevant information, which error_highlight and others could then use very conveniently.
mame (Yusuke Endoh) wrote in #note-1:
The Implementation Approach¶
error_highlight accepts this fragility because it displays just "hints".
But I don't think that it is allowed for a built-in method. At least, we must avoid returning an incorrect node, and clarify when failures occur.
I think a built-in method doesn't imply it must work perfectly, it's e.g. not really possible to succeed when the file is modified (without keeping the source in memory).
I propose two approaches:
- Keep loaded source in memory (e.g.,
RubyVM.keep_script_lines = true by default). This supports eval but increase memory usage.
I think we could keep only eval code (IOW, non-file-based code) in memory, then the memory increase wouldn't be so big, and it would work as long as the files aren't modified.
Keeping all code in memory would be convenient, and interesting to measure how much an overhead it is (source code is often quite a compact representation actually), but I suspect given the general focus of CRuby on memory footprint it would be considered only as last resort.
- Validate source hash. Store a hash in the ISeq and check it to ensure the file hasn't changed.
This is a great idea, it should be reasonably fast and avoid the pitfall about modified files.
Although modified files are probably very rare in practice, so I'm not sure how much this matters, but it does seem nicer to fail properly than potentially return the wrong node.
The Parser Switching Problem¶
What is the node definition returned by #ast?
A Prism::Node because Matz has agreed that going forward the official parser API for Ruby will be the Prism API.
Actually more specific nodes where known:
-
Proc#ast: LambdaNode | CallNode | ForNode | DefNode (DefNode because Method#to_proc)
-
Method#ast & UnboundMethod#ast: DefNode | LambdaNode | CallNode | ForNode (block-related nodes because define_method(&Proc))
-
Thread::Backtrace::Location#ast & TracePoint#ast: Call*Node | Index*Node | YieldNode, and maybe a few more.
As noted in #21618, built-in Prism is not exposed as a Ruby API. If Gemfile.lock specifies an older version of prism gem, even require "prism" won't provide the expected definition.
This is basically a solved problem, as discussed there.
In that case, Prism.parse(foo, version: "current") fails with a clear exception explaining one needs to use a newer prism gem.
This only happens if one explicitly downgrades the Prism gem, which is expected to be pretty rare.
IMO, it would be good to have a node definition that does not depend on prism gem (maybe Ruby::Node?). I am not sure how much effort is needed for this. We would also need to consider where to place what in the ruby/prism and ruby/ruby repositories for development.
IMO there should only be Prism::Node, otherwise tools would have to switch between two APIs whether they want to use the current-running syntax or another syntax.
From the discussion in #21618 my take away is it's unnecessary to have a different API.
We also need to decide if #ast should return RubyVM::AST::Node when --parser=parse.y is specified.
It must not, the users of these new methods expect a Prism::Node.
Matz has said the official parser API for Ruby is the Prism API, so it doesn't make sense to return RubyVM::AST::Node there.
Also that wouldn't be reasonable when considering alternative Ruby implementations.
This does mean these methods wouldn't work with --parser=parse.y, until parse.y can be used to create a Prism::Node AST.
Since that's the official Ruby parsing API, it's already a goal anyway for parse.y to do that (#21825), so that shouldn't be a blocker.