Feature #21795

open

Methods for retrieving ASTs

Added by kddnewton (Kevin Newton) about 8 hours ago. Updated 3 minutes ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:124311]

Description

I would like to propose a handful of methods for retrieving ASTs from various objects that correspond to locations in code. This includes:

  • Proc#ast
  • Method#ast
  • UnboundMethod#ast
  • Thread::Backtrace::Location#ast
  • TracePoint#ast (on call/return events)

The purpose of this is to make tooling easier to write and maintain. Specifically, it could be used in irb, power_assert, error_highlight, and various other tools, both in core and outside it, that make use of source code.

There have been many previous discussions of retrieving node_id, source_location, source, etc. All of these use cases are covered by returning the AST for some entity. In this case node_id becomes an implementation detail, invisible to the user. Source location can be derived from the information on the AST itself. Similarly, source can be derived from the AST.

Internally, I do not think we have to store any more information than we already do (since we have node_id for the first four of these, it becomes rather trivial). For TracePoint we can have a larger discussion about it, but I think it should not be too much work. In terms of implementation, the only caveat I would add is that if the ISEQ was compiled through the old parser/compiler, this should return nil, as the node ids do not match up and we do not want to further propagate the RubyVM::AST API.

The reason I am opening up this ticket with 5 different methods requested in it is to get approval first for the direction, then I can open individual tickets or just PRs for each method. I believe this feature would ease the maintenance burden of many core libraries, and unify otherwise disparate efforts to achieve the same thing.

Updated by mame (Yusuke Endoh) 3 minutes ago #1

I anticipated that we would consider this eventually, but incorporating it into the core presents significant challenges.

Here are two major issues regarding feasibility.

(Based on chats with @ko1 (Koichi Sasada), @tompng (tomoya ishida), and @yui-knk (Kaneko Yuichiro), though these are my personal views.)

The Implementation Approach

CRuby currently discards source code and ASTs after ISeq generation. The proposed #ast method would have to re-read and re-parse the source, which causes two problems:

  1. If the file is modified after loading, #ast may return the wrong node.
  2. It does not work for eval strings.
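
Both problems can be reproduced today with source_location alone. The following is only a rough sketch for illustration: the temporary file and the stale_example method are hypothetical names made up for the demo, and the naive "re-read the file at the recorded line" step stands in for whatever a real #ast implementation would do.

```ruby
require "tempfile"

# Problem 2: a proc created by eval reports a path that is not a real file,
# so there is nothing on disk to re-read.
evaled_proc = eval("proc { 1 + 2 }")
eval_path, = evaled_proc.source_location
eval_has_file = File.exist?(eval_path)

# Problem 1: load a method, then edit the file behind it.
# (stale_example is a hypothetical method used only for this demo.)
script = Tempfile.new(["stale_example", ".rb"])
script.write("def stale_example\n  :original\nend\n")
script.flush
load script.path

stale_path, stale_line = method(:stale_example).source_location

# Simulate an edit after loading: a comment now occupies the method's old line.
File.write(stale_path, "# a new first line\ndef stale_example\n  :edited\nend\n")
reread_line = File.readlines(stale_path)[stale_line - 1]

puts "eval path: #{eval_path.inspect} (exists on disk: #{eval_has_file})"
puts "line #{stale_line} now reads: #{reread_line.inspect}"
puts "method still returns: #{stale_example.inspect}"
```

The loaded method keeps its original behavior, while the line recorded at load time now points at unrelated text, which is the "wrong node" case described above.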

error_highlight accepts this fragility because it only displays "hints". But I don't think that is acceptable for a built-in method. At a minimum, we must avoid returning an incorrect node and make clear when failures occur.

I propose two approaches:

  1. Keep loaded source in memory (e.g., RubyVM.keep_script_lines = true by default). This supports eval but increases memory usage.
  2. Validate source hash. Store a hash in the ISeq and check it to ensure the file hasn't changed.
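
To make the second approach concrete, here is a minimal sketch in plain Ruby. It is an assumption-laden illustration, not how CRuby would implement it: LoadedSource is a hypothetical stand-in for an ISeq that remembers a digest of its source, SHA-256 is an arbitrary choice of hash, and fresh_source is a made-up name for the check that would run before re-parsing.

```ruby
require "digest"
require "tempfile"

# Hypothetical stand-in for an ISeq that records a digest of its source at load time.
LoadedSource = Struct.new(:path, :digest) do
  # Record the file's digest at "load" time.
  def self.record(path)
    new(path, Digest::SHA256.file(path).hexdigest)
  end

  # Return the source text only if the file still matches the recorded digest.
  # Returns nil if the file changed or disappeared, instead of a wrong result.
  def fresh_source
    text = File.binread(path)
    Digest::SHA256.hexdigest(text) == digest ? text : nil
  rescue Errno::ENOENT
    nil
  end
end

checked_file = Tempfile.new(["checked", ".rb"])
checked_file.write("puts :hello\n")
checked_file.flush

loaded = LoadedSource.record(checked_file.path)
unchanged_source = loaded.fresh_source # file untouched, so the text is returned

File.write(checked_file.path, "puts :changed\n")
changed_source = loaded.fresh_source   # digest mismatch, so nil is returned

puts "before edit: #{unchanged_source.inspect}"
puts "after edit:  #{changed_source.inspect}"
```

Note that this check only guards against stale files; it does nothing for eval strings, which is the trade-off between the two approaches.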

The Parser Switching Problem

What is the node definition returned by #ast?

As noted in #21618, built-in Prism is not exposed as a Ruby API. If Gemfile.lock specifies an older version of the prism gem, even require "prism" won't provide the expected definition.

IMO, it would be good to have a node definition that does not depend on the prism gem (maybe Ruby::Node?). I am not sure how much effort is needed for this. We would also need to consider what should live where in the ruby/prism and ruby/ruby repositories for development.

We also need to decide if #ast should return RubyVM::AST::Node when --parser=parse.y is specified.
