Feature #14844

Future of RubyVM::AST?

Added by rmosolgo (Robert Mosolgo) 8 months ago. Updated 26 days ago.

Target version:


Hi! Thanks for all your great work on the Ruby language.

I saw the new RubyVM::AST module in 2.6.0-preview2 and I quickly went to try it out.

I'd love to have a well-documented, user-friendly way to parse and manipulate Ruby code using the Ruby standard library, so I'm pretty excited to try it out. (I've been trying to learn Ripper recently, too:, .)

Based on my exploration, I opened a small PR on GitHub with some documentation:

I'm curious though, are there future plans for this module? For example, we might:

  • Add more details about each node (for example, we could expose the names of identifiers and operators through the node classes)
  • Document each node type

I see there is a lot more information in the C structures that we could expose, and I'm interested to help out if it's valuable. What do you think?


Updated by shevegen (Robert A. Heiler) 8 months ago

Just two short comments from me, so that I do not expand the issue request/question
too much.

(1) I would like to suggest to you to consider adding your question to the upcoming
developer meeting; it may be best to have the core team and other core contributors
comment on it, including matz and (based on RubyVM I think) koichi.

The next upcoming meeting agenda should be compiled here:

But again, this is just a suggestion from me - it is your issue/question so of
course you should decide. :)

(2) The other thing is, and I may be wrong, but I think there have been quite
a few introspection-related extensions to ruby in general, by different ruby
hackers/developers/core contributors. You mentioned RubyVM but I think there
are a few more examples; Martin in regards to unicode and ... I think the
regex engine (I forgot the particular issue but I remember it was somehow
related to being able to tweak more within the regex engine). MJIT is also
a bit related to more introspection, at the least indirectly, if we can
control how much we can optimize where and how, at the least in the future
(say, past ruby 3.0). matz mentioned one requirement/goal for mruby is in
regards to systems that are constrained, but with MJIT we may be able to
perhaps control more optimizations when it comes to systems that have
more RAM/cpu power/cores and so forth. So, I may be wrong, but I think
several changes in ruby are to some extent related to more introspection.
(The MJIT author blogged about this too, I think ... something about
inline-C or like that? I forgot the details right now, sorry.)

Of course it may be best to have koichi comment on relevant parts
of your question such as "a lot more information in the C structures
that we could expose" (and whether matz is fine with this; I think
matz is fine with it but it may be best to ask this in regards to
the developer meeting altogether).

I personally love introspection.

PS: Actually, I just remembered one more change that is a bit
related to introspection, at

added Binding#source_location. [Feature #14230]

While eval() could be used, I think Binding#source_location
has a nicer API.

Updated by ioquatix (Samuel Williams) 8 months ago

Rather than create a new issue, I want to comment here, as I've also been using the new RubyVM::AST to compute code coverage of templates (think ERB templates). In the past, the existing was not capable to parse a string, so being able to compute it by hand using RubyVM::AST is a great boon.

When using RubyVM::AST::Node#type, the return value is a string. I wondered if it would make more sense to return a symbol. Rather than a string like "NODE_SCOPE", perhaps return a symbol like :scope. It would make it easier and perhaps more efficient to traverse the AST, and it seems more Ruby-like.

Updated by ioquatix (Samuel Williams) 8 months ago

It would be nice to expose the class and method/function name if possible. In the case of code coverage, it would allow user to explicitly ignore functions, because we can traverse the AST and ignore every line of the function.

  • NODE_CLASS, NODE_MODULE expose class/module name.
  • NODE_DEFN expose method name.

Updated by ioquatix (Samuel Williams) 8 months ago

Here is what I made using RubyVM::AST. It was useful.

In the end I made a regular expression to match node type. It was better than I expected.

Updated by bozhidar (Bozhidar Batsov) 6 months ago

I'm really curious what's the purpose of this module and why wasn't in developed in collaboration with the maintainers of libraries like and, and the maintainers of prominent AST-based tools (e.g.

Seems to me really misguided to developed a module like this one without really discussing its design with any of its potential users and to quietly just ship it with Ruby 2.6.

That has been beyond frustrating for a very long time - a handful of people sit together somewhere and make all the shots, with the feedback from everyone often simply ignored. For any library to become really useful one should look at the problems people would normally be solving using it.

Updated by mame (Yusuke Endoh) 6 months ago

bozhidar (Bozhidar Batsov) wrote:

I'm really curious what's the purpose of this module and why wasn't in developed in collaboration with the maintainers of libraries like and, and the maintainers of prominent AST-based tools (e.g.

First of all, thank you for developing the parser gem and related tools including Rubocop.

As far as I understand, RubyVM module is completely different than other builtin modules. It exposes an access to Ruby internal for very limited purpose, such as debugging the internal, prototyping a new feature, implementing a MRI-bundled feature, etc. No compatibility is guaranteed at all; its API will arbitrarily change along with internal change. It is never intended for normal users to use it. Usefulness is not a priority for the modules under RubyVM. This fact can also be seen from RubyVM::InstructionSequence. (Its has a very long name, which also represents non-casual use, so it may be better to rename RubyVM::AST with AbstractSyntaxTree.)

In fact, Yuichiro Kaneko developped RubyVM::AST for testing a parser-related new feature, column number of each node, that he introduced in Ruby 2.5. It was originally a hidden external library. Some people (including me) requested to expose it as an internal-use module, and matz approved it. Then it is introduced into trunk as an experimental feature. Kaneko-san wanted RubyVM::AST for some study purpose (for example, finding all callsites of some function), and my motivation for the exposure is that it would be needed to import Ruby3's type system. (There are some proposals for type-checking Ruby. It is not decided which would be imported, or it is even uncertain if a AST module is really needed, but I just wanted to remove a blocker candidate in advance.)

RubyVM::AST does not decrease the value of the parser gem. RubyVM::AST is useful to investigate how MRI looks a Ruby program, but is not useful as a general Ruby program parser; it may optimize the AST by omitting some non-significant letters and restructuring the tree structure, so the result might not correspond to the original source code literally. I think that this property is not useful for Rubocop, for example.

In short, I think that RubyVM::AST is not what people expected. Users may use it just for research purpose, but must not use it in production.

Updated by ioquatix (Samuel Williams) 6 months ago

I hope this is relevant.

I found an interesting article here:

It describes the process Python uses.

It looks like it's standardised in a way that might be a useful goal for RubyVM::AST.

Updated by lucasbuchala (Lucas Buchala) 3 months ago

Hello. Sorry to step in with a frivolous suggestion, but I just noticed the name RubyVM::AbstractSyntaxTree in the RC1 announcement and was surprised by the length of the word when the shorter name "AST" would suffice.

mame (Yusuke Endoh) wrote:

[...] This fact can also be seen from RubyVM::InstructionSequence. (Its has a very long name, which also represents non-casual use, so it may be better to rename RubyVM::AST with AbstractSyntaxTree.)

Following this naming practice/convention is really necessary and beneficial?

Updated by ioquatix (Samuel Williams) 2 months ago

After playing around with the RC2 release, I think it's pretty good.

The only thing I'd like to see, is the use of Node instances even for cases where Array is currently used. Rather than Node.type, use explicit classes.

In addition to this rather than using Array for composite nodes (e.g. when parsing module), which is pretty fragile, classes specific to the node being parsed would be great, e.g. ModuleNode which has things like #name. Rather than an array which contains a symbol somewhere.

Updated by ioquatix (Samuel Williams) 26 days ago

I started playing around with parser gem, and I actually found it really great too. I think I'll use the parser gem going forward, since it works on many versions of Ruby. That being said, exposing Ruby's AST is also really fun.

Also available in: Atom PDF