Feature #19070
closed
Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Description
Background
Implementations of the Language Server Protocol (LSP) sometimes need token information. For example, m(1) and m(1, ) have the same AST structure apart from node locations, so it is impossible to detect the presence of the , from the AST alone. In the latter case, however, it might be better to suggest a list of variables for the second argument. Token information is important in such cases.
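The claim that both calls yield the same tree can be checked directly. The following is a minimal sketch: shape is a hypothetical helper (not part of RubyVM::AbstractSyntaxTree) that reduces a node to its type and children and discards the location information.

```ruby
# Hypothetical helper: convert a node into nested arrays of
# [type, *children], dropping line/column information.
def shape(node)
  # Children can also be plain values (symbols, integers, nil, arrays).
  return node unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  [node.type, *node.children.map { |child| shape(child) }]
end

a = RubyVM::AbstractSyntaxTree.parse("m(1)")
b = RubyVM::AbstractSyntaxTree.parse("m(1, )")

# The trailing comma leaves no trace in the tree structure.
p shape(a) == shape(b) # => true
```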
Example
require "pp"
node = RubyVM::AbstractSyntaxTree.parse(<<~STR, keep_tokens: true)
def m(a, b = 1, *rest, &block)
end

m(1, )
STR
# The root SCOPE node's third child is its body: a BLOCK holding both statements.
defn = node.children[2].children[0]
fcall = node.children[2].children[1]
puts "defn.tokens"
pp defn.tokens
puts "\n\n"
puts "fcall.tokens"
pp fcall.tokens
puts "\n\n"
puts defn.tokens.map{_1[2]}.join
puts fcall.tokens.map{_1[2]}.join
This prints the following, where each token is [sequence_id, token_type, token_string, [first_line, first_column, last_line, last_column]]:
defn.tokens
[[0, :kw, "def", [1, 0, 1, 3]],
[1, :sp, " ", [1, 3, 1, 4]],
[2, :ident, "m", [1, 4, 1, 5]],
[3, :lparen, "(", [1, 5, 1, 6]],
[4, :ident, "a", [1, 6, 1, 7]],
[5, :comma, ",", [1, 7, 1, 8]],
[6, :sp, " ", [1, 8, 1, 9]],
[7, :ident, "b", [1, 9, 1, 10]],
[8, :sp, " ", [1, 10, 1, 11]],
[9, :op, "=", [1, 11, 1, 12]],
[10, :sp, " ", [1, 12, 1, 13]],
[11, :int, "1", [1, 13, 1, 14]],
[12, :comma, ",", [1, 14, 1, 15]],
[13, :sp, " ", [1, 15, 1, 16]],
[14, :op, "*", [1, 16, 1, 17]],
[15, :ident, "rest", [1, 17, 1, 21]],
[16, :comma, ",", [1, 21, 1, 22]],
[17, :sp, " ", [1, 22, 1, 23]],
[18, :op, "&", [1, 23, 1, 24]],
[19, :ident, "block", [1, 24, 1, 29]],
[20, :rparen, ")", [1, 29, 1, 30]],
[21, :ignored_nl, "\n", [1, 30, 1, 31]],
[22, :kw, "end", [2, 0, 2, 3]]]
fcall.tokens
[[25, :ident, "m", [4, 0, 4, 1]],
[26, :lparen, "(", [4, 1, 4, 2]],
[27, :int, "1", [4, 2, 4, 3]],
[28, :comma, ",", [4, 3, 4, 4]],
[29, :sp, " ", [4, 4, 4, 5]],
[30, :rparen, ")", [4, 5, 4, 6]]]
def m(a, b = 1, *rest, &block)
end
m(1, )
Interface
- Add keep_tokens option for RubyVM::AbstractSyntaxTree.parse, .parse_file and .of
- Add RubyVM::AbstractSyntaxTree::Node#tokens which returns tokens for the node, including tokens for its descendant nodes.
- Add RubyVM::AbstractSyntaxTree::Node#all_tokens which returns all tokens for the input script regardless of the receiver node.
Implementation
Updated by Eregon (Benoit Daloze) 11 months ago
Doesn't Ripper.lex already provide this information?
Updated by matz (Yukihiro Matsumoto) 11 months ago
Sounds OK.
Matz.
Updated by yui-knk (Kaneko Yuichiro) 10 months ago
- Status changed from Open to Closed
Applied in changeset git|d8601621edcf29e3323b90dcf04b774edd9fb45e.
Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods
Implementations of the Language Server Protocol (LSP) sometimes need token information.
For example, m(1) and m(1, ) have the same AST structure apart from node locations,
so it is impossible to detect the presence of the , from the AST alone.
In the latter case, however, it might be better to suggest a list of variables for the second argument.
Token information is important in such cases.
This commit adds these methods.
- Add keep_tokens option for RubyVM::AbstractSyntaxTree.parse, .parse_file and .of
- Add RubyVM::AbstractSyntaxTree::Node#tokens which returns tokens for the node, including tokens for its descendant nodes.
- Add RubyVM::AbstractSyntaxTree::Node#all_tokens which returns all tokens for the input script regardless of the receiver node.
[Feature #19070]
Impacts on memory usage and performance are below:
Memory usage:
$ cat test.rb
root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
11408kb
# keep_tokens: false
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
17508kb
# keep_tokens: true
$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
30960kb
Performance:
$ cat ../ast_keep_tokens.yml
prelude: |
  src = <<~SRC
    module M
      class C
        def m1(a, b)
          1 + a + b
        end
      end
    end
  SRC
benchmark:
  without_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
  with_keep_tokens: |
    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
--output=markdown --output-compare -v ../ast_keep_tokens.yml
compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
warming up..
| |compare-ruby|built-ruby|
|:--------------------|-----------:|---------:|
|without_keep_tokens | 21.659k| 21.303k|
| | 1.02x| -|
|with_keep_tokens | 6.220k| 5.691k|
| | 1.09x| -|