Feature #17276 (Closed)
Ripper stops tokenizing after keyword as a method parameter
Description
Although these are obviously syntax errors at this time, the following code cannot be tokenized correctly by Ripper.tokenize.
$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ ruby -rripper -vlne 'p Ripper.tokenize($_)' src.rb
ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux]
["def", " ", "req", "(", "true", ")"]
["def", " ", "opt", "(", "true", "=", "0", ")"]
["def", " ", "rest", "(", "*", "true", ")"]
["def", " ", "keyrest", "(", "**", "true", ")"]
["def", " ", "block", "(", "&", "true", ")"]
["->", "true", "{"]
["->", "true", "=", "0", "{"]
["->", "*", "true", "{"]
["->", "**", "true", "{"]
["->", "&", "true", "{"]
end and } are not shown in the result. This seems to prevent irb from determining the continuity of the input.
See: https://github.com/ruby/irb/issues/38
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
- Tracker changed from Bug to Feature
- ruby -v deleted (ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux])
- Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
Ripper records errors, but Ripper.tokenize and Ripper.lex cannot return them. Here's how you can handle errors with Ripper (shown for tokenize; lex is similar):
require 'ripper'
r = Ripper::Lexer.new('def req(true) end', 'a', 1)
p r.tokenize
# => ["def", " ", "req", "(", "true", ")"]
p r.errors
# => [#<Ripper::Lexer::Elem: on_parse_error@1:8:END: "true": syntax error, unexpected `true', expecting ')'>]
This is not a bug; it is a limitation of the API for Ripper.tokenize and Ripper.lex. Changing Ripper.tokenize and Ripper.lex to raise an exception is possible, but would break backwards compatibility. Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774
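The keyword from that pull request was later merged (as raise_errors, in Ruby 3.0); a minimal sketch of how opting in would look, assuming that keyword:

```ruby
require 'ripper'

src = 'def req(true) end'

# Default behavior is unchanged: the error is recorded internally,
# and the caller just gets a token array with no indication of failure.
p Ripper.tokenize(src)

# With raise_errors: true, the recorded parse error surfaces as SyntaxError.
begin
  Ripper.tokenize(src, raise_errors: true)
rescue SyntaxError => e
  puts "raised: #{e.message}"
end
```

Because the keyword defaults to false, existing callers are unaffected, which is what preserves backwards compatibility.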
Updated by Eregon (Benoit Daloze) about 4 years ago
jeremyevans0 (Jeremy Evans) wrote in #note-1:
Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774
I agree it would be nice.
Do you think the same would be possible for Ripper.sexp/sexp_raw? Currently it just returns nil if there is some error, which is unhelpful if one wants to know why it failed to lex/parse:
> Ripper.sexp('def n')
=> nil
Updated by jeremyevans0 (Jeremy Evans) about 4 years ago
Eregon (Benoit Daloze) wrote in #note-2:
jeremyevans0 (Jeremy Evans) wrote in #note-1:
Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774
I agree it would be nice.
Do you think the same would be possible for Ripper.sexp/sexp_raw?
Yes, the same is possible with Ripper.sexp/sexp_raw. I've updated the pull request to handle those as well.
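A sketch of the resulting behavior for the parser-level methods, assuming the raise_errors keyword from the pull request:

```ruby
require 'ripper'

# Without the keyword, a parse failure is indistinguishable from "no result":
p Ripper.sexp('def n')   # => nil

# With raise_errors: true, the underlying parse error surfaces as SyntaxError,
# so the caller can see *why* parsing failed.
begin
  Ripper.sexp('def n', raise_errors: true)
rescue SyntaxError => e
  puts e.message
end
```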
Updated by jeremyevans (Jeremy Evans) about 4 years ago
- Status changed from Open to Closed
Applied in changeset git|cd0877a93e91fecb3066984b3fa2a762e6977caf.
Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}
Implements [Feature #17276]
Updated by no6v (Nobuhiro IMAI) about 4 years ago
Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}
Implements [Feature #17276]
Thanks for your clarification and implementation.
(It seems that these two lines are the same :)
https://github.com/ruby/ruby/blob/cd0877a93e91fecb3066984b3fa2a762e6977caf/test/ripper/test_lexer.rb#L150-L151
Ripper::Lexer#{lex,tokenize} seem to accept a second or further call, returning the rest of the code as tokens.
$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ cat l.rb
require "ripper"
lexer = Ripper::Lexer.new(ARGF.read)
until (tokens = lexer.tokenize).empty?
p tokens
end
$ ruby l.rb src.rb
["def", " ", "req", "(", "true", ")"]
[" ", "end", "\n", "def", " ", "opt", "(", "true", "=", "0", ")"]
[" ", "end", "\n", "def", " ", "rest", "(", "*", "true", ")"]
[" ", "end", "\n", "def", " ", "keyrest", "(", "**", "true", ")"]
[" ", "end", "\n", "def", " ", "block", "(", "&", "true", ")"]
[" ", "end", "\n", "->", "true", "{"]
["}", "\n", "->", "true", "=", "0", "{"]
["}", "\n", "->", "*", "true", "{"]
["}", "\n", "->", "**", "true", "{"]
["}", "\n", "->", "&", "true", "{"]
["}", "\n"]
Ripper::Lexer#lex behaves the same way. Concatenating those tokens gives exactly what I wanted. I would prefer Ripper.{lex,tokenize} to return the fully parsed tokens.
Updated by no6v (Nobuhiro IMAI) about 4 years ago
I would prefer Ripper.{lex,tokenize} to return the fully parsed tokens.
Pull request: https://github.com/ruby/ruby/pull/3791
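With that pull request merged, Ripper.tokenize keeps scanning past the error instead of stopping at it, so joining the tokens reconstructs the whole source, including the previously dropped end and }. A quick check of that round-trip property, assuming a Ruby with the fix:

```ruby
require 'ripper'

src = "def req(true) end\n->true{}\n"

tokens = Ripper.tokenize(src)
p tokens

# No characters are lost to the syntax error; the tokens cover the full input.
puts tokens.join == src
```

This round-trip property is exactly what irb needs to decide whether an input line is complete or should be continued.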