Feature #17276

Ripper stops tokenizing after keyword as a method parameter

Added by no6v (Nobuhiro IMAI) about 1 month ago. Updated 11 days ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:100470]

Description

Although these are obviously syntax errors at the moment, the following
code cannot be tokenized correctly by Ripper.tokenize.

$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ ruby -rripper -vlne 'p Ripper.tokenize($_)' src.rb
ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux]
["def", " ", "req", "(", "true", ")"]
["def", " ", "opt", "(", "true", "=", "0", ")"]
["def", " ", "rest", "(", "*", "true", ")"]
["def", " ", "keyrest", "(", "**", "true", ")"]
["def", " ", "block", "(", "&", "true", ")"]
["->", "true", "{"]
["->", "true", "=", "0", "{"]
["->", "*", "true", "{"]
["->", "**", "true", "{"]
["->", "&", "true", "{"]

end and } are not shown in the results.

This seems to prevent irb from determining the continuity of the input.
See: https://github.com/ruby/irb/issues/38

Updated by jeremyevans0 (Jeremy Evans) 13 days ago

  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
  • ruby -v deleted (ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux])
  • Tracker changed from Bug to Feature

Ripper records errors, but Ripper.tokenize and Ripper.lex have no way to return them. Here's how you can handle errors with Ripper::Lexer (shown for tokenize; lex works the same way):

require 'ripper'
r = Ripper::Lexer.new('def req(true) end', 'a', 1)
p r.tokenize
# => ["def", " ", "req", "(", "true", ")"]
p r.errors
# => [#<Ripper::Lexer::Elem: on_parse_error@1:8:END: "true": syntax error, unexpected `true', expecting ')'>]

This is not a bug; it is a limitation of the API for Ripper.tokenize and Ripper.lex. Changing Ripper.tokenize and Ripper.lex to raise an exception is possible, but it would break backwards compatibility.

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774
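
For illustration, a rough sketch of how such a keyword might look at the call site (the raise_errors name is taken from the commit that eventually landed below, so treat it as an assumption at this point in the discussion):

require 'ripper'

# Without the keyword, behavior is unchanged: tokens up to the error are returned.
p Ripper.tokenize('def req(true) end')
# => ["def", " ", "req", "(", "true", ")"]

# With the proposed keyword, a SyntaxError would be raised instead.
begin
  Ripper.tokenize('def req(true) end', raise_errors: true)
rescue SyntaxError => e
  puts e.message  # e.g. syntax error, unexpected `true', expecting ')'
end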

Updated by Eregon (Benoit Daloze) 13 days ago

jeremyevans0 (Jeremy Evans) wrote in #note-1:

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774

I agree it would be nice.

Do you think the same would be possible for Ripper.sexp/sexp_raw?
Currently it just returns nil if there is an error, which is unhelpful if one wants to know why it failed to lex/parse:

> Ripper.sexp('def n')
=> nil

Updated by jeremyevans0 (Jeremy Evans) 13 days ago

Eregon (Benoit Daloze) wrote in #note-2:

jeremyevans0 (Jeremy Evans) wrote in #note-1:

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774

I agree it would be nice.

Do you think the same would be possible for Ripper.sexp/sexp_raw?

Yes, the same is possible with Ripper.sexp/sexp_raw. I've updated the pull request to handle those as well.
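
For the sexp case, usage under the updated pull request would presumably look something like this (a sketch, assuming the same raise_errors keyword as for lex/tokenize):

require 'ripper'

p Ripper.sexp('def n')
# => nil (current behavior, with no explanation of the failure)

begin
  Ripper.sexp('def n', raise_errors: true)
rescue SyntaxError => e
  puts e.message  # the reason parsing failed
end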


Updated by jeremyevans (Jeremy Evans) 12 days ago

  • Status changed from Open to Closed

Applied in changeset git|cd0877a93e91fecb3066984b3fa2a762e6977caf.


Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}

Implements [Feature #17276]

Updated by no6v (Nobuhiro IMAI) 11 days ago

Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}

Implements [Feature #17276]

Thanks for your clarification and implementation.
(it seems that those two lines are the same :)
https://github.com/ruby/ruby/blob/cd0877a93e91fecb3066984b3fa2a762e6977caf/test/ripper/test_lexer.rb#L150-L151

Ripper::Lexer#{lex,tokenize} seem to accept a second or subsequent call and return the rest of the code as tokens.

$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ cat l.rb
require "ripper"
lexer = Ripper::Lexer.new(ARGF.read)
until (tokens = lexer.tokenize).empty?
  p tokens
end
$ ruby l.rb src.rb
["def", " ", "req", "(", "true", ")"]
[" ", "end", "\n", "def", " ", "opt", "(", "true", "=", "0", ")"]
[" ", "end", "\n", "def", " ", "rest", "(", "*", "true", ")"]
[" ", "end", "\n", "def", " ", "keyrest", "(", "**", "true", ")"]
[" ", "end", "\n", "def", " ", "block", "(", "&", "true", ")"]
[" ", "end", "\n", "->", "true", "{"]
["}", "\n", "->", "true", "=", "0", "{"]
["}", "\n", "->", "*", "true", "{"]
["}", "\n", "->", "**", "true", "{"]
["}", "\n", "->", "&", "true", "{"]
["}", "\n"]

Ripper::Lexer#lex behaves the same way. Concatenating those tokens gives exactly what I wanted.
I would prefer Ripper.{lex,tokenize} to return the fully parsed tokens.
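
For reference, a minimal sketch of the concatenation mentioned above, based on the l.rb workaround (assuming repeated Ripper::Lexer#tokenize calls keep returning the remaining tokens, as in the output shown):

require "ripper"

src = File.read("src.rb")
lexer = Ripper::Lexer.new(src)
tokens = []
until (chunk = lexer.tokenize).empty?
  tokens.concat(chunk)
end
# The concatenated tokens cover the whole input, including the trailing end/} tokens.
p tokens.join == src  # expected to be true given the output above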

Updated by no6v (Nobuhiro IMAI) 11 days ago

I would prefer Ripper.{lex,tokenize} to return the fully parsed tokens.

pull request: https://github.com/ruby/ruby/pull/3791
