Feature #17276

closed

Ripper stops tokenizing after keyword as a method parameter

Added by no6v (Nobuhiro IMAI) over 3 years ago. Updated over 3 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:100470]

Description

Although these are obviously syntax errors for now, Ripper.tokenize cannot
tokenize the following code correctly:

$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ ruby -rripper -vlne 'p Ripper.tokenize($_)' src.rb
ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux]
["def", " ", "req", "(", "true", ")"]
["def", " ", "opt", "(", "true", "=", "0", ")"]
["def", " ", "rest", "(", "*", "true", ")"]
["def", " ", "keyrest", "(", "**", "true", ")"]
["def", " ", "block", "(", "&", "true", ")"]
["->", "true", "{"]
["->", "true", "=", "0", "{"]
["->", "*", "true", "{"]
["->", "**", "true", "{"]
["->", "&", "true", "{"]

The trailing end and } are missing from every result.

This seems to prevent irb from determining whether the input is complete or continues on the next line.
See: https://github.com/ruby/irb/issues/38
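
One way a tool could notice this symptom is to check whether the tokens reproduce the input. The minimal sketch below assumes that, for syntactically valid code, Ripper.tokenize is lossless (joining the tokens yields the original string); `fully_tokenized?` is a hypothetical helper, not part of Ripper or irb.

```ruby
require "ripper"

# Hypothetical helper: true when the tokens returned by Ripper.tokenize
# reproduce the whole source text, i.e. nothing was dropped.
def fully_tokenized?(src)
  Ripper.tokenize(src).join == src
end

p fully_tokenized?("def ok; end\n")
p fully_tokenized?("def req(true) end\n")
```

On the 3.0.0dev build shown above, the second call would print false, because the tokens stop at ")" and the trailing " end" is lost. The result on other Ruby versions may differ.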

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

  • Tracker changed from Bug to Feature
  • ruby -v deleted (ruby 3.0.0dev (2020-10-21T00:24:47Z master da25affdac) [x86_64-linux])
  • Backport deleted (2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)

Ripper records errors, but Ripper.tokenize and Ripper.lex cannot return them. Here's how you can handle errors with Ripper (this example uses tokenize; lex works similarly):

require 'ripper'
r = Ripper::Lexer.new('def req(true) end', 'a', 1)
p r.tokenize
# => ["def", " ", "req", "(", "true", ")"]
p r.errors
# => [#<Ripper::Lexer::Elem: on_parse_error@1:8:END: "true": syntax error, unexpected `true', expecting ')'>]

This is not a bug; it is a limitation of the Ripper.tokenize and Ripper.lex API. Changing Ripper.tokenize and Ripper.lex to raise an exception is possible, but would break backwards compatibility.
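
Building on the Ripper::Lexer example above, a small wrapper can return the tokens and the recorded errors together. `tokenize_with_errors` is a hypothetical name used only for illustration; it relies on Ripper::Lexer#tokenize and Ripper::Lexer#errors exactly as that example does.

```ruby
require "ripper"

# Hypothetical helper: tokenize src and also return the syntax errors
# Ripper recorded, since Ripper.tokenize itself does not expose them.
def tokenize_with_errors(src, filename = "-", lineno = 1)
  lexer = Ripper::Lexer.new(src, filename, lineno)
  tokens = lexer.tokenize
  # Read the errors from the same lexer instance that produced the tokens.
  [tokens, lexer.errors]
end

tokens, errors = tokenize_with_errors("def req(true) end")
p tokens
puts "#{errors.size} error(s) recorded" unless errors.empty?
```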

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774

Updated by Eregon (Benoit Daloze) over 3 years ago

jeremyevans0 (Jeremy Evans) wrote in #note-1:

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774

I agree it would be nice.

Do you think the same would be possible for Ripper.sexp/sexp_raw?
Currently it just returns nil on any error, which is unhelpful if one wants to know why lexing/parsing failed:

> Ripper.sexp('def n')
=> nil

Updated by jeremyevans0 (Jeremy Evans) over 3 years ago

Eregon (Benoit Daloze) wrote in #note-2:

jeremyevans0 (Jeremy Evans) wrote in #note-1:

Maybe we could support keyword arguments in Ripper.lex and Ripper.tokenize to raise SyntaxError for errors? Here's a pull request for that approach: https://github.com/ruby/ruby/pull/3774

I agree it would be nice.

Do you think the same would be possible for Ripper.sexp/sexp_raw?

Yes, the same is possible with Ripper.sexp/sexp_raw. I've updated the pull request to handle those as well.
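
A sketch of what this enables for Ripper.sexp, assuming Ruby 3.0 or later and the keyword name `raise_errors` from the commit message of changeset cd0877a9 in this thread:

```ruby
require "ripper"

src = "def n" # missing `end`

# Default behaviour: nil, with no hint about what went wrong.
p Ripper.sexp(src)

# With raise_errors: true, the parser's message is surfaced instead.
begin
  Ripper.sexp(src, raise_errors: true)
rescue SyntaxError => e
  puts "SyntaxError: #{e.message}"
end
```

The exact wording of the SyntaxError message comes from the parser and may vary between Ruby versions.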

Actions #4

Updated by jeremyevans (Jeremy Evans) over 3 years ago

  • Status changed from Open to Closed

Applied in changeset git|cd0877a93e91fecb3066984b3fa2a762e6977caf.


Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}

Implements [Feature #17276]
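
The same keyword applies to Ripper.tokenize and Ripper.lex. A minimal sketch, assuming Ruby 3.0 or later; `strict_tokenize` is a hypothetical wrapper, not part of Ripper:

```ruby
require "ripper"

# Hypothetical helper: strict tokenization that reports the problem
# instead of silently returning a partial token list.
def strict_tokenize(src)
  Ripper.tokenize(src, raise_errors: true)
rescue SyntaxError => e
  warn "cannot tokenize: #{e.message}"
  nil
end

p strict_tokenize("def ok; end")        # => ["def", " ", "ok", ";", " ", "end"]
p strict_tokenize("def req(true) end")  # => nil (after a warning on stderr)
```

Returning nil keeps the caller's code simple; a tool that needs the message could instead let the SyntaxError propagate.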

Updated by no6v (Nobuhiro IMAI) over 3 years ago

Support raise_errors keyword for Ripper.{lex,tokenize,sexp,sexp_raw}

Implements [Feature #17276]

Thanks for your clarification and implementation.
(it seems that those two lines are the same :)
https://github.com/ruby/ruby/blob/cd0877a93e91fecb3066984b3fa2a762e6977caf/test/ripper/test_lexer.rb#L150-L151

Ripper::Lexer#{lex,tokenize} can apparently be called more than once; each subsequent call returns tokens for the rest of the code.

$ cat src.rb
def req(true) end
def opt(true=0) end
def rest(*true) end
def keyrest(**true) end
def block(&true) end
->true{}
->true=0{}
->*true{}
->**true{}
->&true{}
$ cat l.rb
require "ripper"
lexer = Ripper::Lexer.new(ARGF.read)
until (tokens = lexer.tokenize).empty?
  p tokens
end
$ ruby l.rb src.rb
["def", " ", "req", "(", "true", ")"]
[" ", "end", "\n", "def", " ", "opt", "(", "true", "=", "0", ")"]
[" ", "end", "\n", "def", " ", "rest", "(", "*", "true", ")"]
[" ", "end", "\n", "def", " ", "keyrest", "(", "**", "true", ")"]
[" ", "end", "\n", "def", " ", "block", "(", "&", "true", ")"]
[" ", "end", "\n", "->", "true", "{"]
["}", "\n", "->", "true", "=", "0", "{"]
["}", "\n", "->", "*", "true", "{"]
["}", "\n", "->", "**", "true", "{"]
["}", "\n", "->", "&", "true", "{"]
["}", "\n"]

Ripper::Lexer#lex behaves the same way. Concatenating those tokens gives exactly what I wanted.
I would prefer Ripper.{lex,tokenize} returning fully parsed tokens.
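
The loop in l.rb can be wrapped into a helper that concatenates the chunks, producing the combined token list described above. `tokenize_all` is a hypothetical name; the sketch relies only on calling Ripper::Lexer#tokenize repeatedly on the same instance, as l.rb does. On Ruby versions where a single call already returns every token, the first call should return the full list and the next an empty one, so the loop still ends.

```ruby
require "ripper"

# Hypothetical helper mirroring l.rb: keep calling Ripper::Lexer#tokenize
# on the same instance and concatenate the chunks until a call returns
# no tokens.
def tokenize_all(src)
  lexer = Ripper::Lexer.new(src)
  tokens = []
  until (chunk = lexer.tokenize).empty?
    tokens.concat(chunk)
  end
  tokens
end

src = "def req(true) end\n->true{}\n"
p tokenize_all(src)
```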

Updated by no6v (Nobuhiro IMAI) over 3 years ago

I would prefer Ripper.{lex,tokenize} returning fully parsed tokens.

pull request: https://github.com/ruby/ruby/pull/3791
