Feature #19013


Error Tolerant Parser

Added by yui-knk (Kaneko Yuichiro) 3 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:109977]

Description

Background

Language Server Protocol (LSP) implementations sometimes need to parse incomplete Ruby scripts. For example, users may want to run completion in the middle of a statement, as below:

class A
  def m
    a = 10
    if # here users want to run completion
  end
end

In such cases, an LSP implementation wants to get a partial AST instead of a syntax error.

Proposal

At the moment, I propose three types of tolerance:

1. Complement "end" when the lexer reaches end-of-input but there are not enough "end"s

In the case below, one "end" is missing, so the lexer generates one "end" before it generates end-of-input.

describe "1" do
  describe "2" do
    describe "3" do
      it "here" do
    end
  end
end
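A rough, source-level approximation of this first tolerance can be sketched with the standard Ripper lexer. This is only an illustration under stated assumptions, not the lexer change in the branch: the helpers missing_do_ends and complement_ends are hypothetical, and only "do" openers are counted, so class, def, if and other constructs that also need "end" are ignored.

```ruby
require "ripper"

# Hypothetical helper (assumption: only "do ... end" blocks matter).
# Counts "do" keywords against "end" keywords reported by Ripper.lex.
def missing_do_ends(src)
  kws = Ripper.lex(src).filter_map { |_pos, event, tok, _state| tok if event == :on_kw }
  [kws.count("do") - kws.count("end"), 0].max
end

# Hypothetical helper: append the missing "end"s just before end-of-input,
# roughly where the proposed lexer change would emit them as tokens.
def complement_ends(src)
  src + ("end\n" * missing_do_ends(src))
end

src = <<~RUBY
  describe "1" do
    describe "2" do
      describe "3" do
        it "here" do
      end
    end
  end
RUBY

fixed = complement_ends(src)

p missing_do_ends(src)     # number of "end"s that were complemented
p Ripper.sexp(src).nil?    # the original script does not parse
p Ripper.sexp(fixed).nil?  # the complemented script parses
```

Working on the token stream rather than on raw text avoids counting "do" or "end" inside strings and comments, which is one reason the proposal puts this logic in the lexer.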

2. Treat "end" as a keyword, not an identifier, based on indentation

In the case below, the normal parser recognizes the "end" on line 4 as a "local variable or method", because it directly follows the trailing dot of "foo.".
This not only causes a syntax error but also causes the bar method definition to be interpreted as Z::Foo#bar.
Another approach is to suppress the !IS_lex_state(EXPR_DOT) check for "end".

module Z
  class Foo
    foo.
  end

  def bar
  end
end
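The misclassification described above can be observed with Ripper.lex, which reports the "end" on line 4 as an identifier instead of a keyword. The "alone on its line" check below is a hypothetical, much simpler stand-in for the indent-aware rule proposed here, not the heuristic used in the branch.

```ruby
require "ripper"

src = <<~RUBY
  module Z
    class Foo
      foo.
    end

    def bar
    end
  end
RUBY

# Ripper.lex yields [[lineno, column], event, token, state] for each token.
tokens = Ripper.lex(src)

# Every "end" that the lexer did NOT classify as a keyword (:on_kw).
misread_ends = tokens.select { |_pos, event, tok, _state| tok == "end" && event != :on_kw }

# Hypothetical heuristic (an assumption for illustration): an "end" that is
# the only thing on its line is probably the keyword closing a block.
lines = src.lines
suspects = misread_ends.select do |(lineno, _col), _event, _tok, _state|
  lines[lineno - 1].strip == "end"
end

suspects.each do |(lineno, col), event, _tok, _state|
  puts "line #{lineno}, column #{col}: lexed as #{event}, probably the keyword end"
end
```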

3. Change the locations of the error token

Currently, the error token is placed in top_stmts and stmts, as in top_stmts: error top_stmt and stmts: error stmt.
However, these placements are too strict to catch syntax errors, so I want to move the error token to stmt: error and expr_value: error.

Interface

  • Add an error_tolerant option to RubyVM::AbstractSyntaxTree.parse
  • Add an --error-tolerant-parser option to the ruby command for debugging
    • This option is valid only when --dump=yydebug, --dump=parsetree, or --dump=parsetree_with_comment is passed
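A minimal sketch of the first interface from Ruby code, assuming CRuby 3.2 or later (the release in which the error_tolerant: keyword of RubyVM::AbstractSyntaxTree.parse shipped). It reuses the incomplete script from the Background section; the exact shape of the partial tree is deliberately not inspected, since it may differ between Ruby versions.

```ruby
# Assumes CRuby >= 3.2 (RubyVM::AbstractSyntaxTree.parse with error_tolerant:).
src = <<~RUBY
  class A
    def m
      a = 10
      if # here users want to run completion
    end
  end
RUBY

# Without the option, the broken script raises SyntaxError and no AST is returned.
strict_error =
  begin
    RubyVM::AbstractSyntaxTree.parse(src)
    nil
  rescue SyntaxError => e
    e
  end

# With error_tolerant: true, a (partial) Node is returned instead of raising.
root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)

puts strict_error.class # prints SyntaxError
puts root.type          # prints SCOPE, the root node type of every parsed script
```

An LSP server could then walk root.children to locate the node around the cursor, which is exactly the information a plain SyntaxError does not provide.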

Compatibility

Changing the location of the error token can lead to incompatibility. I observed at least two test cases in ruby/ruby that are broken by this change.
I think both of them depend on how Ripper behaves after it detects a syntax error.

All other changes are in the lexer, not the parser, and they are controlled by the error_tolerant option. Therefore, no behavior change is expected for the Ruby parser and Ripper.

Implementation

https://github.com/yui-knk/ruby/tree/error_recovery_indent_aware

Updated by duerst (Martin Dürst) 3 months ago

The topic of parsing incomplete syntax also came up in Kevin Newton's talk (see https://rubykaigi.org/2022/presentations/kddnewton.html) at RubyKaigi 2022. In the talk, he said he was working on a new parser. Maybe these efforts could be combined?

Updated by matz (Yukihiro Matsumoto) 3 months ago

Kevin's work has broader goals, e.g. being faster and consuming less memory, and it should be free from yacc/bison limitations.
I consider this work an experiment to explore error tolerance.

Matz.


Updated by yui-knk (Kaneko Yuichiro) about 2 months ago

  • Status changed from Open to Closed

Applied in changeset git|fbbdbdd8911ffb24d98bb71c7c33d24609ce7dfe.


Add error_tolerant option to RubyVM::AST

If this option is enabled, SyntaxError is not raised and Node is
returned even if passed script is broken.

[Feature #19013]
