Feature #20257

Updated by yui-knk (Kaneko Yuichiro) 3 months ago

# Abstract 

 Rearchitect Ripper so that it supports full semantic analysis and becomes easier to maintain. 
 This rearchitecture is achieved by extending the Lrama parser generator. 

 # Background and problem 

 Ripper is used for parsing Ruby code; for example, irb and rdoc use Ripper to parse source code. 
 Ripper and the Ruby parser share the parsing algorithm, but Ripper's internal logic differs from the parser's. 
 The differences cause three problems: 

 1. Ripper cannot perform some semantic analysis. https://bugs.ruby-lang.org/issues/10436 is an example of this limitation: `m(&nil) {}` raises a syntax error but `Ripper.sexp("m(&nil) {}")` doesn't. 
 2. Ripper cannot recognize regexp named captures. https://bugs.ruby-lang.org/issues/18988 is an example. 
 3. The differences make parse.y complex. For example, the implementation of `new_array_pattern` is completely different between [parser](https://github.com/ruby/ruby/blob/1949a04f81311660e2d0ec002c48115c63742d0b/parse.y#L14960) and [ripper](https://github.com/ruby/ruby/blob/1949a04f81311660e2d0ec002c48115c63742d0b/parse.y#L1707). 
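
The first problem can be observed directly. The sketch below compares the compiler's verdict with Ripper's for the snippet from Bug 10436. `compiles?` is a hypothetical helper introduced here for illustration, `RubyVM::InstructionSequence` is specific to CRuby, and what Ripper reports depends on the Ruby version in use, so no Ripper output is claimed.

```ruby
require "ripper"

# Hypothetical helper (not part of Ripper): true when CRuby's compiler
# accepts the source, false when it raises SyntaxError.
def compiles?(src)
  RubyVM::InstructionSequence.compile(src)
  true
rescue SyntaxError
  false
end

src = "m(&nil) {}"

# The compiler rejects a call that has both a block argument and a block.
parser_accepts = compiles?(src)

# Ripper.sexp returns an Array on success and nil when it detects an error.
ripper_accepts = !Ripper.sexp(src).nil?

puts "compiler accepts: #{parser_accepts}"
puts "Ripper accepts:   #{ripper_accepts}"
```

If the two lines disagree, the Ruby in use still shows the inconsistency described in item 1.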

 These problems come from the fact that the parser and Ripper use the semantic value stack differently. 
 The parser stores Nodes on the stack in many rules, but Ripper stores the Ruby Objects returned by callback methods. 
 Therefore Ripper cannot perform semantic analysis that requires Nodes (#1). 
 Because the values on the stack differ, functions with the same name must be implemented differently for each (#3). 
 This leads to behavioral differences like #2, because the parser and Ripper have different `match_op` functions. 

 # Proposal 

 Introduce a new semantic value stack for Ripper so that Ripper can manage Nodes and Ruby Objects separately. 
 Lrama will provide some callback entry points and a new special variable for actions. 

 Lrama will support the following callback directives. The specified function is called when the corresponding event happens: 

 * %after-shift function_name 
 * %before-reduce function_name 
 * %after-reduce function_name 
 * %after-shift-error-token function_name 
 * %after-pop-stack function_name 

 Lrama also provides the `$:n` variable to access the index of each grammar symbol. The variable is translated to a negative index counted from the top of the stack. 
 For example: 

 ```c 
 primary: k_if expr_value then compstmt if_tail k_end 
           { 
           /*% ripper: if!($:2, $:4, $:5) %*/ 
           /* $:2 = -5, $:4 = -3, $:5 = -2. */ 
           } 
 ``` 
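
The numbers in the comment above are consistent with the formula `n - rhs_len - 1`, where `rhs_len` is the number of symbols on the right-hand side. The following minimal sketch checks that arithmetic; `stack_offset` is a hypothetical helper, and the formula is inferred from the example rather than taken from Lrama's source.

```ruby
# Hypothetical helper, not part of Lrama: offset of the n-th RHS symbol
# (1-based) from the top of the stack, for a rule with rhs_len symbols.
def stack_offset(n, rhs_len)
  raise ArgumentError, "n must be in 1..#{rhs_len}" unless (1..rhs_len).cover?(n)
  n - rhs_len - 1
end

rhs = %w[k_if expr_value then compstmt if_tail k_end]

[2, 4, 5].each do |n|
  puts "$:#{n} (#{rhs[n - 1]}) = #{stack_offset(n, rhs.size)}"
end

# A Ruby array indexed from the end behaves the same way: with the whole
# RHS on the stack, the offset selects the matching symbol.
stack = ["(earlier entries)"] + rhs
puts stack[stack_offset(2, rhs.size)]
```

Because the offsets are relative to the top of the stack, the same action code works no matter how many entries lie below the rule's symbols.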

 With these features, we can implement a separate stack for Ruby Objects. 

 # Implementation note 

 ## New fields of struct parser_params 

 * `VALUE s_value`: Holds the Ruby Object returned by a Ripper callback method call. 
 * `VALUE s_lvalue`: Holds the Ruby Object corresponding to the LHS of the rule. 
 * `VALUE s_value_stack`: Stack for Ruby Objects. It is actually a Ruby array. 

 These fields are added only for Ripper. 

 ## The role of callback functions 

 * %after-shift: Push `s_value` to `s_value_stack`. 
 * %before-reduce: Assign the first Ruby Object of the RHS to `s_lvalue` (similar to `$$ = $1`). 
 * %after-reduce: Pop `s_value_stack` `rhs.len` times then push `s_lvalue` to `s_value_stack`. 
 * %after-shift-error-token: Push `nil` to `s_value_stack`. This `nil` stands for `error` token. 
 * %after-pop-stack: Pop `s_value_stack` `len` times. This is needed to emulate panic mode. 
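
The interplay of these callbacks can be illustrated with a toy model. The class below is a hypothetical Ruby simulation, not Ripper's C implementation: the field names follow the list above, while `ValueStackModel`, its method names, and the choice of `nil` as the default for an empty RHS are assumptions made for this sketch.

```ruby
# Toy model of the separate Ruby Object stack (hypothetical, for illustration).
class ValueStackModel
  attr_accessor :s_value, :s_lvalue
  attr_reader :s_value_stack

  def initialize
    @s_value = nil
    @s_lvalue = nil
    @s_value_stack = []
  end

  # %after-shift: push the object produced for the shifted token.
  def after_shift
    @s_value_stack.push(@s_value)
  end

  # %before-reduce: default the LHS object to the first RHS object,
  # like `$$ = $1`. Using nil for an empty RHS is an assumption.
  def before_reduce(len)
    @s_lvalue = len.zero? ? nil : @s_value_stack[-len]
  end

  # %after-reduce: replace the RHS objects with the LHS object.
  def after_reduce(len)
    @s_value_stack.pop(len)
    @s_value_stack.push(@s_lvalue)
  end

  # %after-shift-error-token: nil stands for the `error` token.
  def after_shift_error_token
    @s_value_stack.push(nil)
  end

  # %after-pop-stack: discard objects while recovering in panic mode.
  def after_pop_stack(len)
    @s_value_stack.pop(len)
  end
end

# Shift three tokens, then reduce them with a rule of length 3.
model = ValueStackModel.new
%w[1 + 2].each do |tok|
  model.s_value = tok # stands in for the result of a scanner callback
  model.after_shift
end

model.before_reduce(3) # s_lvalue now defaults to "1"
# The action reads RHS objects by negative index, as `$:n` would,
# and overrides the default LHS object.
stack = model.s_value_stack
model.s_lvalue = [:binary, stack[-3], stack[-2], stack[-1]]
model.after_reduce(3)

p model.s_value_stack # prints [[:binary, "1", "+", "2"]]
```

Keeping this stack in step with the parser's own stack on every shift, reduce, and pop is what allows the main semantic value stack to hold Nodes for both the parser and Ripper.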

 # Achievement 

 These bugs are fixed. 

 * [Bug 10436](https://bugs.ruby-lang.org/issues/10436) "ruby -c and ripper inconsistency: m(&nil) {}" 
 * [Bug 18988](https://bugs.ruby-lang.org/issues/18988) "Ripper cannot parse some code that has regexp named capture" 
 * [Bug 20055](https://bugs.ruby-lang.org/issues/20055) "Ripper seems to skip some checks like `void value expression` and `duplicated variable name`" 

 This means that [all open Ripper bug tickets](https://bugs.ruby-lang.org/projects/ruby-master/issues?utf8=%E2%9C%93&set_filter=1&sort=id%3Adesc&f%5B%5D=subject&op%5Bsubject%5D=%7E&v%5Bsubject%5D%5B%5D=ripper&f%5B%5D=status_id&op%5Bstatus_id%5D=o&f%5B%5D=tracker_id&op%5Btracker_id%5D=%3D&v%5Btracker_id%5D%5B%5D=1&f%5B%5D=&c%5B%5D=tracker&c%5B%5D=status&c%5B%5D=subject&c%5B%5D=assigned_to&c%5B%5D=updated_on&group_by=) that are related to the current architecture will be closed. 

 # Links 

 * Lrama Implementation: https://github.com/ruby/lrama/pull/367 
 * Ruby Implementation: https://github.com/ruby/ruby/pull/9923 TBD 
