Bug #14126: Recent parse.y (Ripper) changes - lexing, tokenizing - Ruby - Ruby Issue Tracking System

Bug #14126

closed

Recent parse.y (Ripper) changes - lexing, tokenizing

Added by MSP-Greg (Greg L) over 7 years ago. Updated over 7 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

ruby 2.5.0dev (2017-11-22 trunk 60878) [x64-mingw32]

Backport:

2.3: UNKNOWN, 2.4: UNKNOWN

[ruby-core:83864]

Description

First of all, I'd like to thank @yui-knk (Kaneko Yuichiro) for all the work on parse.y. I assume some of it is due to the movement of RDoc from 'seattlerb' to 'ruby', along with RDoc now using Ripper instead of its own parser.

I'm a YARD user. Recent commits have broken some of YARD's parsing code, although many of the commits actually fixed odd behavior in Ripper. I did find one thing that seems odd.

It centers on whether Ripper.tokenize(src).join('') == src or Ripper.tokenize(src).join('').length == src.length should be true. I believe the actual issue for YARD is the following constraint:

src == Ripper.lex(src).map { |t| t[2] }.join

Using the listed code, svn 60863 shows true for every source string, but 60878 shows false. The extra white-space content that appears in the :on_tstring_content members with 60863 has been (understandably) removed in 60878, but it has not been accounted for in the :on_words_sep (or :on_qwords_beg) members.

# frozen_string_literal: true

require 'ripper'
require 'pp'

module RipperPercent

  def self.run
    output "%w(\n  AA\n  BB\n  CC\n  DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n  AA  BB  CC  DD\n)"
  end

  def self.output(s)
    combined = ''.dup
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src    #{s.gsub("\n", "\\n")}"
    puts "lexed  #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"

    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end
RipperPercent.run
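
For reference, the same round-trip property can be checked more compactly. This is only a sketch: the helper name lossless_lex? is hypothetical (it is not part of Ripper), and the result for each string depends on the Ruby revision being run.

```ruby
require 'ripper'

# Hypothetical helper, not part of Ripper: true when the token strings
# returned by Ripper.lex concatenate back to the original source.
def lossless_lex?(src)
  Ripper.lex(src).map { |t| t[2] }.join == src
end

# The three word-list literals from the report above.
SAMPLES = [
  "%w(\n  AA\n  BB\n  CC\n  DD\n)",
  "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)",
  "%w(\n  AA  BB  CC  DD\n)"
].freeze

SAMPLES.each do |src|
  # Ripper.tokenize returns only the token strings, so the same check
  # reduces to a plain join.
  tokenize_ok = Ripper.tokenize(src).join == src
  puts "#{src.inspect}: lex #{lossless_lex?(src)}, tokenize #{tokenize_ok}"
end
```

On a revision where the lexer accounts for every character, both columns should print true for all three strings.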

As mentioned previously, I'm not much of a C programmer, and much of Ripper is not well documented. Hence, I don't think I can fix this, if indeed it's an issue. I'm also aware of the complication that "\n" is sometimes equivalent to a space and other times equivalent to ';'.
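
To illustrate the newline point, here is a small sketch showing how Ripper.lex labels a newline differently depending on context. The helper name newline_events is hypothetical, and the event names are those I would expect from a released Ruby; a trunk build mid-change may differ.

```ruby
require 'ripper'

# Hypothetical helper, not part of Ripper: returns the event types of
# every lexed token whose text contains a newline.
def newline_events(src)
  Ripper.lex(src).select { |t| t[2].include?("\n") }.map { |t| t[1] }
end

# Newline ends the statement, much like ';'.
p newline_events("a\nb")

# Newline after a binary operator is insignificant, much like a space.
p newline_events("a = 1 +\n2")

# Newline inside a %w literal separates words.
p newline_events("%w(a\nb)")
```

The first two cases should report :on_nl and :on_ignored_nl respectively, which is the same distinction as ';' versus a space.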

Finally, given all the changes that have occurred, when they seem stable/complete, might the version of Ripper be incremented?

Thanks, Greg

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

Status changed from Open to Closed

Applied in changeset trunk|r60883.

ripper.y: fix word list events

parse.y (parser_skip_words_sep): QWORDS_BEG should not include
the first separators in ripper.
parse.y (parser_parse_string): WORDS_SEP should not include
the closing parentheses of a word list in ripper, should include
spaces at beginning of lines. [ruby-core:83864] [Bug #14126]

#2 [ruby-core:83871]

Updated by MSP-Greg (Greg L) over 7 years ago

@nobu (Nobuyoshi Nakada)

Thank you for the patch. The lex array now looks the way I would expect (I'm not that familiar with parsers).

Using 60884, Ripper.sexp_raw and Ripper.sexp now return nil for all three strings in the above code. They both 'worked' using 60863 and 60875.
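
A minimal way to reproduce the check, relying on the fact that Ripper.sexp returns nil when the parser reports an error. The helper name parses? is hypothetical, and what it prints for the three strings depends on the revision in use.

```ruby
require 'ripper'

# Hypothetical helper, not part of Ripper: Ripper.sexp returns nil when
# the parser reports an error, so a non-nil result means the source parsed.
def parses?(src)
  !Ripper.sexp(src).nil?
end

# The three word-list literals from the original report.
[
  "%w(\n  AA\n  BB\n  CC\n  DD\n)",
  "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)",
  "%w(\n  AA  BB  CC  DD\n)"
].each do |src|
  puts "#{src.inspect}: sexp #{parses?(src) ? 'ok' : 'nil'}, " \
       "sexp_raw #{Ripper.sexp_raw(src).nil? ? 'nil' : 'ok'}"
end
```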

YARD's parsing also now fails with the error 'syntax error, unexpected tSTRING_CONTENT, expecting tSTRING_END' using the following input:

YARD's parser mostly hooks into Ripper's events; I think the error is actually raised by Ripper. I'm not sure, as I've spent more time with YARD's C parser than its Ruby parser...

Thanks, Greg
