Bug #14126 (closed)
Recent parse.y (Ripper) changes - lexing, tokenizing
Description
First of all, I'd like to thank @yui-knk (Kaneko Yuichiro) for all the work on parse.y. I assume some of it is due to the movement of RDoc from 'seattlerb' to 'ruby', along with RDoc now using Ripper instead of its own parser.

I'm a YARD user. Recent commits have broken some of YARD's parsing code, although many of the commits actually fixed odd behavior in Ripper. I did find one thing that seems odd.
It centers on whether Ripper.tokenize(src).join('') == src or merely Ripper.tokenize(src).join('').length == src.length should be true. I believe the actual issue for YARD is the following constraint: the text of every lexed token, concatenated in order, should reproduce the source:

src == Ripper.lex(src).map { |t| t[2] }.join
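A minimal, self-contained sketch of both round-trip checks (the input string here is just an illustrative example):

require 'ripper'

src = "%w(\n AA\n BB\n)"
tokens = Ripper.tokenize(src)
puts tokens.join == src                 # strict constraint: exact round trip
puts tokens.join.length == src.length   # weaker constraint: lengths only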
Using the script below, svn 60863 shows true for every source string, but 60878 shows false. The extra whitespace content that appeared in the :on_tstring_content members with 60863 has (understandably) been removed in 60878, but it has not been accounted for in the :on_words_sep (or :on_qwords_beg) members.
# frozen_string_literal: true

require 'ripper'
require 'pp'

module RipperPercent
  def self.run
    output "%w(\n AA\n BB\n CC\n DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n AA BB CC DD\n)"
  end

  def self.output(s)
    # Concatenate the text member (t[2]) of every lexed token and
    # compare the result with the original source.
    combined = ''.dup
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src   #{s.gsub("\n", "\\n")}"
    puts "lexed #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"
    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end

RipperPercent.run
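For anyone digging into this, a minimal sketch of inspecting which lexer event owns each piece of text in a %w literal (the assignment of text to events is exactly what differs between r60863 and r60878):

require 'ripper'

Ripper.lex("%w(\n AA\n BB\n)").each do |pos, event, text|
  printf "%-22s %p\n", event, text
end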
As mentioned previously, I'm not much of a C programmer, and much of Ripper is not documented very well. Hence, I don't think I can fix this myself, if indeed it is an issue. I'm also aware of the complication that sometimes "\n" is equivalent to a space, and other times it's equivalent to ';'.
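That complication is visible directly in the lexer output; a small sketch (event names per Ripper's scanner events):

require 'ripper'
require 'pp'

# Between two statements, "\n" terminates the expression like ';'
# and is lexed as :on_nl.
pp Ripper.lex("a = 1\nb = 2")

# Inside an open parenthesis, "\n" acts as mere whitespace and is
# lexed as :on_ignored_nl.
pp Ripper.lex("a = (1 +\n2)")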
Finally, given all the changes that have occurred, might the version of Ripper be incremented once they seem stable and complete?
Thanks, Greg