Bug #14126 (closed)
Recent parse.y (Ripper) changes - lexing, tokenizing
Description
First of all, I'd like to thank @yui-knk (Kaneko Yuichiro) for all the work on parse.y. I assume some of it is due to the movement of RDoc from 'seattlerb' to 'ruby', along with RDoc now using Ripper instead of its own parser.

I'm a YARD user. Recent commits have broken some of YARD's parsing code, although many of the commits actually fixed odd behavior in Ripper. I did find one thing that seems odd.
It centers on whether Ripper.tokenize(src).join('') == src or merely Ripper.tokenize(src).join('').length == src.length should be true. I believe the actual issue for YARD is the following constraint: the text of every lexed token, concatenated in order, should reproduce the source:

src == Ripper.lex(src).map { |t| t[2] }.join
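A minimal, self-contained sketch of both round-trip checks (the input string here is just an illustrative example):

require 'ripper'

src = "%w(\n AA\n BB\n)"
tokens = Ripper.tokenize(src)
puts tokens.join == src                 # strict constraint: exact round trip
puts tokens.join.length == src.length   # weaker constraint: lengths only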
Using the script below, svn 60863 shows true for every source string, but 60878 shows false. The extra whitespace content that appeared in the :on_tstring_content members with 60863 has (understandably) been removed in 60878, but it has not been accounted for in the :on_words_sep (or :on_qwords_beg) members.
# frozen_string_literal: true

require 'ripper'
require 'pp'

module RipperPercent
  def self.run
    output "%w(\n AA\n BB\n CC\n DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n AA BB CC DD\n)"
  end

  def self.output(s)
    # Concatenate the text member (t[2]) of every lexed token and
    # compare the result with the original source.
    combined = ''.dup
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src   #{s.gsub("\n", "\\n")}"
    puts "lexed #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"
    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end

RipperPercent.run
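For anyone digging into this, a minimal sketch of inspecting which lexer event owns each piece of text in a %w literal (the assignment of text to events is exactly what differs between r60863 and r60878):

require 'ripper'

Ripper.lex("%w(\n AA\n BB\n)").each do |pos, event, text|
  printf "%-22s %p\n", event, text
end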
As mentioned previously, I'm not much of a C programmer, and much of Ripper is not documented very well. Hence, I don't think I can fix this myself, if indeed it is an issue. I'm also aware of the complication that sometimes "\n" is equivalent to a space, and other times it's equivalent to ';'.
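That complication is visible directly in the lexer output; a small sketch (event names per Ripper's scanner events):

require 'ripper'
require 'pp'

# Between two statements, "\n" terminates the expression like ';'
# and is lexed as :on_nl.
pp Ripper.lex("a = 1\nb = 2")

# Inside an open parenthesis, "\n" acts as mere whitespace and is
# lexed as :on_ignored_nl.
pp Ripper.lex("a = (1 +\n2)")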
Finally, given all the changes that have occurred, might the version of Ripper be incremented once they seem stable and complete?
Thanks, Greg