Bug #14126
Status: Closed
Recent parse.y (Ripper) changes - lexing, tokenizing
Description
First of all, I'd like to thank @yui-knk (Kaneko Yuichiro) for all the work on parse.y. I assume some of it is due to the move of RDoc from 'seattlerb' to 'ruby', along with RDoc now using Ripper instead of its own parser.
I'm a YARD user. Recent commits have broken some of YARD's parsing code, although many of the commits actually fixed odd behavior in Ripper. I did find one thing that seems odd.
It centers on whether Ripper.tokenize(src).join('') == src or Ripper.tokenize(src).join('').length == src.length should be true. I believe the actual issue for YARD is the following constraint, where the third element of each Ripper.lex entry is the token text:
src == Ripper.lex(src).map { |t| t[2] }.join
Using the code below, svn r60863 shows true for every source string, but r60878 shows false. The extra whitespace that appears in the :on_tstring_content tokens with r60863 has been (understandably) removed in r60878, but it has not been accounted for in the :on_words_sep (or :on_qwords_beg) tokens.
```ruby
# frozen_string_literal: true
require 'ripper'
require 'pp'

module RipperPercent
  def self.run
    output "%w(\n AA\n BB\n CC\n DD\n)"
    output "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)"
    output "%w(\n AA BB CC DD\n)"
  end

  # Re-join every token text from Ripper.lex and compare it with the source.
  def self.output(s)
    combined = ''.dup # mutable buffer (string literals are frozen above)
    Ripper.lex(s).each { |t| combined << t[2] }
    puts
    puts "src   #{s.gsub("\n", "\\n")}"
    puts "lexed #{combined.gsub("\n", "\\n")}"
    puts "src == lexed is #{s == combined}"
    # puts ; pp Ripper.lex(s)
    # puts Ripper.tokenize(s).inspect
    # pp Ripper.sexp_raw(s)
  end
end

RipperPercent.run
```
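A complementary sketch checks both invariants mentioned above (the Ripper.tokenize join and the Ripper.lex join) and lists only the word-list events, so you can see which token ends up carrying the whitespace. The helper names round_trip_report and word_list_events are hypothetical, not part of Ripper or YARD, and the exact event breakdown depends on the Ruby revision you run it on.

```ruby
# frozen_string_literal: true
require 'ripper'

# Hypothetical helper: do the tokenize join and the lex join both reproduce src?
def round_trip_report(src)
  tokenized = Ripper.tokenize(src).join
  # Each Ripper.lex entry is [[line, column], event, token, ...]; keep the token.
  lexed = Ripper.lex(src).map { |_pos, _event, tok| tok }.join
  { tokenize: tokenized == src, lex: lexed == src }
end

# Events involved in a %w(...) word list.
WORD_LIST_EVENTS = %i[on_qwords_beg on_words_sep on_tstring_content on_tstring_end].freeze

# Hypothetical helper: [event, token] pairs for the word-list events only.
def word_list_events(src)
  Ripper.lex(src)
        .map { |_pos, event, tok| [event, tok] }
        .select { |event, _tok| WORD_LIST_EVENTS.include?(event) }
end

["%w(\n AA\n BB\n CC\n DD\n)", "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)", "%w(\n AA BB CC DD\n)"].each do |src|
  puts "#{src.inspect} #{round_trip_report(src)}"
  word_list_events(src).each { |event, tok| puts "  #{event} #{tok.inspect}" }
end
```

Printing the events next to the round-trip result makes it easy to spot a revision where whitespace is dropped from :on_tstring_content without reappearing in :on_words_sep.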
As mentioned previously, I'm not much of a C type, and much of Ripper is not documented very well. Hence, I don't think I can fix this, if it is indeed an issue. I'm also aware of the complication that sometimes "\n" is equivalent to a space, and other times it's equivalent to ';'.
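That newline ambiguity is visible directly in the lexer's event names. A minimal sketch, assuming a Ruby that includes the word-list fix (the helper newline_events is hypothetical): the same "\n" lexes as :on_nl when it ends a statement (like ';'), as :on_ignored_nl when the expression continues onto the next line, and as part of :on_words_sep inside a %w list (like a space).

```ruby
require 'ripper'

# Hypothetical helper: the events of all tokens that contain a newline.
def newline_events(src)
  Ripper.lex(src)
        .select { |_pos, _event, tok| tok.include?("\n") }
        .map { |_pos, event, _tok| event }
end

p newline_events("a = 1\nb = 2\n") # newline terminates statements, like ';'
p newline_events("a = 1 +\n2\n")   # first newline is ignored, expression continues
p newline_events("%w(a\nb)")       # newline separates words, like a space
```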
Finally, given all the changes that have occurred, could the version of Ripper be incremented once they seem stable/complete?
Thanks, Greg
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r60883.
ripper.y: fix word list events
- parse.y (parser_skip_words_sep): QWORDS_BEG should not include the first separators in ripper.
- parse.y (parser_parse_string): WORDS_SEP should not include the closing parentheses of a word list in ripper, and should include spaces at beginning of lines. [ruby-core:83864] [Bug #14126]
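The two rules in the commit message can be checked from Ruby. A sketch, assuming a Ruby that includes r60883; the helper word_list_rules_hold? is hypothetical. It treats the first rule as "the :on_qwords_beg token contains no whitespace" and the second as "no :on_words_sep token contains the closing parenthesis".

```ruby
require 'ripper'

# Hypothetical helper: do the QWORDS_BEG and WORDS_SEP rules hold for src?
def word_list_rules_hold?(src)
  tokens = Ripper.lex(src).map { |_pos, event, tok| [event, tok] }
  beg = tokens.find { |event, _tok| event == :on_qwords_beg }
  seps = tokens.select { |event, _tok| event == :on_words_sep }.map(&:last)
  # QWORDS_BEG: just the opener such as "%w(", no leading separator whitespace.
  beg_ok = !beg.nil? && !beg[1].match?(/\s/)
  # WORDS_SEP: separator whitespace only, never the closing ")".
  sep_ok = seps.none? { |tok| tok.include?(")") }
  beg_ok && sep_ok
end

p word_list_rules_hold?("%w(\n AA\n BB\n CC\n DD\n)")
```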
Updated by MSP-Greg (Greg L) about 7 years ago
Thank you for the patch; the lex array now looks as I would expect it to (though I'm not that familiar with parsers).
Using r60884, Ripper.sexp_raw and Ripper.sexp now return nil for all three strings in the above code. Both 'worked' using r60863 and r60875.
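Ripper.sexp and Ripper.sexp_raw return nil when the parser reports a syntax error, so checking their results directly is a quick way to spot this kind of regression. A sketch with a hypothetical helper sexp_ok?; on a Ruby that has the later fixes, all three strings are expected to parse.

```ruby
require 'ripper'

# Hypothetical helper: true when both S-expression builders return a tree.
def sexp_ok?(src)
  !Ripper.sexp(src).nil? && !Ripper.sexp_raw(src).nil?
end

["%w(\n AA\n BB\n CC\n DD\n)", "%w(\n\nAA\n\nBB\n\nCC\n\nDD\n)", "%w(\n AA BB CC DD\n)"].each do |src|
  puts "#{src.inspect}: #{sexp_ok?(src) ? 'parsed' : 'nil'}"
end
```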
I'm also getting an error from YARD's parser, "syntax error, unexpected tSTRING_CONTENT, expecting tSTRING_END", with the following input:
YARD's parser mostly hooks into Ripper's events, so I think the error is actually raised by Ripper. I'm not sure, though, as I've spent more time with YARD's C parser than with its Ruby parser.
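One way to confirm whether Ripper itself reports the error is to subclass Ripper, record the parse_error event through an on_parse_error handler, and check Ripper#error? after parsing. A sketch under that assumption; the class name ErrorRecorder is hypothetical, and the incomplete expression below is only a placeholder for the input that triggers the YARD error.

```ruby
require 'ripper'

# Hypothetical subclass: records the parse-error messages Ripper dispatches.
class ErrorRecorder < Ripper
  attr_reader :messages

  def initialize(src, *args)
    super
    @messages = []
  end

  # Called by Ripper for each syntax error it encounters.
  def on_parse_error(msg)
    @messages << msg
  end
end

recorder = ErrorRecorder.new("1 +\n") # placeholder: substitute the failing input
recorder.parse
puts "error? #{recorder.error?}"
recorder.messages.each { |m| puts "ripper: #{m}" }
```

If error? is true and a message is recorded, the error comes from Ripper rather than from YARD's own handlers.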
Thanks, Greg