Bug #6306

Ripper lexes :on_kw when it should find :on_ident

Added by Steve Loveless over 3 years ago. Updated about 3 years ago.

[ruby-core:44396]
Status: Rejected
Priority: Normal
Assignee: -
ruby -v: ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
Backport:

Description

=begin
1.9.3-p125 :001 > require 'ripper'
=> true
1.9.3-p125 :002 > Ripper.lex %Q{:def}
=> [[[1, 0], :on_symbeg, ":"], [[1, 1], :on_kw, "def"]]

I'd expect (({[[1, 1], :on_kw, "def"]})) to be (({[[1, 1], :on_ident, "def"]})). Sure, "def" is a keyword, but since it's prefaced by a ':', it's not necessarily being used as a keyword in that context. The same behavior applies to all other keywords that are used as Symbols.
=end
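The report says every keyword used as a Symbol behaves like (({:def})). A quick way to check that claim is to lex ":<word>" for a few reserved words and look at the event of the second token. This is a minimal sketch assuming a Ruby whose Ripper behaves like the 1.9.3 session above; the word list is an arbitrary sample and `symbol_body_event` is a hypothetical helper name.

```ruby
require 'ripper'

# Hypothetical helper: lex ":<word>" and return the event Ripper assigns
# to the token that follows the leading ":" (the :on_symbeg token).
def symbol_body_event(word)
  Ripper.lex(":#{word}")[1][1]
end

# Arbitrary sample of reserved words, each used as a Symbol.
%w[def class self if nil end].each do |word|
  puts ":#{word} -> #{symbol_body_event(word).inspect}"
end
```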


Related issues

Duplicated by Ruby trunk - Bug #8383: Ripper.lex does not handle correctly symbols whose identifiers coincide with keyword names Rejected 05/09/2013

History

#1 Updated by Yusuke Endoh over 3 years ago

  • Status changed from Open to Rejected

Hello,

2012/4/17, turboladen (Steve Loveless) steve.loveless@gmail.com:

I'd expect (({[[1, 1], :on_kw, "def"]})) to be (({[[1, 1], :on_ident,
"def"]})). Sure, "def" is a keyword, but since it's prefaced by a ':', it's
not necessarily being used in that context. The same behavior pertains to
all other keywords that are used as Symbols.

Not a bug. That is exactly what happens inside the Ruby lexer.

I don't understand what you really want to do, but in any case you must handle the context manually, as on_kw is not the only exceptional case:

Ripper.lex(':@foo')[1][1] #=> :on_ivar
Ripper.lex(':$foo')[1][1] #=> :on_gvar
Ripper.lex(':@@foo')[1][1] #=> :on_cvar
Ripper.lex(':Foo')[1][1] #=> :on_const
Ripper.lex(':+')[1][1] #=> :on_op
Ripper.lex(':"foo"')[1][1] #=> :on_tstring_content
Ripper.lex(':"foo#{bar}baz"')

Yusuke Endoh mame@tsg.ne.jp
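Handling the context manually at the token level could look roughly like the sketch below. It assumes that a bare symbol is always an :on_symbeg token whose text is exactly ":" followed by a single token, and it reports that token's event unchanged, whatever it is (:on_kw, :on_ivar, :on_const, :on_op, ...). The helper name `bare_symbols` is hypothetical, and quoted forms such as (({:"foo#{bar}baz"})) are deliberately skipped because they span several tokens.

```ruby
require 'ripper'

# Hypothetical helper (not part of Ripper): pair each bare-symbol start,
# i.e. an :on_symbeg token whose text is exactly ":", with the token that
# follows it, and return [event, text] for that following token.
# Quoted symbols such as :"foo" start with a different :on_symbeg text
# (':"'), so they are not matched here.
def bare_symbols(src)
  Ripper.lex(src)
        .each_cons(2)
        .select { |first, _| first[1] == :on_symbeg && first[2] == ":" }
        .map { |_, second| [second[1], second[2]] }
end

p bare_symbols('[:def, :@foo, :Foo, :+]')
```

Indexing with `[1]` and `[2]` rather than destructuring keeps the helper working whether Ripper.lex returns three-element tokens (as in 1.9.3) or newer four-element tokens that also carry the lexer state.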

#2 Updated by Steve Loveless about 3 years ago

=begin
If this is how the lexer works, then I guess there's not much use in pleading my case, but FWIW, I'll explain more...

I'm the author of ((URL:http://rubygems.org/gems/tailor)) and I'm trying to parse code that looks, for example, like:

my_symbol = :def

I'm (conceptually) expecting Ripper to tell me that (({:def})) is a symbol, plain and simple, not that ':' is the beginning of a symbol and the text describing the symbol also happens to be a keyword. Used as above, (({:def})) doesn't behave as a keyword, so from where I sit, it's odd to be told that it is one. I'd expect behavior similar to lexing a string literal:

Ripper.lex("'def'") #=> [[[1, 0], :on_tstring_beg, "'"], [[1, 1], :on_tstring_content, "def"], [[1, 4], :on_tstring_end, "'"]]

The tailor gem uses Ripper to, among other things, enforce the style rule that you should (in most cases) indent the line after you start defining a method with 'def'. Since ':def' lexes as symbeg + kw but doesn't behave as a keyword, I have to add extra logic to figure out whether 'def' is being used as a symbol or as an actual keyword. That is exactly why I chose Ripper in the first place: it tells me the context in which text like "def", "class", and "self" is being used. While I understand the reasoning for saying it's not a bug, as a Ripper ((user)) I find this behavior inconsistent.
=end

#3 Updated by Shyouhei Urabe about 3 years ago

So you need the semantics of your input, not just a lexical analysis, right?
Then using Ripper.lex is the wrong approach. You should catch not only those lexer events but also the parser events.
You might find the Ripper.sexp() implementation handy: https://github.com/ruby/ruby/blob/trunk/ext/ripper/lib/ripper/sexp.rb
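For the indentation case described in #2, the parser-event route could be sketched as follows: walk the tree returned by Ripper.sexp and collect the line of every :def node. This assumes a :def node has the shape (({[:def, [:@ident, "name", [line, col]], params, body]})), with the name token at index 1, which is what Ripper.sexp produces for plain method definitions; a symbol like (({:def})) ends up inside a :symbol_literal node instead, so it never matches. The helper `method_def_lines` is hypothetical, and singleton definitions (`def self.foo`, a :defs node) are not handled.

```ruby
require 'ripper'

# Hypothetical helper: return the line numbers where a method definition
# starts, based on parser events (Ripper.sexp) rather than raw tokens.
def method_def_lines(src)
  sexp = Ripper.sexp(src) or raise ArgumentError, "could not parse source"
  lines = []
  walk = lambda do |node|
    return unless node.is_a?(Array)
    # Assumed shape: [:def, [:@ident, "name", [line, col]], params, body]
    lines << node[1][2][0] if node[0] == :def
    node.each { |child| walk.call(child) }
  end
  walk.call(sexp)
  lines
end

source = <<~RUBY
  my_symbol = :def
  def foo
    1
  end
RUBY

p method_def_lines(source)
```

The trade-off is that Ripper.sexp needs syntactically valid input, whereas Ripper.lex will tokenize fragments; a style checker may want to fall back to token-level heuristics when parsing fails.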

#4 Updated by Steve Loveless about 3 years ago

That is correct. I've used Ripper.sexp() a little, but I'll look into using it more. Thank you for the feedback.
