Bug #19848: Ripper BOM behavior

Added by kddnewton (Kevin Newton) about 2 years ago. Updated about 2 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:114495]

Description

When a file begins with a byte-order mark, the first token in the file usually begins at column -3. For example:

Ripper.lex("\xEF\xBB\xBF[]")
# => [[[1, -3], :on_lbracket, "[", BEG|LABEL], [[1, 1], :on_rbracket, "]", END]]

The rest of the tokens appear as if the byte-order mark never existed. This is consistent except when the file starts with a global variable, an instance variable, or a class variable. In those cases the first token begins at column 0. For example:

Ripper.lex("\xEF\xBB\xBF@foo")
# => [[[1, 0], :on_ivar, "@foo", END]]

Ripper.lex("\xEF\xBB\xBF@@foo")
# => [[[1, 0], :on_cvar, "@@foo", END]]

Ripper.lex("\xEF\xBB\xBF$foo")
# => [[[1, 0], :on_gvar, "$foo", END]]
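To compare these cases side by side, a small script along the following lines can print the position of the first token for each source. This is only a sketch: `first_token_position` and `BOM_SOURCES` are hypothetical names introduced here, and the columns printed depend on the Ruby version in use.

```ruby
require "ripper"

# Sources from this report, each prefixed with a UTF-8 byte-order mark.
BOM_SOURCES = [
  "\xEF\xBB\xBF[]",
  "\xEF\xBB\xBF@foo",
  "\xEF\xBB\xBF@@foo",
  "\xEF\xBB\xBF$foo"
].freeze

# Hypothetical helper: returns the [line, column] pair of the first token.
def first_token_position(source)
  Ripper.lex(source).first[0]
end

BOM_SOURCES.each do |source|
  line, column = first_token_position(source)
  # Print the source without its three BOM bytes so the label is readable.
  label = source.byteslice(3, source.bytesize - 3)
  puts "#{label}: line #{line}, column #{column}"
end
```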

Additionally, when there is a byte-order mark it usually does not appear as part of the first token, unless the token is a magic encoding comment. If it's a magic encoding comment, then it's part of the value:

Ripper.lex("\xEF\xBB\xBF# encoding: us-ascii")
# => [[[1, -3], :on_comment, "\xEF\xBB\xBF# encoding: us-ascii", BEG]]

As for solutions: when there is a byte-order mark, I think the column information should either always start at 0 or always start at -3. For the encoding comment, the byte-order mark should either not show up as part of the value, or it should show up for all comments.
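Until the columns are consistent, one possible workaround on the caller's side is to drop the byte-order mark before lexing, so that every first token starts at column 0. The following is a minimal sketch of that idea and not a change to Ripper itself; `lex_without_bom` and `UTF8_BOM` are hypothetical names.

```ruby
require "ripper"

# The three bytes of the UTF-8 byte-order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze

# Hypothetical helper: strips a leading UTF-8 BOM, if present, then lexes.
def lex_without_bom(source)
  if source.b.start_with?(UTF8_BOM)
    # byteslice keeps the original encoding and only drops the BOM bytes.
    source = source.byteslice(UTF8_BOM.bytesize, source.bytesize - UTF8_BOM.bytesize)
  end
  Ripper.lex(source)
end

p lex_without_bom("\xEF\xBB\xBF@foo").first[0] # => [1, 0]
```

The trade-off is that the caller no longer sees that the file had a byte-order mark, and byte offsets computed from these tokens are three bytes short of the offsets in the original file.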

Updated by kddnewton (Kevin Newton) about 2 years ago Actions #1 [ruby-core:114496]

Apologies, I think I was wrong about the last part: the byte-order mark is part of the string, but it doesn't show up on inspect. So this is just about the column information.
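Because `inspect` output can hide the mark, checking the raw bytes of the token value is a more reliable way to see whether it is there. A rough sketch follows; `first_token_has_bom?` and `UTF8_BOM_BYTES` are hypothetical names, and the result for a BOM-prefixed source may differ between Ruby versions.

```ruby
require "ripper"

# The UTF-8 byte-order mark as individual byte values.
UTF8_BOM_BYTES = [0xEF, 0xBB, 0xBF].freeze

# Hypothetical helper: true when the first token's value begins with the BOM bytes.
def first_token_has_bom?(source)
  value = Ripper.lex(source).first[2]
  value.bytes.first(3) == UTF8_BOM_BYTES
end

puts first_token_has_bom?("\xEF\xBB\xBF# encoding: us-ascii")
puts first_token_has_bom?("# encoding: us-ascii")
```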

Updated by nobu (Nobuyoshi Nakada) about 2 years ago Actions #3

  • Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED

Updated by nobu (Nobuyoshi Nakada) about 2 years ago Actions #4

  • Status changed from Open to Closed

Applied in changeset git|1f76e42b85be4031bdedcc3e457e8fa949195304.


[Bug #19848] Flush BOM

The token just after BOM needs to position at column 0, so that the
indentation matches closing line.
