Bug #19848
Ripper BOM behavior (status: Closed)
Description
When a file starts with a byte-order mark, the first token in the file usually begins at column -3 (the UTF-8 BOM is 3 bytes long). For example:
require "ripper"
Ripper.lex("\xEF\xBB\xBF[]")
# => [[[1, -3], :on_lbracket, "[", BEG|LABEL], [[1, 1], :on_rbracket, "]", END]]
The remaining tokens are positioned as if the byte-order mark did not exist. This is consistent except when the file starts with a global variable, an instance variable, or a class variable. In those cases, the first token begins at column 0. For example:
Ripper.lex("\xEF\xBB\xBF@foo")
# => [[[1, 0], :on_ivar, "@foo", END]]
Ripper.lex("\xEF\xBB\xBF@@foo")
# => [[[1, 0], :on_cvar, "@@foo", END]]
Ripper.lex("\xEF\xBB\xBF$foo")
# => [[[1, 0], :on_gvar, "$foo", END]]
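To see the inconsistency for all four cases at once, the sketch below lexes each example from the report and prints the column of its first token. It uses only Ripper.lex; the helper name first_token_column is hypothetical. The columns depend on whether the running Ruby includes the fix, so they are printed rather than shown as expected output.

```ruby
require "ripper"

BOM = "\xEF\xBB\xBF"

# The examples from the report, each prefixed with a UTF-8 byte-order mark.
SAMPLES = ["[]", "@foo", "@@foo", "$foo"].map { |body| BOM + body }

# Hypothetical helper: returns [column, event] for the first token of +source+.
def first_token_column(source)
  position, event, _token, _state = Ripper.lex(source).first
  [position[1], event]
end

SAMPLES.each do |source|
  column, event = first_token_column(source)
  # Columns vary by Ruby version, so they are printed rather than asserted.
  puts format("%-14s column %d", event, column)
end
```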
Additionally, when there is a byte-order mark, it usually does not appear as part of the first token unless that token is a magic encoding comment. In that case, the BOM is part of the value:
Ripper.lex("\xEF\xBB\xBF# encoding: us-ascii")
# => [[[1, -3], :on_comment, "\xEF\xBB\xBF# encoding: us-ascii", BEG]]
For a fix: when there is a byte-order mark, I think the column information should either always start at 0 or always start at -3. For the encoding comment, the BOM should probably either not appear in the value, or appear in the value of every comment.
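Until every Ruby a tool supports behaves the same way, a caller that needs consistent columns could strip the BOM itself before lexing. This is a minimal sketch assuming UTF-8 input; lex_ignoring_bom is a hypothetical helper, not part of Ripper. It corresponds to the "always start at 0" option.

```ruby
require "ripper"

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical workaround: drop a leading UTF-8 BOM before lexing so that the
# first token starts at column 0 regardless of its kind.
# Assumes UTF-8 input: the check is byte-based, so any string whose first
# bytes are EF BB BF is trimmed.
def lex_ignoring_bom(source)
  if source.b.start_with?(UTF8_BOM)
    # byteslice keeps the original String's encoding.
    source = source.byteslice(UTF8_BOM.bytesize..)
  end
  Ripper.lex(source)
end

p lex_ignoring_bom("\xEF\xBB\xBF[]").map(&:first)   # prints [[1, 0], [1, 1]]
p lex_ignoring_bom("\xEF\xBB\xBF@foo").map(&:first) # prints [[1, 0]]
```

Since the BOM never reaches Ripper, the result no longer depends on how a particular Ruby version positions the token after it.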
Updated by kddnewton (Kevin Newton) over 1 year ago
Apologies, I think I was wrong about the last part: the BOM is part of the string, but it doesn't show up on inspect. So this issue is only about the column information.
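Because String#inspect can render the BOM invisibly, comparing bytes is a more reliable way to check whether it ended up in a comment token's value. The sketch below assumes a plain (non-magic) comment; starts_with_bom? is a hypothetical helper, and whether it returns true for the lexed comment depends on the Ruby version.

```ruby
require "ripper"

BOM_BYTES = "\xEF\xBB\xBF".b

# True when +str+ begins with the UTF-8 BOM bytes, whatever its encoding.
# Comparing binary copies avoids Encoding::CompatibilityError and does not
# depend on how String#inspect renders U+FEFF.
def starts_with_bom?(str)
  str.b.start_with?(BOM_BYTES)
end

source = "\xEF\xBB\xBF# just a comment"
_position, event, value, _state = Ripper.lex(source).first

puts "first token: #{event}"
puts "inspect:     #{value.inspect}"
puts "BOM bytes in value? #{starts_with_bom?(value)}"
```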
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: REQUIRED, 3.1: REQUIRED, 3.2: REQUIRED
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|1f76e42b85be4031bdedcc3e457e8fa949195304.
[Bug #19848] Flush BOM
The token just after the BOM needs to be positioned at column 0, so that
its indentation matches the closing line.