Bug #15908: Detecting BOM with non-UTF encoding - Ruby - Ruby Issue Tracking System

Bug #15908

closed

Detecting BOM with non-UTF encoding

Added by nobu (Nobuyoshi Nakada) about 7 years ago. Updated almost 7 years ago.

Status:

Closed

Assignee:

Target version:

ruby -v:

Backport:

2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN

[ruby-core:93024]

Description

Currently, the "bom|" encoding prefix passed to File.open is ignored if the encoding name is not a UTF encoding.
But one use of a BOM is to tell whether the stream is UTF or not, and this is especially common on Windows, e.g. to distinguish UTF-16LE from OEMCP.
So I think this restriction should be removed.
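
For context, here is a minimal sketch of how the prefix already behaves when a UTF name follows it. The scratch file created with Tempfile and its contents are illustrative assumptions, not part of the report.

```ruby
require "tempfile"

# Write a small file that starts with the UTF-8 BOM (EF BB BF).
# The file name and contents are made up for this illustration.
file = Tempfile.new("bom-demo")
file.binmode
file.write("\xEF\xBB\xBFhello".b)
file.close

# With a UTF name after the prefix, the BOM is consumed on open and
# does not appear in the data that is read.
with_prefix = File.open(file.path, "r:BOM|UTF-8") { |f| f.read }

# Without the prefix, the BOM stays in the string as U+FEFF.
without_prefix = File.open(file.path, "r:UTF-8") { |f| f.read }

p with_prefix
p without_prefix

file.unlink
```

The report concerns the other case, where the name after the prefix is not a UTF encoding (for example Windows-31J) and the prefix is ignored.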

Files

0001-Enable-BOM-detection-with-non-UTF-encodings.patch (4.27 KB)

nobu (Nobuyoshi Nakada), 06/08/2019 12:43 PM

Related issues 1 (0 open — 1 closed)

Updated by nobu (Nobuyoshi Nakada) about 7 years ago
#1

Related to Bug #15210: UTF-8 BOM should be removed from String in internal representation added

Updated by duerst (Martin Dürst) almost 7 years ago
#2 [ruby-core:94652]

Status changed from Open to Closed

Depending on usage, it may also be necessary to distinguish UTF-8 (with or without BOM), UTF-16LE without BOM, UTF-16BE with or without BOM, and so on. For Japanese, it has traditionally also been necessary to distinguish between EUC-JP, Shift_JIS, and ISO-2022-JP.

For more complex cases, heuristics are needed. On the other hand, applications may not want to (or may not be allowed to, e.g. in the bootstrap phase of an XML parser) accept more than a well-defined subset.

This kind of processing is therefore better left to applications.

I'm closing this issue to not leave it dangling, but please feel free to reopen if you disagree.
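
As an illustration of what leaving this to applications could look like, here is a minimal sketch that checks the leading bytes for a known BOM and otherwise falls back to a caller-chosen legacy encoding. The helper names `known_boms` and `decode_with_bom` are hypothetical and not part of Ruby, and the sketch does not attempt the heuristics mentioned above for BOM-less input.

```ruby
# Hypothetical helper (not a Ruby API): BOM byte sequences and the
# encodings they announce. UTF-32LE is listed before UTF-16LE because
# its BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
def known_boms
  [
    ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
    ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
    ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
    ["\xFE\xFF".b,         Encoding::UTF_16BE],
    ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ]
end

# Hypothetical helper: tag raw bytes with the encoding announced by a
# BOM (and drop the BOM), or with the fallback encoding if none is found.
def decode_with_bom(bytes, fallback)
  data = bytes.b # binary copy, so the caller's string is not modified
  bom, enc = known_boms.find { |mark, _| data.start_with?(mark) }
  if bom
    data.byteslice(bom.bytesize..-1).force_encoding(enc)
  else
    data.force_encoding(fallback)
  end
end

# "hi" in UTF-16LE, preceded by a BOM.
utf16 = decode_with_bom("\xFF\xFEh\x00i\x00".b, Encoding::Windows_31J)
p utf16.encoding

# Hiragana "a" (bytes 82 A0) in Windows-31J, no BOM.
sjis = decode_with_bom("\x82\xA0".b, Encoding::Windows_31J)
p sjis.encoding
```

An application that accepts only a well-defined subset could trim the table in `known_boms` to the encodings it is willing to handle.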

Updated by naruse (Yui NARUSE) almost 7 years ago
#3 [ruby-core:94653]

I understand that, theoretically, a situation exists where this feature is useful.
But I don't think it exists in practice.
I object to providing an additional utility to support legacy encodings.

Updated by nobu (Nobuyoshi Nakada) almost 7 years ago
#4 [ruby-core:94675]

I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:

Updated by duerst (Martin Dürst) almost 7 years ago
#5 [ruby-core:94680]

nobu (Nobuyoshi Nakada) wrote:

I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:

They are not yet extinct, unfortunately :-(. In Japan, there may be quite a few cases where this would work, but even in Japan, there are many other cases where a larger and/or different selection of encodings is needed.
