Project

General

Profile

Actions

Bug #16402

closed

UTF-16LE BOM causing regex match to fail with "invalid byte sequence in UTF-8"

Added by PikachuEXE (Pikachu EXE) almost 5 years ago. Updated almost 5 years ago.

Status:
Third Party's Issue
Target version:
-
ruby -v:
ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
[ruby-core:96118]

Description

$ ruby -e 'File.binwrite("u.txt", "\xff\xfe\x00\x01")'
$ file u.txt 
u.txt: Little-endian UTF-16 Unicode text, with no line terminators
$ ruby -e 'p /\w+/.match?(File.read("u.txt"))'
Traceback (most recent call last):
	1: from -e:1:in `<main>'
-e:1:in `match?': invalid byte sequence in UTF-8 (ArgumentError)

No error should be raised, just like when comparing with string without BOM

$ ruby -e 'p /\w+/.match?(File.read("u.txt")[2..-1])'
false

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0