Bug #14127
(CSV) generating UTF-16LE encoded file without BOM [Closed]
Description
This file should contain a BOM so that it is properly detected as a UTF-16LE file.
How to generate such a file:
According to file -I file.csv, this file is recognized as application/octet-stream; charset=binary because it is missing the BOM.
According to Wikipedia (https://en.wikipedia.org/wiki/UTF-16), it should begin with "\xFF\xFE" so that everyone knows it is UTF-16LE.
Here is someone fixing this in a similar way: https://stackoverflow.com/a/22950912/1632815. I did the same and added the BOM manually.
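As a minimal sketch of the check described above (the helper name utf16le_bom? is hypothetical, not part of Ruby or csv.rb): U+FEFF encoded as UTF-16LE is exactly the two bytes "\xFF\xFE", so a reader can compare the first two bytes of a file against them.

```ruby
# U+FEFF encoded as UTF-16LE, as raw bytes: "\xFF\xFE".
UTF16LE_BOM = "\uFEFF".encode("UTF-16LE").b.freeze

# Hypothetical helper: true if the file at +path+ starts with the UTF-16LE BOM.
# For a file shorter than two bytes, File.binread yields nil or a short string,
# and either compares as false.
def utf16le_bom?(path)
  File.binread(path, 2) == UTF16LE_BOM
end
```

For the file described in the report, utf16le_bom?("file.csv") would be expected to return false until the BOM has been prepended.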
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
laykou (Ladislav Gallay) wrote:
This file should contain a BOM so that it is properly detected as a UTF-16LE file.
How to generate such a file:
csv.rb seems to have bugs in its support for ASCII-incompatible encodings.
According to file -I file.csv, this file is recognized as application/octet-stream; charset=binary because it is missing the BOM.
According to Wikipedia (https://en.wikipedia.org/wiki/UTF-16), it should begin with "\xFF\xFE" so that everyone knows it is UTF-16LE.
CSV.generate just builds a CSV string; it doesn't create a file.
Writing the result to a file with a BOM is the application's responsibility.
require "csv"

CSV.open("utf16.csv", "w:UTF-16LE:utf-8") do |csv|
  # The IO's external encoding is UTF-16LE, so U+FEFF is written as "\xFF\xFE".
  csv.to_io.write "\uFEFF"
  csv << ['something', 'ľščťžýáíé']
end
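Since csv.rb seems to have bugs with ASCII-incompatible encodings, another way to handle this on the application side is to let CSV.generate build the text in UTF-8 and transcode only when writing. A minimal sketch, assuming the whole CSV fits in memory (utf16.csv is simply the file name from the snippet above):

```ruby
require "csv"

# Build the CSV text in UTF-8; CSV.generate only returns a String.
csv_text = CSV.generate do |csv|
  csv << ["something", "ľščťžýáíé"]
end

# Prepend U+FEFF and transcode everything to UTF-16LE; the BOM becomes "\xFF\xFE".
# File.binwrite writes the resulting bytes without any further conversion.
File.binwrite("utf16.csv", ("\uFEFF" + csv_text).encode("UTF-16LE"))
```

This never hands a UTF-16LE string or IO to csv.rb itself; the trade-off is that the whole document is held in memory before it is written.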
Here is someone fixing this in a similar way: https://stackoverflow.com/a/22950912/1632815. I did the same and added the BOM manually.
Updated by hsbt (Hiroshi SHIBATA) over 8 years ago
- Status changed from Open to Assigned
- Assignee set to kou (Kouhei Sutou)
Updated by kou (Kouhei Sutou) over 8 years ago
- Status changed from Assigned to Rejected
Updated by printercu (Max Melentiev) over 7 years ago
WDYT about adding a file_header option, or something like it?
It's quite tricky to add this in streaming mode:
Updated by kou (Kouhei Sutou) over 7 years ago
Updated by printercu (Max Melentiev) over 7 years ago
It behaves differently: in my example the file stays empty if csv.<< is never called, whereas in the suggested example it always contains the BOM.
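A minimal sketch of the lazy behaviour described above, writing the BOM only together with the first row. It assumes rows can be formatted one at a time with CSV.generate_line and transcoded before writing, which sidesteps csv.rb's own handling of ASCII-incompatible encodings; the class name Utf16leCsvWriter is hypothetical and not part of the csv library, and this is not the original example from the thread.

```ruby
require "csv"

# Hypothetical helper, not part of the csv library: streams CSV rows to +io+ as
# UTF-16LE and writes the BOM only together with the first row, so an IO that
# never receives a row stays empty.
class Utf16leCsvWriter
  BOM = "\uFEFF".encode("UTF-16LE").freeze

  def initialize(io)
    @io = io
    @io.binmode # write the transcoded bytes as-is
    @bom_written = false
  end

  def <<(row)
    unless @bom_written
      @io.write(BOM)
      @bom_written = true
    end
    # Format the row in UTF-8 with csv.rb, then transcode it before writing.
    @io.write(CSV.generate_line(row).encode("UTF-16LE"))
    self
  end
end

File.open("utf16.csv", "wb") do |f|
  writer = Utf16leCsvWriter.new(f)
  writer << ["something", "ľščťžýáíé"]
end
```

Because the BOM is emitted lazily, creating the writer and never calling << leaves the file empty; the cost is that each row is transcoded separately.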