Ruby Issue Tracking System: Ruby master - Bug #8255: File#each_line omits last byte (==\0) if encoding is utf-16 (https://bugs.ruby-lang.org/issues/8255)
naruse (Yui NARUSE), 2013-04-12T00:11:37Z
<p>This happens because:</p>
<ul>
<li>UTF-16 is a dummy encoding, so you must use UTF-16BE, UTF-16LE, or a BOM|UTF-* specifier; otherwise, Ruby itself would need some special treatment for it.</li>
<li>The default line separator is the ASCII \n, not the UTF-16 \n. You must either specify the UTF-16BE or UTF-16LE \n explicitly or convert to some internal encoding; otherwise, Ruby would again need some special treatment.</li>
</ul>
arton (Akio Tajima), 2013-04-12T01:13:46Z
<p>OK, I've fixed my test code. It had some bugs, and I changed the second argument of File.open to 'rb:UTF-16LE'.</p>
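<p>For reference, here is a minimal sketch (not the attached test) of a UTF-16LE round trip in which the write and read modes agree. The sample strings, the helper name utf16le_round_trip, and the temporary file name are hypothetical. Reading needs binary mode because Ruby rejects text mode for an ASCII-incompatible external encoding when no internal encoding is given. each_line then matches the default separator as the two-byte UTF-16LE newline.</p>

```ruby
require 'tmpdir'

# Hypothetical helper: writes +lines+ as UTF-16LE, then reads them back line
# by line. Returns the raw file bytes and the lines as read (UTF-16LE strings).
def utf16le_round_trip(lines)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.txt') # hypothetical file name

    # Write side: each UTF-8 source string is transcoded to UTF-16LE (no BOM).
    File.open(path, 'wb:UTF-16LE') { |out| lines.each { |l| out.write(l) } }

    # Read side: binary mode is required for the ASCII-incompatible encoding,
    # and the default "\n" separator is matched as UTF-16LE (bytes 0A 00).
    read = File.open(path, 'rb:UTF-16LE') { |io| io.each_line.to_a }

    [File.binread(path).bytes, read] # collected before the directory is removed
  end
end

lines = ["This is not a love song.\n", "This is not a love song.\n"]
bytes, read = utf16le_round_trip(lines)

p bytes.first(4)                              # => [84, 0, 104, 0]
p read.map { |l| l.encode('UTF-8') } == lines # => true
```

<p>Because both sides use UTF-16LE, every line, including the last one, keeps its full 0A 00 terminator.</p>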
<p>Calling String#rstrip works, but I can't convert the UTF-16LE string to another encoding.</p>
<p>First, I tried to convert the UTF-16LE line to UTF-8 with line.rstrip.encode('utf-8'), but the test failed:</p>
<p><"This is not a love song."> expected but was<br>
<"\uFFFE\u5400\u6800\u6900\u7300\u2000\u6900\u7300\u2000\u6E00\u6F00\u7400\u2000\u6100\u2000\u6C00\u6F00\u7600\u6500\u2000\u7300\u6F00\u6E00\u6700\u2E00\u0A00\u5400\u6800\u6900\u7300\u2000\u6900\u7300\u2000\u6E00\u6F00\u7400\u2000\u6100\u2000\u6C00\u6F00\u7600\u6500\u2000\u7300\u6F00\u6E00\u6700\u2E00\u0A00">.</p>
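<p>The dump looks like big-endian data labelled as UTF-16LE. Every 16-bit code unit is byte-swapped, so "T" (U+0054) shows up as U+5400 and a big-endian BOM (bytes FE FF) shows up as U+FFFE. Here is a minimal in-memory sketch of that effect. It assumes, though the thread does not say so, that the file had been written big-endian with a BOM; the sample text "This" is hypothetical.</p>

```ruby
# Assumption: the data was written big-endian with a BOM, then read back
# with 'rb:UTF-16LE'. The sample text "This" is hypothetical.
written = "\uFEFFThis".encode('UTF-16BE')        # bytes FE FF 00 54 00 68 00 69 00 73
misread = written.dup.force_encoding('UTF-16LE') # same bytes, wrong byte order

# Reinterpreting the bytes swaps every code unit: U+FEFF -> U+FFFE, U+0054 -> U+5400.
codepoints = misread.codepoints

p written.bytes.first(4)                     # => [254, 255, 0, 84]
p codepoints.map { |c| format('U+%04X', c) } # => ["U+FFFE", "U+5400", "U+6800", "U+6900", "U+7300"]
```

<p>The same byte swap explains why rstrip had nothing to strip. The last code unit in the dump is U+0A00, not a newline, and the whole file came back as a single line.</p>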
<p>Then I tried to convert the line to CP932 with line.rstrip.encode('cp932'), which raised an exception:</p>
<p>Encoding::UndefinedConversionError: U+FFFE to Windows-31J in conversion from UTF-16LE to UTF-8 to Windows-31J.</p>
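<p>U+FFFE is a Unicode noncharacter with no mapping in CP932 (Windows-31J), so the second step of the UTF-16LE to UTF-8 to Windows-31J conversion fails. The following sketch reproduces the error on a hypothetical line that begins with U+FFFE:</p>

```ruby
# Hypothetical line that starts with U+FFFE, as in the misread file.
line = "\uFFFEThis".encode('UTF-16LE')

# Capture the exception instead of letting it propagate.
error =
  begin
    line.encode('CP932')
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

p error.class                                  # => Encoding::UndefinedConversionError
p error.error_char.encode('UTF-8') == "\uFFFE" # => true
puts error.message
```

<p>Encoding::UndefinedConversionError#error_char identifies the offending character. Here U+FFFE is a symptom of the byte-order mismatch rather than real text, so removing it would not make the remaining code units correct.</p>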
<p>Next, I tried to remove the BOM from the original line with the code below:<br>
p line[0] #=> "\uFFFE"<br>
if line[0] == "\uFFFE" # => false, why? (maybe because the BOM is an invisible character, but ...)<br>
line = line[1..-1]<br>
end</p>
<p>But nothing changed: the condition line[0] == "\uFFFE" evaluated to false (when I added an else clause, that clause ran).</p>
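<p>The comparison fails because of encodings, not because of the BOM itself. The literal "\uFFFE" is a UTF-8 string (bytes EF BF BE), while line[0] is a one-character UTF-16LE string (bytes FE FF), and String#== does not treat them as equal. The following sketch uses a hypothetical line:</p>

```ruby
# Hypothetical line in the state described above: UTF-16LE, starting with U+FFFE.
line = "\uFFFEThis".encode('UTF-16LE')

# UTF-16LE string vs. UTF-8 literal: not equal.
p line[0] == "\uFFFE"                    # => false
p line[0].encoding                       # => #<Encoding:UTF-16LE>

# Comparing in the same encoding, or comparing code points, works.
p line[0] == "\uFFFE".encode('UTF-16LE') # => true
p line[0].ord == 0xFFFE                  # => true

stripped = line[0].ord == 0xFFFE ? line[1..-1] : line
p stripped.encode('UTF-8')               # => "This"
```

<p>Stripping the character this way only treats the symptom. As a later comment in this thread shows, the real fix was writing the file with 'wb:utf-16le'.</p>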
<p>Is there any way to convert UTF-16LE to UTF-8 or CP932?</p>
arton (Akio Tajima), 2013-04-12T01:15:47Z
<ul><li><strong>File</strong> <a href="/attachments/3660">test_utf16.rb</a> added</li></ul><p>The attachment is the fixed version of the test, showing the behaviour I expected.</p>
arton (Akio Tajima), 2013-04-12T01:21:39Z
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Sorry, when I wrote the comments above, I had only changed the read mode to 'rb:utf-16le'.<br>
It runs fine once I also use 'wb:utf-16le' when writing out the file.</p>
naruse (Yui NARUSE), 2013-04-12T01:52:57Z
<p>Just FYI, you can propose that Ruby handle the UTF-16 family of encodings transparently ;-)</p>
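<p>The first reply in this thread also mentions the BOM|UTF-* specifier. Below is a minimal sketch of it, assuming hypothetical file names and sample text. With 'rb:BOM|UTF-16LE', Ruby checks for a byte-order mark, strips it, and uses the byte order it indicates, falling back to UTF-16LE when there is none. Each line is then converted to UTF-8 explicitly.</p>

```ruby
require 'tmpdir'

# Hypothetical helper: reads +path+ with the BOM|UTF-16LE specifier.
# A BOM, if present, is stripped and selects the byte order; otherwise
# UTF-16LE is used. Each line is converted to UTF-8 afterwards.
def read_utf16_lines(path)
  File.open(path, 'rb:BOM|UTF-16LE') do |io|
    io.each_line.map { |line| line.encode('UTF-8') }
  end
end

text = "This is not a love song.\nThis is not a love song.\n"

results = Dir.mktmpdir do |dir|
  be_path = File.join(dir, 'be_with_bom.txt') # hypothetical file names
  le_path = File.join(dir, 'le_no_bom.txt')
  File.binwrite(be_path, "\uFEFF#{text}".encode('UTF-16BE')) # BOM + big-endian
  File.binwrite(le_path, text.encode('UTF-16LE'))           # little-endian, no BOM
  [read_utf16_lines(be_path), read_utf16_lines(le_path)]
end

be_lines, le_lines = results
p be_lines == le_lines # => true
p be_lines.size        # => 2
```

<p>Both files yield the same UTF-8 lines, whichever byte order they were written in.</p>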