Bug #1449

[REXML] detected encoding isn't used correctly

Added by Kouhei Sutou almost 3 years ago. Updated 9 months ago.

[ruby-core:23404]
Status: Closed
Start date: 05/09/2009
Priority: Normal
Due date:
Assignee: Sean Russell
% Done: 100%
Category: lib
Target version: 1.9.1
ruby -v: ruby 1.9.2dev (2009-05-09 trunk 23374) [x86_64-linux]

Description

REXML::Source can detect the source encoding from the XML declaration. REXML::IOSource can also detect it, but the detected encoding isn't used correctly.

REXML::IOSource uses the detected encoding to convert the data it reads from @source. If the detected encoding is UTF-8, the read data isn't converted (ref. rexml/encodings/UTF-8.rb). If the detected encoding is UTF-8 but @source.external_encoding isn't UTF-8, this may cause a problem.

If @source.external_encoding is ASCII-8BIT and @source contains only ASCII data, there is no problem. If @source.external_encoding is ASCII-8BIT and @source contains non-ASCII data, there is a problem: in that case, "@buffer << read_data_from_source" raises an Encoding::CompatibilityError, which breaks parsing of otherwise correct XML.
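
The two cases above can be reproduced with plain strings, without REXML. This is only a minimal sketch: `buffer`, `ascii_chunk` and `binary_chunk` are hypothetical stand-ins for @buffer and the data read from @source, and the buffer is assumed to already hold non-ASCII UTF-8 text.

```ruby
# Hypothetical stand-ins for REXML::IOSource's @buffer and the data it reads.
buffer       = "<p>\u00E9</p>"      # UTF-8 string that already holds non-ASCII text
ascii_chunk  = "<q>plain</q>".b     # ASCII-only bytes tagged ASCII-8BIT
binary_chunk = "<q>\u00E9</q>".b    # UTF-8 bytes for a non-ASCII char, tagged ASCII-8BIT

# ASCII-only binary data is compatible with a UTF-8 buffer, so this append works.
ascii_result = buffer.dup << ascii_chunk

# Non-ASCII binary data is not compatible, so this append raises.
append_error = begin
  buffer.dup << binary_chunk
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts ascii_result.encoding   # encoding of the buffer after the ASCII-only append
puts append_error.class      # class of the error raised by the non-ASCII append
```

Both strings must contain non-ASCII bytes for the append to fail, which matches the report: ASCII-only input goes through unnoticed.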

ruby19-rexml-encoding-mismatch.diff - a test case for the problem and a patch to fix the problem. (2.9 kB) Kouhei Sutou, 05/09/2009 01:38 pm

Associated revisions

Revision 27342
Added by Yui NARUSE almost 2 years ago

* lib/rexml/source.rb: force_encoding("UTF-8") when the input is already UTF-8. patched by Kouhei Sutou [ruby-core:23404]
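
The idea in this commit message can be sketched as follows. This is not the actual change in lib/rexml/source.rb; `to_utf8` is a hypothetical helper that only illustrates retagging bytes that are already UTF-8 instead of leaving them tagged ASCII-8BIT.

```ruby
# Hypothetical helper, not REXML's API: return chunk as UTF-8 text, given the
# encoding name detected from the XML declaration.
def to_utf8(chunk, detected_encoding)
  if Encoding.find(detected_encoding) == Encoding::UTF_8
    # The bytes are already UTF-8 and only the tag is wrong, so retag a copy.
    chunk.dup.force_encoding(Encoding::UTF_8)
  else
    # Transcode from the detected encoding, ignoring the chunk's current tag.
    chunk.encode(Encoding::UTF_8, detected_encoding)
  end
end

buffer = "<p>\u00E9</p>".dup                       # UTF-8 buffer with non-ASCII text
buffer << to_utf8("<q>\u00E9</q>".b, "UTF-8")      # binary-tagged UTF-8 bytes
buffer << to_utf8("<r>\xE9</r>".b, "ISO-8859-1")   # binary-tagged Latin-1 bytes
puts buffer
```

With the chunk retagged before the append, both sides are UTF-8 and no Encoding::CompatibilityError is raised.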

History

Updated by Yuki Sonoda over 2 years ago

  • Assignee set to Sean Russell
  • Target version set to 1.9.1

Updated by Yui NARUSE almost 2 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
This issue was solved with changeset r27342.
Kouhei, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
