Bug #9061

REXML::Parsers::UltraLightParser does not correctly parse XML that contains a doctype

Added by Ippei Obayashi almost 2 years ago. Updated over 1 year ago.

[ruby-dev:47778]
Status:Closed
Priority:Normal
Assignee:Kouhei Sutou
ruby -v:ruby 2.1.0dev (2013-10-29 trunk 43466) [x86_64-linux]
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

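The following code (test_ulp.rb)
# Note: the XML declaration and the <root/> line are inferred from the
# output listings that follow.
require 'rexml/parsers/ultralightparser'
require 'pp'

pp REXML::Parsers::UltraLightParser.new(<<XML).parse
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "foo" [
<!ENTITY f "bar">
<!ENTITY g "baz">
]>
<root/>
XML
is expected, when run, to print something like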
[[:xmldecl, "1.0", "UTF-8", nil],
 [:text, "\n"],
 [:doctype,
  [...],
  "root",
  "SYSTEM",
  "foo",
  nil,
  [:entitydecl, "f", "bar"],
  [:entitydecl, "g", "baz"]],
 [:text, "\n"],
 [:start_element, [...], "root", {}],
 [:text, "\n"]]
but it actually prints something like
[[:xmldecl, "1.0", "UTF-8", nil],
 [:text,
  "\n",
  [:text, "\n"],
  [:start_element, [...], "root", {}],
  [:text, "\n"]],
 [:start_doctype, "root", "SYSTEM", "foo", nil],
 [:entitydecl, "f", "bar"],
 [:entitydecl, "g", "baz"]]
In this output, :start_element and :start_doctype sit at different positions
in the tree than expected.
I confirmed this behavior with ruby 2.1.0dev (2013-10-29 trunk 43466) [x86_64-linux] and
ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux].

I think the change in the attached patch (rexml-ultralightparser.patch) makes it work correctly.
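
The reported shape is consistent with a tree builder that nests events under the most recent opening event but does not recognize :start_doctype as one. The sketch below is a simplified, hypothetical model of that idea: `build_tree`, its `open_events` parameter, and the hand-written `EVENTS` list are illustrative names, not REXML's API, and the attached patch is not reproduced here. It only shows how such a mismatch could produce the misplaced nodes.

```ruby
# Hypothetical, simplified model of a context-tracking tree builder.
# This is NOT the REXML source; `open_events` is an illustrative parameter
# listing the event tags that open a new nesting level.
def build_tree(events, open_events)
  root = context = []
  events.each do |event|
    event = event.dup              # keep EVENTS reusable across runs
    case event[0]
    when *open_events
      context << event
      event[1, 0] = [context]      # remember the parent at index 1
      context = event              # later events nest under this node
    when :end_element, :end_doctype
      context = context[1]         # step back to the remembered parent
    else
      context << event
    end
  end
  root
end

# Hand-written event stream (an assumption) modelled on the report's XML.
EVENTS = [
  [:xmldecl, "1.0", "UTF-8", nil],
  [:text, "\n"],
  [:start_doctype, "root", "SYSTEM", "foo", nil],
  [:entitydecl, "f", "bar"],
  [:entitydecl, "g", "baz"],
  [:end_doctype],
  [:text, "\n"],
  [:start_element, "root", {}],
  [:end_element, "root"],
  [:text, "\n"]
].freeze

# Opening tag misnamed as :doctype, so :start_doctype never opens a level.
buggy = build_tree(EVENTS, [:start_element, :doctype])
# Opening tag matches the event that is actually emitted.
fixed = build_tree(EVENTS, [:start_element, :start_doctype])

p buggy.map(&:first)
# => [:xmldecl, :text, :start_doctype, :entitydecl, :entitydecl]
p fixed.map(&:first)
# => [:xmldecl, :text, :start_doctype, :text, :start_element, :text]
```

In the first run, :end_doctype steps "back" to whatever happens to be at index 1 of the root array, which is the first text node, so everything after the doctype ends up nested inside that text node, as in the actual output above.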

test_ulp.rb (232 Bytes) Ippei Obayashi, 10/30/2013 11:33 PM

rexml-ultralightparser.patch (505 Bytes) Ippei Obayashi, 10/30/2013 11:33 PM

Associated revisions

Revision 43693
Added by Kouhei Sutou over 1 year ago

  • lib/rexml/parsers/ultralightparser.rb
    (REXML::Parsers::UltraLightParser#parse): Fix wrong :start_doctype
    position.
    [Bug #9061]
    Patch by Ippei Obayashi. Thanks!!!

  • test/rexml/parser/test_ultra_light.rb: Add a test for this case.
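
A check along the following lines could confirm the corrected nesting. It is only a sketch, not the test added in this revision: it assumes a REXML that includes the fix, the `summarize` helper is a hypothetical name, and the XML declaration and `<root/>` element in `SOURCE` are inferred from the report's expected output.

```ruby
require 'rexml/parsers/ultralightparser'

# Input modelled on the report; the XML declaration and <root/> are assumed.
SOURCE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE root SYSTEM "foo" [
  <!ENTITY f "bar">
  <!ENTITY g "baz">
  ]>
  <root/>
XML

# Hypothetical helper: returns the tags of the top-level nodes and the
# entity declarations nested under the doctype node.
def summarize(source)
  tree = REXML::Parsers::UltraLightParser.new(source).parse
  doctype = tree.find { |node| node[0] == :start_doctype }
  entities = doctype.select do |child|
    child.is_a?(Array) && child[0] == :entitydecl
  end
  { top_level: tree.map { |node| node[0] }, entities: entities }
end

summary = summarize(SOURCE)
p summary[:top_level]                       # tags of the top-level nodes
p summary[:entities].map { |e| e[1] }       # names of the nested entities
```

With the fix in place, the entity declarations should appear under the doctype node rather than at the top level, and the root element should be a top-level node again.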

History

#1 Updated by Hiroshi SHIBATA almost 2 years ago

  • Assignee set to Kouhei Sutou
  • Target version set to 2.1.0

#2 Updated by Kouhei Sutou over 1 year ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r43693.
Ippei, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


#3 Updated by Kouhei Sutou over 1 year ago

Sorry for the delay. I added a test and applied the patch as is!
Thank you for the report!
