Bug #5278

REXML -- Malformed comment

Added by Thomas Fritzsche over 2 years ago. Updated about 1 year ago.

[ruby-core:39289]
Status:Closed
Priority:Normal
Assignee:Kouhei Sutou
Category:-
Target version:1.9.3
ruby -v:ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.1.0]
Backport:

Description

Hi Ruby-Team,

I use the REXML library for XML parsing. The input is the Kanjidic2 XML file: http://www.csse.monash.edu.au/~jwb/kanjidic2/ (I have not attached the file because it is too large.)
It works with version 1.8.7, but with 1.9 a ParseException ("Malformed comment") is raised in lib/rexml/parsers/baseparser.rb.

My code looks like this:

require 'rexml/document'
require 'rexml/streamlistener'
class KanjiListener
  include REXML::StreamListener
end

f = File.new("kanji.xml","rb")
list = KanjiListener.new

REXML::Document.parse_stream(f, list)

The XML file from the link above contains a comment that looks like this:

...
<!-- Version 1.6 - April 2008
This is the DTD of the XML-format kanji file combining information from
the KANJIDIC and KANJD212 files. It is intended to be largely self-
documenting, with each field being accompanied by an explanatory
comment.
-->
...

Strangely, the parser fails at "self-", where the line ends with a hyphen just before "documenting".
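The failure should be reproducible without the full Kanjidic2 file. Below is a minimal sketch, assuming that any comment with a line ending in a single hyphen triggers the same code path. The `CommentCollector` class, the `SAMPLE_XML` string, and the `collect_comments` helper are hypothetical names used only for illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that records every comment the parser reports.
class CommentCollector
  include REXML::StreamListener

  attr_reader :comments

  def initialize
    @comments = []
  end

  # Called by the stream parser for each XML comment.
  def comment(text)
    @comments << text
  end
end

# A comment whose first line ends in a single hyphen,
# modelled on the Kanjidic2 DTD comment.
SAMPLE_XML = "<!-- largely self-\ndocumenting -->\n<root/>\n"

# Parses the given XML and returns the collected comments, or a
# one-element array describing the ParseException if parsing fails.
def collect_comments(xml)
  listener = CommentCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.comments
rescue REXML::ParseException => e
  # Affected 1.9 releases are expected to end up here.
  ["ParseException: #{e.message.lines.first.to_s.strip}"]
end

p collect_comments(SAMPLE_XML)
```

On a REXML version without the problem, the comment text is returned unchanged; on an affected version the rescue branch should report the "Malformed comment" error instead.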

The issue comes up here (about line 345):
...
          if md[0][2] == ?-
            md = @source.match( COMMENT_PATTERN, true )

            case md[1]
            when /--/, /-$/
              raise REXML::ParseException.new("Malformed comment", @source)
            end

...

The MatchData element md[1] contains the complete comment, and then the regular expression /-$/ matches.
From debugging, I guess the original buffer is read by "readline" and somehow still includes the end-of-line markers.
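The match on /-$/ can be explained by Ruby's regular expression anchors alone: `$` matches at the end of every line inside a string, whereas `\z` matches only at the very end of the string. The sketch below illustrates the difference on a string shaped like md[1]. Using `\z` is shown only as one plausible way to restrict the check to the end of the comment; whether r33210 does exactly this is an assumption, and `COMMENT_BODY` and `trailing_hyphen?` are hypothetical names.

```ruby
# Comment body similar to what md[1] holds: several lines, one ending in "-".
COMMENT_BODY = " It is intended to be largely self-\ndocumenting.\n"

LINE_ANCHORED   = /-$/   # "$" matches before any newline, not only at the end
STRING_ANCHORED = /-\z/  # "\z" matches only at the very end of the string

# Returns true when the pattern finds a hyphen at its anchor position.
def trailing_hyphen?(text, pattern)
  !!(text =~ pattern)
end

puts trailing_hyphen?(COMMENT_BODY, LINE_ANCHORED)            # true
puts trailing_hyphen?(COMMENT_BODY, STRING_ANCHORED)          # false
puts trailing_hyphen?(" ends with a hyphen-", STRING_ANCHORED) # true
```

The last call shows that a string-anchored check would still reject a comment whose final character is a hyphen, which XML does not allow because the comment would then end in "--->".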

I tried opening the original file with different newline parameters, but nothing helped. I also tried different Ruby versions (including today's 1.9.3-head); all of 1.9 seems to have the problem, while 1.8 works.
In the meantime I have switched to the Nokogiri XML parser, which works without problems on 1.9.x, and I would expect REXML to be able to parse this file too. As a test, I changed a single character in the file so that /-$/ no longer matches "self-" in the original XML, and then it works.

Thank you very much in advance.

History

#1 Updated by Yui NARUSE over 2 years ago

  • Status changed from Open to Assigned
  • Assignee set to Kouhei Sutou
  • Target version set to 1.9.3

#2 Updated by Kouhei Sutou over 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

Thanks for your report!
I've fixed it in r33210.

#3 Updated by Kouhei Sutou about 1 year ago

This is Sutou.

Regarding this backport that I requested:
https://bugs.ruby-lang.org/issues/7764

It looks like only the ChangeLog change was backported, and the actual change
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/33210
was not backported.
(It modifies lib/rexml/parsers/baseparser.rb and other files.)

Could you please check?

In 20130206051927.E67E568693@sakura.atdot.net
" usa:r39093 (ruby_1_9_3): merge revision(s) 33210,33212: [Backport #5278]" on Wed, 6 Feb 2013 14:19:27 +0900 (JST),
usa ko1@atdot.net wrote:

usa 2013-02-06 14:19:18 +0900 (Wed, 06 Feb 2013)

New Revision: 39093

http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=rev&revision=39093

Log:
merge revision(s) 33210,33212: [Backport #5278]

* lib/rexml/parsers/baseparser.rb, test/rexml/test_comment.rb:
  allow a single hyphen in comment. [Bug 5278]
  Reported by Thomas Fritzsche. Thanks!!!
  allow a single hyphen in comment. [Bug #5278] 

Modified directories:
branches/ruby_1_9_3/
Modified files:
branches/ruby_1_9_3/ChangeLog
branches/ruby_1_9_3/version.h

Index: ruby_1_9_3/ChangeLog

--- ruby_1_9_3/ChangeLog (revision 39092)
+++ ruby_1_9_3/ChangeLog (revision 39093)
@@ -1,3 +1,9 @@ https://github.com/ruby/ruby/blob/trunk/ruby_1_9_3/ChangeLog#L1
+Wed Feb 6 14:19:07 2013 Kouhei Sutou kou@cozmixng.org
+
+ * lib/rexml/parsers/baseparser.rb, test/rexml/test_comment.rb:
+ allow a single hyphen in comment. [Bug #5278]
+ Reported by Thomas Fritzsche. Thanks!!!
+
Wed Feb 6 14:14:38 2013 Nobuyoshi Nakada nobu@ruby-lang.org

 * file.c (realpath_rec): prevent link from GC while link_names refers

Index: ruby_1_9_3/version.h

--- ruby_1_9_3/version.h (revision 39092)
+++ ruby_1_9_3/version.h (revision 39093)
@@ -1,5 +1,5 @@ https://github.com/ruby/ruby/blob/trunk/ruby_1_9_3/version.h#L1
#define RUBY_VERSION "1.9.3"
-#define RUBY_PATCHLEVEL 378
+#define RUBY_PATCHLEVEL 379

#define RUBY_RELEASE_DATE "2013-02-06"
#define RUBY_RELEASE_YEAR 2013

Property changes on: ruby_1_9_3


Modified: svn:mergeinfo
Merged /trunk:r33210,33212

ML: ruby-changes@quickml.atdot.net
Info: http://www.atdot.net/~ko1/quickml/

#4 Updated by Usaku NAKAMURA about 1 year ago

Hello, this is Nakamura (U).

In message " Re: usa:r39093 (ruby_1_9_3): merge revision(s) 33210,33212: [Backport #5278]"
on Feb.06,2013 20:46:00, kou@cozmixng.org wrote:

This is Sutou.

Regarding this backport that I requested:
https://bugs.ruby-lang.org/issues/7764

It looks like only the ChangeLog change was backported, and the actual change
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/33210
was not backported.
(It modifies lib/rexml/parsers/baseparser.rb and other files.)

Could you please check?

Ugh.

Regards.
--
U.Nakamura usa@garbagecollect.jp

#5 Updated by Kouhei Sutou about 1 year ago

This is Sutou.

In 20130206132214.95E816EA62@zanzibar.garbagecollect.jp
" Re: usa:r39093 (ruby_1_9_3): merge revision(s) 33210,33212: [Backport #5278]" on Wed, 6 Feb 2013 22:22:14 +0900,
"U.Nakamura" usa@garbagecollect.jp wrote:

Regarding this backport that I requested:
https://bugs.ruby-lang.org/issues/7764

It looks like only the ChangeLog change was backported, and the actual change
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/revisions/33210
was not backported.
(It modifies lib/rexml/parsers/baseparser.rb and other files.)

Could you please check?

Ugh.

I have no idea why, but something bizarre happened: svn merge reported no
errors at all, yet these changes were silently skipped.

Thank you for the quick response!
