Bug #9539

REXML XPath UTF8 encoding problem

Added by Mario Barcala about 1 year ago. Updated about 1 year ago.

[ruby-core:60901]
Status:Closed
Priority:Normal
Assignee:Kouhei Sutou
ruby -v:ruby 2.1.0
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN, 2.1: UNKNOWN

Description

I found some problems in REXML when processing XPath expressions that contain non-ASCII Unicode characters. I attached a sample script and a sample document. The script output shows two different problems:

1) The text() XPath function does not work properly when the expression contains a character with an accent or tilde.

2) Two different XPath paths, one containing an accent and the other without it, are treated as the same path.

Thank you,

Mario Barcala

sample.rb - Sample script (366 Bytes) Mario Barcala, 02/20/2014 09:09 AM

sample.xml - Sample document (224 Bytes) Mario Barcala, 02/20/2014 09:09 AM
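
The attachments are not reproduced on this page. As a rough illustration of the two reported symptoms, a minimal reproduction might look like the sketch below. The document, the element names (título, titulo) and the variable names are assumptions made for this example and are not taken from sample.rb or sample.xml.

```ruby
require "rexml/document"

# Hypothetical document: two sibling elements whose names differ
# only by an accented character. Not the attached sample.xml.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <root>
    <título>con acento</título>
    <titulo>sin acento</titulo>
  </root>
XML

doc = REXML::Document.new(xml)

# Problem 1: text() on a path that contains an accented name.
accented = REXML::XPath.match(doc, "/root/título/text()").map(&:to_s)

# Problem 2: the accented and unaccented paths should select
# different elements, so the two results should differ.
plain = REXML::XPath.match(doc, "/root/titulo/text()").map(&:to_s)

puts "título: #{accented.inspect}"
puts "titulo: #{plain.inspect}"
```

On a REXML version that includes the fix, each query should return only the text of its own element. On an affected version the two results may be empty or indistinguishable from each other.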

Associated revisions

Revision 45153
Added by Kouhei Sutou about 1 year ago

  • lib/rexml/xmltokens.rb: Add missing valid non-ASCII characters
    to element name characters. Now REXML name tokens exactly
    match "[5] Name" in the XML spec and "[4] NCName" in the
    Namespaces in XML spec. See the comment for details.
    [Bug #9539]
    Reported by Mario Barcala. Thanks!!!

  • test/rexml/xpath/test_node.rb: Add tests for the above case.
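
For readers unfamiliar with the productions cited in the commit message, the sketch below shows one way to express "[4] NCName" as a Ruby regular expression, using the NameStartChar and NameChar ranges from XML 1.0 (Fifth Edition) with the colon left out. The constant and method names (NAME_START, NAME_EXTRA, NCNAME, ncname?) are hypothetical and chosen for this example; this is not the code in lib/rexml/xmltokens.rb.

```ruby
# NameStartChar from XML 1.0 (Fifth Edition), minus ":" because
# NCName (Namespaces in XML) is a Name that contains no colon.
NAME_START = "A-Z_a-z" \
  "\u{C0}-\u{D6}\u{D8}-\u{F6}\u{F8}-\u{2FF}" \
  "\u{370}-\u{37D}\u{37F}-\u{1FFF}\u{200C}-\u{200D}" \
  "\u{2070}-\u{218F}\u{2C00}-\u{2FEF}\u{3001}-\u{D7FF}" \
  "\u{F900}-\u{FDCF}\u{FDF0}-\u{FFFD}\u{10000}-\u{EFFFF}"

# Additional characters allowed after the first one (NameChar).
NAME_EXTRA = "\\-.0-9\u{B7}\u{300}-\u{36F}\u{203F}-\u{2040}"

# One start character followed by any number of name characters.
NCNAME = /\A[#{NAME_START}][#{NAME_START}#{NAME_EXTRA}]*\z/

# Hypothetical helper: true if +str+ is a valid NCName.
def ncname?(str)
  !!(NCNAME =~ str)
end

puts ncname?("titulo")  # plain ASCII name
puts ncname?("título")  # name with an accented character
puts ncname?("ns:name") # contains a colon, so not an NCName
puts ncname?("1abc")    # digits cannot start a name
```

The point of the sketch is that accented letters such as í (U+00ED) fall inside the ranges allowed by the spec, so a tokenizer that accepts only a narrower set of characters cannot read such element names correctly.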

History

#1 Updated by Kouhei Sutou about 1 year ago

  • % Done changed from 0 to 100
  • Status changed from Open to Closed

Applied in changeset r45153.


#2 Updated by Kouhei Sutou about 1 year ago

  • Assignee set to Kouhei Sutou

Thanks for your report!
I've fixed it in trunk.

It was very helpful that you attached a sample script and sample XML to reproduce the problem. :-)
