Bug #3217

Regexp fails to match string with '<' when encoding is UTF-8

Added by brixen (Brian Ford) about 2 years ago. Updated about 1 year ago.

[ruby-core:29864]
Status:Rejected Start date:04/29/2010
Priority:Normal Due date:
Assignee:naruse (Yui NARUSE) % Done:0%

Category:M17N
Target version:1.9.2
ruby -v:ruby 1.9.2dev (2010-04-28 trunk 27536) [i386-darwin9.8.0]

Description

Hi,

There is an issue matching a string like "a *b* c *d*<" when the file's encoding is set to UTF-8 and the regexp is trying to match '*something*'. AFAIK, '*<' is not special in that encoding.

This gist illustrates the issue:

http://gist.github.com/382510

Thanks,
Brian

Related issues

duplicated by Archive91 - Bug #3386: Inconsistent regexp punct class matching behavior between... Rejected 06/04/2010

History

Updated by naruse (Yui NARUSE) about 2 years ago

  • Status changed from Open to Rejected
'<' is not Punctuation in Unicode; it is Math_Symbol.
http://unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt
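The category difference can be checked directly with Unicode property escapes. The following is a minimal sketch, not code from the gist; `unicode_class` is a hypothetical helper name, and it assumes a UTF-8 source file. It deliberately uses `\p{P}` and `\p{Sm}` rather than a POSIX bracket class, since those properties map straight to the general categories in the file linked above.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of Ruby or RedCloth): report whether a
# single character falls in Unicode general category P (Punctuation)
# or Sm (Math_Symbol).
def unicode_class(ch)
  if ch =~ /\A\p{P}\z/
    :punctuation
  elsif ch =~ /\A\p{Sm}\z/
    :math_symbol
  else
    :other
  end
end

puts unicode_class("*")  # punctuation (U+002A is category Po)
puts unicode_class("<")  # math_symbol (U+003C is category Sm)
```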

Updated by naruse (Yui NARUSE) about 2 years ago

  • Status changed from Rejected to Assigned
  • Assignee set to naruse (Yui NARUSE)
Oops, I missed this. I'll fix.

Updated by naruse (Yui NARUSE) about 2 years ago

  • Category set to M17N
  • Status changed from Assigned to Rejected
This is a feature change in Ruby 1.9.
http://www.unicode.org/reports/tr18/

Also, redcloth3's example is a bug; they should use their PUNCT constant.
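One way to avoid depending on Unicode categories is to spell out the intended characters explicitly. The sketch below is only an illustration of that idea: `ASCII_PUNCT` is a hypothetical constant, not RedCloth's actual PUNCT definition, and it assumes the goal is to match every printable ASCII character that is neither alphanumeric nor a space.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical constant (an assumption, not RedCloth's PUNCT):
# an explicit class covering the ASCII ranges 0x21-0x2F, 0x3A-0x40,
# 0x5B-0x60 and 0x7B-0x7E, so '<' is included regardless of how
# Unicode categorizes it.
ASCII_PUNCT = /[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/

sample = "a *b* c *d*<"

# Collect every character of the sample that the explicit class matches.
matches = sample.scan(ASCII_PUNCT)
p matches  # ["*", "*", "*", "*", "<"]
```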
