Bug #5871

regexp \W matches some word characters when inside a case-insensitive character class

Added by garethadams (Gareth Adams) over 8 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]
Backport:
[ruby-core:42003]

Description

=begin
The following replacement should do nothing, but it removes the upper- and lower-case "K" and "S" from the result:

> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/i,"")
=> "ABCDEFGHIJLMNOPQRTUVWXYZabcdefghijlmnopqrtuvwxyz"

The result is correct (the same as the input string) if I remove either the character class:

> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/\W/i,"")
=> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 

or the case insensitive flag:

> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".gsub(/[\W]/,"")
=> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

This has been observed in two separate Ruby 1.9 installs:

  • ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]
  • ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]

but it works correctly in 1.8.
=end
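
To check the three variants from the description on another interpreter, something like the following sketch may help. The names `ALPHABET`, `removed_chars`, and `PATTERNS` are hypothetical helpers introduced only for illustration; they are not part of the original report. The script prints whatever the running Ruby removes and makes no claim about what any particular version will show.

```ruby
# Hypothetical reproduction helper, not part of the original report.
# ALPHABET is the same input string used in the description.
ALPHABET = (("A".."Z").to_a + ("a".."z").to_a).join

# Returns the characters of `input` that gsub strips for `pattern`.
# `.chars.to_a` keeps this working on 1.9, where String#chars
# returns an Enumerator rather than an Array.
def removed_chars(pattern, input = ALPHABET)
  kept = input.gsub(pattern, "")
  input.chars.to_a - kept.chars.to_a
end

# The three patterns compared in the description.
PATTERNS = {
  "character class with /i" => /[\W]/i,
  "no character class"      => /\W/i,
  "no /i flag"              => /[\W]/
}

PATTERNS.each do |label, pattern|
  puts "ruby #{RUBY_VERSION}, #{label}: removed #{removed_chars(pattern).inspect}"
end
```

An empty array for a pattern means the replacement left the string unchanged, which is the expected result for all three.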


Related issues

Is duplicate of Ruby master - Bug #4044: Regex matching errors when using \W character class and /i option - Closed - naruse (Yui NARUSE)