Bug #7501

closed

\w in a regular expression doesn't match international characters

Added by eltomito (Tomas Partl) over 11 years ago. Updated over 11 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]
Backport:
[ruby-core:50516]

Description

When matching with a regular expression, \w doesn't match letters that are outside the English (ASCII) alphabet.
For example, the characters "žščřďťňáéíóůúý" should all be matched by \w but aren't.

This program demonstrates the bug:


# encoding: utf-8

match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s

match = /\w+/.match( "áéíóůúýžščřďťň" ) #some Czech characters
puts match.to_s

match = /\w+/.match( "üäö" ) #some German characters
puts match.to_s

Expected output:

abcdefghijklmnopqrstuvwxyz
áéíóůúýžščřďťň
üäö

Actual output:

abcdefghijklmnopqrstuvwxyz


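For comparison, here is a minimal sketch of Unicode-aware alternatives, assuming Ruby 1.9 or later, where \w is defined as the ASCII-only class [a-zA-Z0-9_]. The variable names (samples, ascii_word, unicode_word, posix_word) are illustrative and not part of the original report.

```ruby
# encoding: utf-8

# The same three strings used in the report above.
samples = ["abcdefghijklmnopqrstuvwxyz", "áéíóůúýžščřďťň", "üäö"]

ascii_word   = /\w+/          # ASCII-only: [a-zA-Z0-9_]
unicode_word = /\p{Word}+/    # Unicode letters, marks, numbers, connector punctuation
posix_word   = /[[:word:]]+/  # POSIX bracket form of the same Unicode-aware class

samples.each do |s|
  # String#[] with a Regexp returns the first match, or nil if there is none.
  puts "input:        #{s}"
  puts "  \\w+:        #{s[ascii_word].inspect}"
  puts "  \\p{Word}+:  #{s[unicode_word].inspect}"
  puts "  [[:word:]]+: #{s[posix_word].inspect}"
end
```

With these patterns, \p{Word}+ and [[:word:]]+ should match each sample string in full, while \w+ should match only the plain ASCII one.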