Backport #8210

Multibyte character interfering with end-line character within a regex

Added by sawa (Tsuyoshi Sawada) about 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
[ruby-core:53944]

Description

=begin
With this regex:

regex1 = /\z/

the following strings match as expected:

"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5

but with these regexes:

regex2 = /#$/?\z/
regex3 = /\n?\z/

the results differ:

"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). I expect all three regexes to behave the same, and believe this is a bug.
=end
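
For anyone who wants to re-check the report on their own build, the comparisons above can be gathered into one script. This is only a sketch: the names PATTERNS, STRINGS, RESULTS and match_indices are hypothetical helpers, not part of the original report; the source file is assumed to be saved as UTF-8; and `Regexp.escape($/)` is used in place of the bare `#$/` interpolation so the script does not depend on the separator containing regex metacharacters.

```ruby
# Patterns from the report, keyed by a short label.
# Assumption: $/ is "\n", as on the reporter's Linux system.
PATTERNS = {
  "plain \\z"             => /\z/,
  "optional $/ then \\z"  => /#{Regexp.escape($/)}?\z/,
  "optional \\n then \\z" => /\n?\z/
}

# One ASCII-only string and one multibyte (UTF-8) string, as in the report.
STRINGS = ["hello", "こんにちは"]

# Returns { string => { label => character index of the match, or nil } }.
def match_indices(strings, patterns)
  strings.each_with_object({}) do |str, table|
    row = {}
    patterns.each { |label, re| row[label] = (str =~ re) }
    table[str] = row
  end
end

RESULTS = match_indices(STRINGS, PATTERNS)

RESULTS.each do |str, row|
  row.each do |label, index|
    puts "#{str.inspect} (#{str.encoding}) #{label}: #{index.inspect}"
  end
end
```

Both strings are five characters long, so the expected behavior is that every line reports 5. On the affected version, the report observed nil for the multibyte string with the last two patterns.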


Files

fix-8210-1.diff (742 Bytes), k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-2.diff (491 Bytes), k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-1-update.diff (834 Bytes), k_takata (Ken Takata), 04/13/2013 07:31 PM
