Bug #20039: Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error - Ruby - Ruby Issue Tracking System

Bug #20039

Updated by dbrown9@gmail.com (Dustin Brown) over 1 year ago

Matching a US-ASCII string to a UTF-8 encoded regexp with multibyte characters works as expected. 

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(re)
# => false
```

However, if that regexp is used to initialize a new regexp, the match fails with an "invalid multibyte character" error.

```ruby
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(Regexp.new(re))
# => ArgumentError: regexp preprocess failed: invalid multibyte character
```

After a bunch of digging, I discovered that this error was due to the fixed encoding flag not being copied over from the original regexp. This [pull request](https://github.com/ruby/ruby/pull/9120) addresses the issue by copying the fixed encoding and no encoding flags during `reg_copy`.

 Ref: https://github.com/ruby/ruby/pull/9120
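
To see the missing flag from Ruby code rather than from the C source, the state of the original regexp and its copy can be compared with `Regexp#fixed_encoding?` and `Regexp#options`. This is only a rough sketch: `encoding_state` is a hypothetical helper written for this illustration, and what it prints for the copy depends on whether the Ruby build in use includes the fix.

```ruby
# Hypothetical helper (not part of Ruby): collects the encoding-related
# state of a regexp so the original and its copy can be compared.
def encoding_state(re)
  {
    encoding: re.encoding,
    fixed_encoding: re.fixed_encoding?,
    fixed_flag_set: (re.options & Regexp::FIXEDENCODING) != 0,
    no_encoding_flag_set: (re.options & Regexp::NOENCODING) != 0
  }
end

original = Regexp.new("\u2018".encode("UTF-8"))
copy     = Regexp.new(original) # goes through the copy path described above

p encoding_state(original)
p encoding_state(copy)
```

On an affected build, the copy would be expected to report the fixed encoding flag as unset, which matches the diagnosis above. With the fix applied, both hashes should agree.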
