Bug #20039
Matching US-ASCII string to copied UTF-8 Regexp causes invalid multibyte character error
Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.3.0dev (2023-12-03 master 85bc80a)
Description
Matching a US-ASCII string to a UTF-8 encoded regexp with multibyte characters works as expected.
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(re)
=> false
However, if that regexp is used to initialize a new regexp, the match fails with an "invalid multibyte character" error.
re = Regexp.new("\u2018".encode("UTF-8"))
"".encode("US-ASCII").match?(Regexp.new(re))
=> ArgumentError: regexp preprocess failed: invalid multibyte character
After a bunch of digging, I discovered that this error was due to the fixed-encoding flag not being copied over from the original regexp. This pull request addresses the issue by copying the fixed-encoding and no-encoding flags during reg_copy.
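One way to observe the flag difference from Ruby itself is to compare Regexp#fixed_encoding? on the original and the copy. This is only a diagnostic sketch, not part of the patch. The variable names (copy, original_fixed, copy_fixed, copy_result) are made up for illustration, and the expectation that the copy reports false on the affected build is an assumption based on the analysis above.

```ruby
# Original UTF-8 regexp containing a multibyte character (U+2018).
re = Regexp.new("\u2018".encode("UTF-8"))

# Copy made by passing the regexp to Regexp.new, as in the failing case.
copy = Regexp.new(re)

# A regexp whose source contains non-ASCII characters has a fixed encoding.
original_fixed = re.fixed_encoding?

# Assumption: on the affected build this is false, because the flag is lost.
copy_fixed = copy.fixed_encoding?

# Try the failing match; capture the error message instead of aborting.
ascii = "".encode("US-ASCII")
copy_result =
  begin
    ascii.match?(copy)
  rescue ArgumentError, RegexpError => e
    e.message
  end

puts "original fixed_encoding?: #{original_fixed}"
puts "copy fixed_encoding?:     #{copy_fixed}"
puts "match against copy:       #{copy_result.inspect}"
```

If the flags are copied correctly, both fixed_encoding? lines should agree, and the match against the copy should return false just like the match against the original.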