Bug #9096

closed

Regexp.quote(UTF-8) returns US-ASCII

Added by walles (Johan Walles) over 10 years ago. Updated over 4 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.0.0p247 (2013-06-27 revision 41674) [universal.x86_64-darwin13]
Backport:
[ruby-core:58229]

Description

The attached program contains a unit test demonstrating that calling Regexp.quote() on a UTF-8 encoded string returns a US-ASCII encoded string (or at least I think it does...).

I would expect Regexp.quote() to return a string with the same encoding as the input string.


Files

regexp-quote-encoding.rb (375 Bytes), Repro, walles (Johan Walles), 11/08/2013 05:01 AM

Updated by duerst (Martin Dürst) over 10 years ago

  • Status changed from Open to Feedback

The encoding is set back to US-ASCII because the string is just 'foo', i.e. it contains only ASCII characters. If you change the string to e.g. "foo\u1234", it will keep UTF-8 as its encoding even after going through Regexp.quote.
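To make the behavior described above easy to check, here is a minimal sketch (not the attached regexp-quote-encoding.rb, whose contents are not shown in this thread). The variable names are only illustrative, and the inputs are explicitly encoded as UTF-8 so the result does not depend on the source file's encoding.

```ruby
# Illustrative names; this is not the attached repro script.
ascii_only = "foo".encode(Encoding::UTF_8)        # UTF-8 string, ASCII-only content
non_ascii  = "foo\u1234".encode(Encoding::UTF_8)  # UTF-8 string with a non-ASCII character

puts ascii_only.encoding                 # encoding of the input
puts Regexp.quote(ascii_only).encoding   # US-ASCII, as reported in this ticket
puts Regexp.quote(non_ascii).encoding    # stays UTF-8, as described above
```

Regexp.escape is an alias of Regexp.quote, so the same should hold for it.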

A US-ASCII Regexp will match against any UTF-8 String the same way the corresponding UTF-8 Regexp will match (US-ASCII is treated as a common denominator in Ruby), so I don't think there should be any problems.
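As a rough illustration of why the US-ASCII result should be harmless in practice, the following sketch builds a Regexp from the quoted string and uses it against a UTF-8 subject. The names quoted, pattern and subject are made up for this example.

```ruby
# Illustrative names; a sketch of the compatibility argument above.
quoted  = Regexp.quote("foo".encode(Encoding::UTF_8))  # tagged US-ASCII
pattern = Regexp.new(quoted)                            # Regexp built from the US-ASCII source
subject = "xfoo\u1234"                                  # UTF-8 string with non-ASCII content

p pattern =~ subject                 # index of the match, 1 here
p quoted == "foo"                    # ASCII-only strings compare equal across these encodings
p (quoted + "\u1234").encoding       # concatenating with UTF-8 yields a UTF-8 string
```

Code that compares encodings directly (for example `quoted.encoding == input.encoding`) is the one place where the difference would be visible.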

In case you find any actual problems, please report back.


Updated by jeremyevans0 (Jeremy Evans) over 4 years ago

  • Status changed from Feedback to Closed
  • Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)