Backport #8650

Unexpected result of Regexp#to_s with UTF-16 and UTF-32 strings.

Added by Heesob Park 9 months ago. Updated 9 months ago.

[ruby-core:56063]
Status:Closed
Priority:Normal
Assignee:-

Description

I found that the result of Regexp#to_s is incorrect for UTF-16 and UTF-32 encoded strings.

C:\Users\phasis>irb
irb(main):001:0> Regexp.new('abcd'.encode('UTF-16LE'))
=> /a b c d /
irb(main):002:0> Regexp.new('abcd'.encode('UTF-16LE')).to_s
=> "\u3F28\u6D2D\u7869\u613A\u6200\u6300\u6400\u2900"
irb(main):003:0> Regexp.new('abcd'.encode('UTF-16BE'))
=> / a b c d/
irb(main):004:0> Regexp.new('abcd'.encode('UTF-16BE')).to_s
=> "\u283F\u2D6D\u6978\u3A00\u6100\u6200\u6300\u6429"
irb(main):005:0> Regexp.new('abcd'.encode('UTF-32LE'))
=> /a b c d /
irb(main):006:0> Regexp.new('abcd'.encode('UTF-32LE')).to_s
=> "\u{6D2D3F28}\u{613A7869}\u{62000000}\u{63000000}\u{64000000}\u{29000000}"
irb(main):007:0> Regexp.new('abcd'.encode('UTF-32BE'))
=> / a b c d/
irb(main):008:0> Regexp.new('abcd'.encode('UTF-32BE')).to_s
=> "\u{283F2D6D}\u{69783A00}\u6100\u6200\u6300\u6429"

Same result for Ruby 1.9.3 and Ruby 2.0.0
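
The UTF-16LE output above is consistent with the ASCII bytes of the "(?-mix:" wrapper and the closing ")" being joined byte-wise to the UTF-16LE source and then read back as UTF-16LE. The following is a minimal sketch of that reading using plain String operations only. The helper name naive_wrap is hypothetical and is not Ruby's actual implementation.

```ruby
# Hypothetical helper (an assumption for illustration, not code from re.c):
# glue the single-byte ASCII wrapper onto the raw bytes of the source,
# then relabel the whole byte sequence with the source's encoding.
def naive_wrap(source)
  bytes = "(?-mix:".b + source.b + ")".b   # 7 + N + 1 raw bytes
  bytes.force_encoding(source.encoding)     # reinterpret, no conversion
end

garbled = naive_wrap('abcd'.encode('UTF-16LE'))

# The same code units that irb printed for the UTF-16LE case in the report.
reported = "\u3F28\u6D2D\u7869\u613A\u6200\u6300\u6400\u2900".encode('UTF-16LE')

p garbled == reported
```

Each pair of bytes is read as one UTF-16LE code unit, so "(" (0x28) and "?" (0x3F) become U+3F28, and the final ")" (0x29) pairs with the trailing 0x00 of "d" to become U+2900.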

History

#1 Updated by Yui NARUSE 9 months ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r42167.
Heesob, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • re.c (rb_reg_to_s): convert the closing parenthesis to the target encoding if it is an ASCII-incompatible encoding. [Bug #8650]
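
The idea behind the change can be sketched in Ruby, assuming the wrapper text is transcoded before it is joined to an ASCII-incompatible source. The real fix lives in C in re.c. The helper name encoding_aware_wrap and the fixed "(?-mix:" option prefix are assumptions made for illustration.

```ruby
# Hypothetical helper (not the actual re.c code): wrap a pattern source in
# "(?-mix:" ... ")" while respecting the encoding of the source string.
def encoding_aware_wrap(source)
  enc = source.encoding
  open_part  = "(?-mix:"
  close_part = ")"

  # For ASCII-incompatible encodings such as UTF-16 and UTF-32, the ASCII
  # wrapper must be converted first so every character uses the same
  # code-unit width as the source.
  unless enc.ascii_compatible?
    open_part  = open_part.encode(enc)
    close_part = close_part.encode(enc)
  end

  open_part + source + close_part
end

wrapped = encoding_aware_wrap('abcd'.encode('UTF-16LE'))
p wrapped.encoding
p wrapped.encode('UTF-8')
```

With this approach the result keeps the source's encoding and, once transcoded to UTF-8 for display, reads as the expected wrapper around "abcd".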

#2 Updated by Tomoyuki Chikanaga 9 months ago

  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED

#3 Updated by Yui NARUSE 9 months ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
