Bug #19558

str.dump.undump crashes when str contains both Unicode and ASCII control characters

Added by ikaronen-relex (Ilmari Karonen) almost 1 year ago. Updated 12 months ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 3.3.0dev (2023-03-29T10:20:29Z master 02ecdf85c5) [x86_64-darwin21]
[ruby-core:113041]

Description

Recently, as a result of a question I asked on Stack Overflow (https://stackoverflow.com/q/75866159), I learned about String#dump and String#undump. However, I also found what seems to be a bug in them: dumping and then undumping a string that contains both an ASCII control character (such as a C0 control character) and at least one non-ASCII Unicode character causes undump to raise a RuntimeError.

Specifically, evaluating any of the following expressions:

"\u0000\uFFFF".dump.undump
"\u0001\uABCD".dump.undump
"\u007F\u0080".dump.undump

raises a RuntimeError with the message "hex escape and Unicode escape are mixed". This contradicts the documentation of String#undump, which says that it "does the inverse of String#dump."
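
To make the failure easier to inspect, here is a minimal sketch that prints the dumped form of each string and reports whether the round trip succeeded. The helper name try_round_trip is hypothetical and only for illustration; it is not part of Ruby. The result may differ on a Ruby version where this behavior has been changed.

```ruby
# Hypothetical helper (not part of Ruby): dumps a string, tries to undump it,
# and reports the outcome instead of letting the exception propagate.
def try_round_trip(str)
  dumped = str.dump
  restored = dumped.undump
  { dumped: dumped, ok: restored == str, error: nil }
rescue RuntimeError => e
  # On affected versions, undump raises here for the mixed-escape strings.
  { dumped: dumped, ok: false, error: e.message }
end

samples = [
  "\u0000",        # control character only
  "\uFFFF",        # non-ASCII character only
  "\u0000\uFFFF",  # both: the failing cases from this report
  "\u0001\uABCD",
  "\u007F\u0080",
]

samples.each do |s|
  result = try_round_trip(s)
  status = result[:ok] ? "round-trips" : "fails (#{result[:error]})"
  puts "#{result[:dumped]} #{status}"
end
```

The dumped form of each failing string contains both a \x escape (for the control character) and a \u escape (for the non-ASCII character), which is the combination that the check in undump rejects. Strings that produce only one of the two escape kinds round-trip without error.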

The behavior is the same on every Ruby version I have tested, including master (3.3.0dev), 2.6.10 and JRuby 9.3.10.0.

The obvious fix would be to simply remove the check for mixed hex and Unicode escape sequences, essentially reverting https://github.com/ruby/ruby/commit/05d1d29d1f4a87620371463d8c7942e170be031f. However, since I don't understand why the check was added in the first place, I'm not sure whether removing it could have unwanted consequences.
