Feature #695

closed

More flexibility when combining ASCII-8BIT strings with other encodings

Added by mike (Michael Selig) over 15 years ago. Updated almost 13 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:19590]

Description

=begin
Consider the following 3 Ruby statements:

# Array#pack always returns ASCII-8BIT
s1 = [97, 98, 99, 1589].pack("U*")

# \xNN returns the source encoding (even if the result is an invalid string), or ASCII-8BIT if no source encoding is set
s2 = "abc\xD8\xB5"

# \uNNNN always returns UTF-8
s3 = "abc\u0635"

All of s1, s2, and s3 contain the same bytes but have different encodings. When you try to combine them, you get different "encoding compatibility" errors, and which ones you get can change with the source encoding because of how s2 is treated.
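The problem can be reproduced with a short sketch. Because the encoding returned by `pack("U*")` may differ between Ruby versions, this example builds the binary string with `pack("C*")` instead, which is an assumption made only to get an ASCII-8BIT string reliably. The variable names are illustrative.

```ruby
# Same five bytes, two different encodings.
raw  = [0x61, 0x62, 0x63, 0xD8, 0xB5].pack("C*")  # ASCII-8BIT (binary)
utf8 = "abc\u0635"                                # UTF-8

same_bytes = raw.bytes == utf8.bytes

# Encoding.compatible? returns nil when the two strings cannot be combined.
compat = Encoding.compatible?(raw, utf8)

# Concatenating them raises, because both contain non-ASCII bytes.
error = begin
  raw + utf8
  nil
rescue Encoding::CompatibilityError => e
  e.class
end

# The workaround the request wants to avoid: relabel the bytes by hand.
fixed = raw.dup.force_encoding(Encoding::UTF_8) + utf8

puts "same bytes: #{same_bytes}"
puts "compatible: #{compat.inspect}"
puts "error:      #{error.inspect}"
puts "fixed:      #{fixed.encoding}"
```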

I would like to see Ruby be able to combine all the above without error. I don't think it is reasonable to have to use "force_encoding" in these cases. This would

  • give better compatibility with 1.8,
  • make it much easier to handle methods that return ASCII-8BIT strings (e.g. Array#pack, and libraries that return strings in ASCII-8BIT because the encoding is unknown),
  • reduce the confusion caused by "\x" producing a string whose encoding depends on the source encoding (which I dislike; I think it should always return ASCII-8BIT).

So the feature request is:

When combining two strings, where one is ASCII-8BIT and the other is in encoding "E":

  1. If the ASCII-8BIT string would be valid when forced to encoding E, then treat the ASCII-8BIT string as being in encoding E;
  2. Otherwise treat both strings as ASCII-8BIT.

Part (2) is less important, and can probably be omitted if it is hard to implement.
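The two rules above can be sketched in plain Ruby as follows. `adopt_encoding` and `lenient_concat` are hypothetical helpers written for illustration; they are not part of Ruby and only approximate what the request asks the interpreter to do internally for concatenation.

```ruby
# Hypothetical helper (not part of Ruby): reinterpret a binary string in
# encoding `enc`, returning nil if the bytes are not valid in that encoding.
def adopt_encoding(binary_str, enc)
  candidate = binary_str.dup.force_encoding(enc)
  candidate.valid_encoding? ? candidate : nil
end

# Hypothetical helper sketching the proposed rule for a + b.
def lenient_concat(a, b)
  binary = Encoding::ASCII_8BIT
  if a.encoding == binary && b.encoding != binary
    adopted = adopt_encoding(a, b.encoding)
    # Rule 1: treat a as encoding E; rule 2: fall back to binary for both.
    adopted ? adopted + b : a + b.b
  elsif b.encoding == binary && a.encoding != binary
    adopted = adopt_encoding(b, a.encoding)
    adopted ? a + adopted : a.b + b
  else
    a + b # neither or both are binary: ordinary Ruby rules apply
  end
end

s1 = [0x61, 0x62, 0x63, 0xD8, 0xB5].pack("C*") # ASCII-8BIT, valid as UTF-8
s3 = "abc\u0635"                               # UTF-8
invalid = [0xFF].pack("C")                     # ASCII-8BIT, not valid UTF-8

joined   = lenient_concat(s1, s3)       # rule 1 applies
fallback = lenient_concat(invalid, s3)  # rule 2 applies

puts joined.encoding
puts fallback.encoding
```

Rule 2 is implemented here with `String#b`, which returns a binary copy of the string; omitting that fallback, as the request allows, would mean raising the usual compatibility error instead.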

Thank you
Michael Selig
=end
