Feature #15995: Add encoding conversion for CESU-8 from and to UTF-8 - Ruby - Ruby Issue Tracking System

Added by duerst (Martin Dürst) about 7 years ago. Updated about 7 years ago.

Status:

Closed

Assignee:

duerst (Martin Dürst)

Target version:

[ruby-core:93680]

Description

As discussed in issue #15931, encoding conversion (transcoding) from and to CESU-8 is missing, so we should add it. We can then hopefully make CESU-8 a dummy encoding.

Related issues 1 (1 open — 0 closed)

Updated by duerst (Martin Dürst) about 7 years ago (#1)

Related to Feature #15931: encoding for CESU-8 added

Updated by duerst (Martin Dürst) about 7 years ago (#2) [ruby-core:93681]

Issue #15931 mentions both https://www.unicode.org/reports/tr26/tr26-4.html and https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/types.html#modified_utf_8_strings as definitions of CESU-8, but they are not identical.

The difference is in how they treat U+0000 (NULL) characters: UTR 26 does not treat it in any special way (i.e. it is encoded as "\x00"), but the Java definition treats it specially, encoding it as "\xC0\x80". The IANA registration refers to the Unicode definition (see https://www.iana.org/assignments/charset-reg/CESU-8). UTR 26 explains that "CESU-8 is useful in 8-bit processing environments where binary collation with UTF-16 is required." For this to work, U+0000 has to be encoded as "\x00".
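To make the collation argument concrete, here is a minimal sketch that builds CESU-8 bytes by hand. The helpers `encode_unit` and `cesu8_bytes`, and the `java_nul:` flag, are hypothetical names introduced only for illustration; they are not part of Ruby, and the sketch does not depend on Ruby's own CESU-8 support.

```ruby
# Hypothetical helper: encode one 16-bit code unit with the UTF-8 bit layout.
def encode_unit(unit)
  if unit < 0x80
    [unit].pack("C")
  elsif unit < 0x800
    [0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)].pack("C*")
  else
    [0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)].pack("C*")
  end
end

# Hypothetical helper: CESU-8 bytes for one code point, following UTR 26.
# With java_nul: true, U+0000 uses the Java-style two-byte form instead.
def cesu8_bytes(codepoint, java_nul: false)
  return "\xC0\x80".b if java_nul && codepoint.zero?

  units =
    if codepoint > 0xFFFF
      # Supplementary code point: split into a UTF-16 surrogate pair and
      # encode each surrogate separately (three bytes each).
      offset = codepoint - 0x10000
      [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
    else
      [codepoint]
    end
  units.map { |unit| encode_unit(unit) }.join.b
end

nul          = cesu8_bytes(0)                  # bytes: 00
modified_nul = cesu8_bytes(0, java_nul: true)  # bytes: C0 80
soh          = cesu8_bytes(1)                  # bytes: 01

# In UTF-16 code unit order, U+0000 sorts before U+0001.
puts nul < soh           # true:  00 < 01, order preserved
puts modified_nul < soh  # false: C0 80 > 01, order broken

# U+1F600 (D83D DE00 in UTF-16) sorts before U+FF21 (FF21) in UTF-16.
puts cesu8_bytes(0x1F600) < cesu8_bytes(0xFF21)  # true:  ED.. < EF..
puts "\u{1F600}".b < "\u{FF21}".b                # false: UTF-8 has F0.. > EF..
```

The last two lines show why plain UTF-8 does not give UTF-16 binary order for supplementary characters, and the NUL lines show why the Java-style form would break that order for U+0000.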

Issue #15931 currently implements CESU-8 as defined in UTR 26:

$ ruby -e 'puts "\xC0\x80".force_encoding("cesu-8").valid_encoding?'
false

$ ruby -e 'puts "\x00".force_encoding("cesu-8").valid_encoding?'
true

It is unclear whether this is what the originator of issue #15931 wanted; his use case seems to be Java.
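If the data does come from Java, one possible workaround is to normalize the NUL form at the byte level before treating the bytes as CESU-8. The following is only a sketch under that assumption; `modified_utf8_to_cesu8` and `cesu8_to_modified_utf8` are hypothetical helper names, not part of Ruby or of the change discussed here.

```ruby
# Hypothetical helper: rewrite the Java-style NUL (C0 80) as the single
# 00 byte that UTR 26 expects.
def modified_utf8_to_cesu8(bytes)
  # In well-formed input, C0 only occurs as the lead byte of the two-byte
  # NUL form, so a plain byte substitution is sufficient here.
  bytes.b.gsub("\xC0\x80".b, "\x00".b)
end

# Hypothetical helper: the reverse direction, for handing data back to Java.
def cesu8_to_modified_utf8(bytes)
  bytes.b.gsub("\x00".b, "\xC0\x80".b)
end

java_bytes  = "x\xC0\x80y".b
utr26_bytes = modified_utf8_to_cesu8(java_bytes)

p utr26_bytes.bytes                                  # [120, 0, 121]
p cesu8_to_modified_utf8(utr26_bytes) == java_bytes  # true
```

Both helpers work on binary copies of the input, so they do not depend on which encoding label the string carries.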

Updated by duerst (Martin Dürst) about 7 years ago (#3)

Status changed from Open to Closed
