Feature #11094: Remove traces of 6-byte UTF-8 - Ruby - Ruby Issue Tracking System

Added by duerst (Martin Dürst) about 11 years ago. Updated about 11 years ago.

Status: Closed

[ruby-core:68982]

Description

UTF-8 was originally defined with a codespace of up to 31 bits, and therefore with up to 6 bytes per character. For quite a few years now (for the IETF, since RFC 3629 in 2003), all the relevant definitions (ISO, Unicode, IETF) have reduced this to a codespace ending at 0x10FFFF and a maximum of 4 bytes per character. Many places in the Ruby code base have already been updated to this 4-byte limit (e.g. EncLen_UTF8 in enc/utf_8.c), but others have not (e.g. code_to_mbclen in enc/utf_8.c). This should be fixed.
[I have classified this as a feature because I wasn't able to find a way to expose this problem from Ruby code, but it should be reclassified as a bug if such a way can be found.]

Files

0001-enc-utf_8.c-pack.c-limit-UTF-8.patch (6.68 KB), added by nobu (Nobuyoshi Nakada), 04/25/2015 04:42 AM

Related issues: 3 (0 open, 3 closed)

Updated by nobu (Nobuyoshi Nakada) about 11 years ago, #1 [ruby-core:68987]

File 0001-enc-utf_8.c-pack.c-limit-UTF-8.patch added

What about pack("U") and unpack("U")?

Also, rubyspec seems to fail:

Array#pack with format 'U' encodes values larger than UTF-8 max codepoints ERROR
RangeError: pack(U): value out of range

Updated by nobu (Nobuyoshi Nakada) about 11 years ago, #2

Status changed from Open to Closed
% Done changed from 0 to 100

Applied in changeset r50392.

enc/utf_8.c: limit UTF-8

enc/utf_8.c (code_to_mbclen, code_to_mbc): reject values larger
than UTF-8 max codepoints. [Feature #11094]

Updated by nobu (Nobuyoshi Nakada) over 9 years ago, #3

Related to Bug #13353: Backport stringio fixes added

Updated by duerst (Martin Dürst) about 9 years ago, #4

Related to Bug #13590: Change max byte length of UTF-8 to 4 bytes to conform to definition of UTF-8 added

Updated by duerst (Martin Dürst) about 9 years ago, #5

Related to Feature #13588: Add Encoding#min_char_size, #max_char_size, #minmax_char_size added

Related to Ruby - Bug #13353: Backport stringio fixes (Closed)
Related to Ruby - Bug #13590: Change max byte length of UTF-8 to 4 bytes to conform to definition of UTF-8 (Closed, assignee: duerst (Martin Dürst))
Related to Ruby - Feature #13588: Add Encoding#min_char_size, #max_char_size, #minmax_char_size (Feedback)