Feature #21975

Add "UTF-八" as an alias for UTF-8 encoding


Added by ko1 (Koichi Sasada) about 17 hours ago. Updated about 3 hours ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:125164]

Description

In Japan, legal texts must write all characters - including digits - using full-width or kanji forms. As a result, the encoding name "UTF-8" appears as "UTF-八" (八 = eight in kanji) in official government notices.

Specifically, it appears in a notice issued by the Digital Agency and the Ministry of Internal Affairs and Communications (令和8年デジタル庁・総務省告示第12号), which defines character sets and encoding for local government information systems:

地方公共団体情報システムの標準化に関する法律第七条第一項に規定する各地方公共団体情報システムに共通する基準のうち電磁的記録において用いられる用語及び符号の相互運用性の確保その他の地方公共団体情報システムに係る互換性の確保に関する標準を定める命令第三条第二号の規定に基づき行政事務標準文字の文字セット及び地方公共団体情報システム間の連携のための文字符号化方式を定める告示

Reference: https://www.digital.go.jp/assets/contents/node/basic_page/field_ref_resources/d12bde7e-a950-493b-987c-0f8d4bbd1b6b/66117898/20260324_laws_notice_text_02.pdf

This patch https://github.com/ruby/ruby/pull/16623 adds "UTF-八" as an encoding alias for UTF-8, so that Ruby is compliant with Japanese law.

# encoding: UTF-八

p __ENCODING__ #=> #<Encoding:UTF-8>

p Encoding.find("UTF-八")              #=> #<Encoding:UTF-8>
p "hello".encode("UTF-八")             #=> "hello"
p "こんにちは".force_encoding("UTF-八") #=> "こんにちは"

p "こんにちは".encode("U
                      T
                      F
                      |
                      八") #=> "こんにちは"
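
For comparison, a Ruby without the patch can approximate the same lookups in user code. The following is only a minimal sketch under that assumption, not the approach taken in the pull request: `find_encoding_lenient` and `KANJI_DIGITS` are hypothetical names, and the helper simply rewrites full-width and kanji spellings to ASCII before calling the stock `Encoding.find`.

```ruby
# Hypothetical workaround, NOT the patch from the pull request:
# rewrite full-width and kanji spellings to ASCII, then use the stock lookup.
KANJI_DIGITS = {
  "〇" => "0", "一" => "1", "二" => "2", "三" => "3", "四" => "4",
  "五" => "5", "六" => "6", "七" => "7", "八" => "8", "九" => "9"
}.freeze

# find_encoding_lenient is an illustrative name, not an existing Ruby API.
def find_encoding_lenient(name)
  # NFKC folds full-width letters, digits and the full-width hyphen to ASCII.
  folded = name.unicode_normalize(:nfkc)
  # Kanji digits are not touched by NFKC, so map them explicitly.
  ascii_name = folded.gsub(/[〇一二三四五六七八九]/, KANJI_DIGITS)
  # Encoding.find raises ArgumentError for names it does not know.
  Encoding.find(ascii_name)
end

p find_encoding_lenient("UTF-八")     #=> #<Encoding:UTF-8>
p find_encoding_lenient("ＵＴＦ－８") #=> #<Encoding:UTF-8>
```

Unlike a real alias, this does not affect magic comments or `String#encode`; it only covers explicit lookups that go through the helper.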

Updated by jinroq (Jinroq SAITOH) about 15 hours ago Actions #1 [ruby-core:125168]

Should "UTF-" also be full-width ("ＵＴＦ−")?

Updated by duerst (Martin Dürst) about 10 hours ago Actions #2 [ruby-core:125173]

Thanks to @ko1 (Koichi Sasada) for this timely news. The Japanese government has recently been taking some steps that in some ways feel long overdue. On December 22, 2025, it changed the romanization used by the government from 'Kunrei' to 'Hepburn' (see e.g. https://en.wikipedia.org/wiki/Hepburn_romanization). Kunrei reflects the structure of the Japanese syllabaries (Hiragana, Katakana), but Hepburn makes it easier for foreigners to pronounce Japanese words more or less correctly.

Anyway, with respect to @ko1's proposal, I think it's a good idea to allow "UTF-八" (and probably also full-width "ＵＴＦ-八") as an alternative to "UTF-8" for internal Ruby use. However, it shouldn't be used on the Internet unless it is formally registered (see https://www.iana.org/assignments/character-sets/character-sets.xhtml).

As the expert reviewer for that registry (rather than as a Rubyist) I would have to reject such a registration because currently, charset names have to be US-ASCII. Rewriting the relevant RFCs (not to mention all the software that uses them) would be a lot of work :-).
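
The split between internal use and use on the wire can be sketched as follows. This is only an illustration under stated assumptions: `charset_label` is a hypothetical helper, not part of Ruby or of the proposed patch. It emits the canonical `Encoding#name` instead of whatever alias was used for the lookup, and refuses any label that is not US-ASCII.

```ruby
# Illustrative helper (hypothetical name): choose the label to send on the wire.
def charset_label(encoding)
  # Encoding#name is the canonical name, whichever alias was used to find it.
  name = encoding.name
  raise ArgumentError, "charset label must be US-ASCII: #{name}" unless name.ascii_only?
  name
end

p "UTF-八".ascii_only?          #=> false, so it cannot serve as a charset label

enc = Encoding.find("CP65001")  # an existing alias of UTF-8 in current Ruby
puts "Content-Type: text/plain; charset=#{charset_label(enc)}"
# prints: Content-Type: text/plain; charset=UTF-8
```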

Updated by Dan0042 (Daniel DeLorme) about 3 hours ago Actions #3 [ruby-core:125175]

duerst (Martin Dürst) wrote in #note-2:

I think it's a good idea to allow "UTF-八" (and probably also full-width "ＵＴＦ-八") as an alternative to "UTF-8" for internal Ruby use.

Indeed, but I believe "ＵＴＦ－八" would be a better alias here, since the hyphen does have a fullwidth version (U+FF0D) distinct from the prolonged sound mark ー (U+30FC).
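
The characters in question are easy to confuse on screen, so a quick check of their code points may help. This is a small sketch; the labels in the hash are descriptive names chosen here, and the NFKC column is included only to show which of them fold to the ASCII hyphen under compatibility normalization.

```ruby
# Candidate "hyphen" characters, written as escapes so they cannot be mistyped.
candidates = {
  "hyphen-minus"           => "\u002D",
  "fullwidth hyphen-minus" => "\uFF0D",
  "prolonged sound mark"   => "\u30FC"
}

candidates.each do |label, ch|
  folded = ch.unicode_normalize(:nfkc)
  printf("%-24s U+%04X  NFKC -> U+%04X\n", label, ch.ord, folded.ord)
end
```

The fullwidth hyphen-minus folds to U+002D under NFKC, while the prolonged sound mark stays U+30FC, so the two are not interchangeable even after normalization.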
