Feature #1784

closed

More encoding (Big5 series) support?

Added by godfat (Lin Jen-Shin) over 15 years ago. Updated over 13 years ago.

Status:
Closed
Target version:
[ruby-core:24390]

Description

=begin
I was very glad to see there's built-in encoding support,
but it would be much better if more Big5-related encodings
were supported, because there are many,
many Big5 extensions.

The current CP950 (from Microsoft) does not contain Japanese
or Simplified Chinese characters, nor some Traditional Chinese characters.
Because of this, many Big5 extensions were invented.
The most popular Big5 extensions nowadays are Big5-HKSCS and
UAO (Unicode-At-On, http://uao.cpatch.org/).

libiconv supports Big5-HKSCS, but not UAO.
I am not sure about the status of Big5 in Hong Kong, but here in Taiwan,
the most widely used Big5 encoding is UAO (I think).
For example, telnet://ptt.cc contains many, many Japanese
characters in UAO. It's a very popular BBS in Taiwan.
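
To check which Big5 variants a particular Ruby build already knows, one can filter `Encoding.name_list` and ask `Encoding::Converter.search_convpath` whether a conversion path to UTF-8 exists. This is only a minimal sketch: the helper names `big5_family` and `convertible_to_utf8?` are hypothetical, and the list printed depends on the Ruby version, so no particular result is assumed here.

```ruby
# Hypothetical helper: names (including aliases) of every Big5-related
# encoding known to the running interpreter.
def big5_family
  Encoding.name_list.grep(/big5|cp95\d/i).sort
end

# Hypothetical helper: true if this Ruby has a transcoding path from the
# named encoding to UTF-8, false if no converter is found.
def convertible_to_utf8?(name)
  Encoding::Converter.search_convpath(name, "UTF-8")
  true
rescue Encoding::ConverterNotFoundError
  false
end

big5_family.each do |name|
  puts format("%-20s to UTF-8: %s", name, convertible_to_utf8?(name))
end
```

If UAO does not show up in the output, that Ruby build has no built-in support for it, and text in that encoding would need a table-driven approach instead.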

Here's a reference in Traditional Chinese from Mozilla Taiwan:
http://moztw.org/docs/big5/

There's also `Mozilla 1.8', which tries to merge several Big5
encodings into one, but I am not familiar with it.
At least I can use it to read most characters.

Here's the related issue from Mozilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=310299

And here are the tables they used:
Big5 to Unicode (code point):
http://moztw.org/docs/big5/table/moz18-b2u.txt
Unicode (code point) to Big5:
http://moztw.org/docs/big5/table/moz18-u2b.txt
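
A mapping table like these could, in principle, drive a decoder written at the user library level. The sketch below is an illustration under stated assumptions, not the format of the real files: it assumes one mapping per line, written as two hexadecimal fields (`0x<Big5 code> 0x<Unicode code point>`) with optional `#` comments. The names `SAMPLE_B2U`, `parse_b2u`, and `decode_big5` are hypothetical, and the three sample rows are standard Big5 mappings typed in by hand rather than copied from the linked tables.

```ruby
# ASSUMED table format (not verified against moz18-b2u.txt): one mapping per
# line, "0x<Big5 code> 0x<Unicode code point>", with optional "#" comments.
SAMPLE_B2U = <<~TABLE
  # Big5   Unicode
  0xA140 0x3000
  0xA440 0x4E00
  0xA4A4 0x4E2D
TABLE

# Build a Hash from Big5 code (Integer) to Unicode code point (Integer).
def parse_b2u(text)
  text.each_line.with_object({}) do |line, table|
    line = line.sub(/#.*/, "").strip
    next if line.empty?
    big5, unicode = line.split.map { |field| Integer(field, 16) }
    table[big5] = unicode
  end
end

# Decode a Big5 byte string with the table. Bytes below 0x80 are passed
# through as ASCII; any other byte is treated as the lead byte of a
# two-byte code. Unmapped codes raise KeyError.
def decode_big5(bytes, table)
  out = +""
  codes = bytes.bytes
  until codes.empty?
    lead = codes.shift
    if lead < 0x80
      out << lead.chr
    else
      trail = codes.shift
      raise ArgumentError, "truncated two-byte sequence" if trail.nil?
      out << [table.fetch((lead << 8) | trail)].pack("U")
    end
  end
  out
end

table = parse_b2u(SAMPLE_B2U)
puts decode_big5("A\xA4\xA4".b, table)
```

A pure-Ruby decoder like this is far slower than a built-in transcoder, but it shows that nothing beyond a lookup table is needed to read text in a Big5 extension.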

I am trying to build this into Ruby, but I am
no expert in encodings or Ruby core development.
My first experiment succeeded, and I will try
to polish it later.

Could Ruby support more encodings in the future?
Or is there a way to add more encodings at
the user library level?

Many Thanks!
=end


Files

test_big5-hkscs.rb (887 Bytes), godfat (Lin Jen-Shin), 09/08/2009 11:25 PM

Related issues 1 (0 open, 1 closed)

Related to Ruby master - Feature #4073: HKSCS-2008 (Closed, 11/19/2010)