Feature #1784
More encoding (Big5 series) support?
Status: closed
Description
=begin
I was very glad to see there's built-in encoding support,
but if we could support more Big5 related encodings,
it would be much better, because there are many,
many Big5 extensions.
The current CP950 (from Microsoft) does not contain Japanese
or Simplified Chinese characters, nor some Traditional Chinese characters.
Because of this, many Big5 extensions were invented.
The most popular Big5 extensions nowadays are Big5-HKSCS and
UAO (Unicode-at-on, http://uao.cpatch.org/).
libiconv supports Big5-HKSCS, but not UAO.
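
As a rough way to check which Big5-family encodings a particular Ruby build already registers, something like the following might help. It relies only on `Encoding.name_list`; the helper name `big5_family` and the name pattern are made up for illustration, and the output depends on the Ruby version.

```ruby
# Hypothetical helper: list the encoding names (including aliases) that this
# Ruby build registers and that look Big5-related.
def big5_family
  Encoding.name_list.grep(/big5|cp95[01]/i).sort
end

puts big5_family
```

If an extension such as UAO is missing from that list, strings in it cannot be transcoded with `String#encode` on that build.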
I am not sure about the status of Big5 in Hong Kong, but here in Taiwan,
I think the most widely used Big5 encoding is UAO.
For example, telnet://ptt.cc contains many, many Japanese
characters in UAO. It's a very popular BBS in Taiwan.
Here's a reference in Traditional Chinese from Mozilla Taiwan:
http://moztw.org/docs/big5/
There's also `Mozilla 1.8', which tries to merge several Big5
encodings into one, but I am not familiar with it.
At least I can use it to read most characters.
Here's the related issue from Mozilla:
https://bugzilla.mozilla.org/show_bug.cgi?id=310299
And here are the tables they used:
Big5 to Unicode (code point):
http://moztw.org/docs/big5/table/moz18-b2u.txt
Unicode (code point) to Big5:
http://moztw.org/docs/big5/table/moz18-u2b.txt
I am trying to build this into Ruby, but I am
no expert in encodings or Ruby core development.
My first experiment succeeded, and I plan
to polish it later.
Could Ruby support more encodings in the future?
Or is there a way to add more encodings
at the user library level?
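
To illustrate what a user-level approach could look like, here is a minimal sketch of a table-driven decoder in plain Ruby. It does not register a new `Encoding`; it only turns Big5-style bytes into a UTF-8 string. The table format (one `0x<Big5> 0x<Unicode>` pair per line, `#` for comments) is an assumption and has not been checked against moz18-b2u.txt, and the names `SAMPLE_TABLE`, `load_b2u`, and `decode_big5` are hypothetical.

```ruby
# Assumed table format (NOT verified against moz18-b2u.txt): one mapping per
# line, "0x<Big5 code> 0x<Unicode code point>", with "#" starting a comment.
SAMPLE_TABLE = <<~TABLE
  # Big5   Unicode
  0xA4A4   0x4E2D
  0xA4E5   0x6587
TABLE

# Parse the table text into a Hash of Big5 code => Unicode code point.
def load_b2u(text)
  text.each_line.with_object({}) do |line, map|
    line = line.sub(/#.*/, "").strip
    next if line.empty?
    big5, ucs = line.split.map { |field| Integer(field, 16) }
    map[big5] = ucs
  end
end

# Decode a string of Big5-style bytes into UTF-8 using the table.
# Bytes below 0x80 pass through as ASCII. A lead byte >= 0x81 consumes the
# following byte as well; unmapped pairs become U+FFFD. A stray 0x80 or a
# lead byte at the very end of the input also becomes U+FFFD.
def decode_big5(bytes, table)
  out = String.new(encoding: Encoding::UTF_8)
  src = bytes.bytes
  i = 0
  while i < src.size
    lead = src[i]
    if lead < 0x80
      out << lead
      i += 1
    elsif lead >= 0x81 && i + 1 < src.size
      code = (lead << 8) | src[i + 1]
      out << (table[code] || 0xFFFD)
      i += 2
    else
      out << 0xFFFD
      i += 1
    end
  end
  out
end

table = load_b2u(SAMPLE_TABLE)
puts decode_big5("\xA4\xA4\xA4\xE5 ok".b, table)
```

A lookup like this is slow and one-directional compared with a transcoder built into Ruby, so it is only a stopgap; when the build already lists a suitable encoding, `String#encode` is the better choice.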
Many Thanks!
=end