Feature #2034

Consider the ICU Library for Improving and Expanding Unicode Support

Added by runpaint (Run Paint Run Run) about 11 years ago. Updated about 3 years ago.

Has consideration recently been given to employing the ICU library in Ruby? The bindings are in C and the library is mature. My ignorance of the Ruby source notwithstanding, this would allow existing String methods, among others, to support non-ASCII characters in an incremental manner.

For a trivial example, consider String#to_i. It currently understands only the ASCII digit characters. ICU provides a u_charDigitValue(code_point) function which returns the integer value corresponding to the given Unicode code point. Were String#to_i to use this, it would work with non-ASCII numeral systems, thus removing at least one of the "as long as it's ASCII" caveats currently associated with String methods.
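To make the idea concrete, here is a minimal sketch in plain Ruby of how a digit-value lookup could drive an integer parser. It does not call ICU: `char_digit_value` is a hypothetical stand-in for `u_charDigitValue` that knows only four digit ranges (ASCII, Arabic-Indic, Devanagari, and fullwidth), and `unicode_to_i` is likewise an illustrative name, not an existing or proposed method. Signs, leading whitespace, and underscores, which the real String#to_i accepts, are deliberately ignored here.

```ruby
# Code points of the "zero" digit in a few scripts. This short table is an
# assumption made for illustration; ICU itself covers every decimal digit
# in Unicode.
DIGIT_ZEROS = [
  0x0030, # ASCII DIGIT ZERO
  0x0660, # ARABIC-INDIC DIGIT ZERO
  0x0966, # DEVANAGARI DIGIT ZERO
  0xFF10  # FULLWIDTH DIGIT ZERO
].freeze

# Hypothetical stand-in for ICU's u_charDigitValue: returns the decimal
# value (0..9) of the code point, or -1 if it is not a digit known to the
# table above.
def char_digit_value(code_point)
  DIGIT_ZEROS.each do |zero|
    offset = code_point - zero
    return offset if offset.between?(0, 9)
  end
  -1
end

# Simplified to_i: accumulates leading digits and stops at the first
# character that is not a recognised digit. Returns 0 if there are none.
def unicode_to_i(string)
  value = 0
  string.each_codepoint do |cp|
    digit = char_digit_value(cp)
    break if digit < 0
    value = value * 10 + digit
  end
  value
end
```

With this sketch, `unicode_to_i("\u0661\u0662\u0663")` (the Arabic-Indic digits one, two, three) yields 123, whereas the built-in `"\u0661\u0662\u0663".to_i` yields 0. An ICU-backed version would replace the hand-written table with a call into the library.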

More generally, if it's desirable for String methods to properly support Unicode, and if the principal barrier is the difficulty of the implementation, then might there be at least a partial solution in marrying Ruby with ICU?

If ICU is unfeasible, I'd appreciate understanding why. There are multiple approaches to what I term the second phase of Unicode support in Ruby, and it will be easier to choose between them if I understand the constraints. :-) (Of course, if a direction has already been determined, and work on it is underway, I will gladly bow out ;-)).

Related issues

Related to CommonRuby - Feature #10084: Add Unicode String Normalization to String class [Closed, duerst (Martin Dürst), 07/23/2014]
Related to CommonRuby - Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize [Closed, duerst (Martin Dürst)]
