Feature #10084

Add Unicode String Normalization to String class

Added by duerst (Martin Dürst) about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Target version:
[ruby-core:63955]

Description

Unicode normalization is frequently needed when comparing strings or converting them to a canonical form.

This should be available directly on the String class.
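To illustrate why comparison needs normalization, here is a minimal sketch. It assumes Ruby 2.2 or later, where the standard library provides `String#unicode_normalize`; that method name is not the spelling proposed in this ticket and is used here only because it can be run as-is.

```ruby
# Two spellings of "é": one precomposed code point versus "e" followed by
# a combining acute accent. They render identically but are different strings.
composed   = "\u00E9"
decomposed = "e\u0301"

# Without normalization the strings differ code point by code point.
raw_equal = (composed == decomposed)

# After normalizing both sides to NFC they compare equal.
nfc_equal = (composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc))

puts "raw: #{raw_equal}, NFC: #{nfc_equal}"
# prints "raw: false, NFC: true"
```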

The proposed syntax is:

'string'.normalize # normalize 'string' according to NFC (most frequent on the Web)
'string'.normalize :nfc # normalize 'string' according to NFC; :nfd, :nfkc, :nfkd also usable
'string'.nfc # shorter variant, but maybe too many methods
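The proposed calls above could be approximated along the following lines. This is only a sketch under stated assumptions, not the implementation discussed in this ticket: the module name `ProposedNormalization` is hypothetical, the methods delegate to `String#unicode_normalize` (available in Ruby 2.2 and later), and the unofficial variants such as `:mac` are deliberately not covered.

```ruby
# Hypothetical module that adds the proposed method names to String.
module ProposedNormalization
  # The four standard Unicode normalization forms.
  FORMS = %i[nfc nfd nfkc nfkd].freeze

  # 'string'.normalize          -> NFC (the default)
  # 'string'.normalize :nfd     -> the requested form
  def normalize(form = :nfc)
    unless FORMS.include?(form)
      raise ArgumentError, "unknown normalization form: #{form.inspect}"
    end
    unicode_normalize(form)
  end

  # Shorter variants: 'string'.nfc, 'string'.nfd, 'string'.nfkc, 'string'.nfkd
  FORMS.each do |form|
    define_method(form) { normalize(form) }
  end
end

# Assumption: monkey-patching String is acceptable for illustration purposes.
String.include(ProposedNormalization)

composed   = "\u00E9"   # precomposed "é"
decomposed = "e\u0301"  # "e" + combining acute accent
ligature   = "\uFB01"   # "fi" ligature, affected only by the compatibility forms

puts decomposed.normalize == composed      # NFC composes the pair
puts composed.normalize(:nfd) == decomposed # NFD decomposes it again
puts ligature.nfkc == "fi"                  # NFKC replaces the ligature
puts ligature.nfc == ligature               # NFC leaves the ligature alone
```

All four comparisons print `true`. Keeping a single `normalize(form)` entry point and generating the short variants from it addresses the "too many methods" concern only partially, since the short names still end up on String.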

There are several "unofficial" but convenient normalization variants that could be offered, e.g.:

'string'.normalize :mac # use Macintosh file system normalization variant

Implementations are already available in pure Ruby (easy for other Ruby implementations to adopt; e.g. eprun: https://github.com/duerst/eprun) and in C (unf,…, http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/).


Files

Normalization.pdf (576 KB): Slide for developers' meeting (2014/07/26), duerst (Martin Dürst), 07/23/2014 10:09 AM

Related issues

Related to Ruby master - Feature #2034: Consider the ICU Library for Improving and Expanding Unicode Support (Rejected; naruse (Yui NARUSE))
Related to Ruby master - Bug #7267: Dir.glob on Mac OS X returns unexpected string encodings for unicode file names (Closed; duerst (Martin Dürst); 11/02/2012)
Related to Ruby master - Feature #9111: Encoding-free String comparison (Open; 11/14/2013)