Feature #10085: Add non-ASCII case conversion to String#upcase/downcase/swapcase/capitalize - Ruby - Ruby Issue Tracking System


Added by duerst (Martin Dürst) almost 12 years ago. Updated over 4 years ago.

Status: Closed
Assignee: duerst (Martin Dürst)
Target version:

[ruby-core:63964]

Description

Case conversion functions are currently limited to ASCII characters. When used with formal languages, that may be appropriate, but it is often not appropriate for applications.

In order to avoid backwards-compatibility problems and to make sure that the various variants of case conversion (e.g. language-dependent) can be selected, we propose to add an optional parameter to the case conversion functions.

Our current design idea is as follows:

ASCII-only if no parameter:
'Türkiye'.upcase # => 'TüRKIYE', note lower-case ü

Parameter triggers (general) Unicode conversion:
'Türkiye'.upcase 'en' # => 'TÜRKIYE', note upper-case Ü

The parameter is actually a BCP 47 (http://tools.ietf.org/html/bcp47) language tag.
This means that for languages with special case conversion rules, such as Turkish, this works as follows:
'Türkiye'.upcase 'tr' # => 'TÜRKİYE', note upper-case İ (with dot!)

In the second example, we used 'en', but most other languages would work, too, because a single case conversion works for most languages. Turkic languages are the biggest exception.
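The design above can be approximated on top of what later shipped. The following is a minimal sketch, assuming Ruby 2.4 or later, where String#upcase accepts option symbols such as :ascii and :turkic rather than a BCP 47 tag. The helper upcase_for and the constant TURKIC_LANGUAGES are hypothetical names introduced only for illustration; they are not part of Ruby or of this proposal.

```ruby
# Hypothetical sketch: emulate the proposed language-tag parameter using
# the option symbols that String#upcase accepts in Ruby 2.4 and later.
# The list of Turkic language subtags is an assumption for illustration.
TURKIC_LANGUAGES = %w[tr az].freeze

def upcase_for(str, lang = nil)
  # Proposal: ASCII-only conversion when no parameter is given.
  return str.upcase(:ascii) if lang.nil?

  # Look only at the primary language subtag, e.g. "tr" in "tr-TR".
  primary = lang.downcase(:ascii).split('-').first
  if TURKIC_LANGUAGES.include?(primary)
    str.upcase(:turkic)   # dotted and dotless i handled the Turkic way
  else
    str.upcase            # general Unicode conversion
  end
end

puts upcase_for('Türkiye')        # TüRKIYE
puts upcase_for('Türkiye', 'en')  # TÜRKIYE
puts upcase_for('Türkiye', 'tr')  # TÜRKİYE
```

The three calls mirror the three examples in the description; only the 'tr' case produces the dotted capital İ.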

The Unicode standard also defines various kinds of "case folding", which is usually lossy, e.g. mapping German ß to ss. It should be possible to include this functionality in this proposal, e.g. by using :symbols or CONSTANTs for the few specific foldings. It may also be possible to define a reversible variant of case conversion, in particular for use with swapcase.
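To illustrate why folding is lossy, here is a short sketch assuming Ruby 2.4 or later, where String#downcase accepts a :fold option. That option is how the feature eventually surfaced in Ruby, not necessarily the spelling this proposal had in mind.

```ruby
word = 'Straße'

folded   = word.downcase(:fold)  # Unicode case folding: ß becomes ss
shouted  = word.upcase           # full case mapping: ß becomes SS
restored = shouted.downcase      # the original ß cannot be recovered

puts folded    # strasse
puts shouted   # STRASSE
puts restored  # strasse
```

Because the round trip through upcase does not return the original string, a reversible variant would need different rules from the default mapping.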

In the long term, instead of a direct BCP 47 string, we could create a Locale class that would incorporate language-specific facilities, but this may need more detailed considerations.

The idea of using an additional parameter to indicate language-dependent or other processing variants should be extensible to areas such as number-to-string conversion and date formatting. While this proposal is only about case conversion, we should check that there is a good chance to use similar parameter conventions for such extensions.

[This proposal is based on research done together with my student Kimihito Matsui.]

Files

CaseConversion.pdf (340 KB), duerst (Martin Dürst), 07/23/2014 11:05 AM

Related issues 7 (2 open — 5 closed)

Updated by duerst (Martin Dürst) almost 12 years ago
#1 [ruby-core:63966]

Related to Bug #3376: russian support added

Updated by duerst (Martin Dürst) almost 12 years ago
#2 [ruby-core:63969]

Related to Feature #2034: Consider the ICU Library for Improving and Expanding Unicode Support added

Updated by duerst (Martin Dürst) almost 12 years ago
#3 [ruby-core:63968]

Related to Feature #10002: String swapcase added

Updated by duerst (Martin Dürst) almost 12 years ago
#4 [ruby-core:63971]

File CaseConversion.pdf added

Updated by matz (Yukihiro Matsumoto) almost 12 years ago
#5 [ruby-core:64046]

Assignee set to duerst (Martin Dürst)

I want the default case conversion to be Unicode-aware (when the encoding is Unicode).
The previous behavior can be obtained with str.downcase(:ascii).

Non-Unicode encodings (e.g. Latin-1) can support non-ASCII case conversion, but this is not mandatory.

Matz.
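A short sketch of the behavior requested in this comment, assuming Ruby 2.4 or later (where it shipped) and UTF-8 strings. The variable names are only for illustration.

```ruby
name = 'ÀÉÎ Café'

unicode_default = name.downcase          # Unicode-aware by default
ascii_only      = name.downcase(:ascii)  # previous behavior: only A-Z change

puts unicode_default  # àéî café
puts ascii_only       # ÀÉÎ café
```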

Updated by matz (Yukihiro Matsumoto) almost 12 years ago
#6 [ruby-core:64467]

Related to Feature #10152: String#strip doesn't remove non-breaking space added

Updated by duerst (Martin Dürst) almost 12 years ago
#7 [ruby-core:65153]

Target version set to Ruby 2.3.0

Updated by duerst (Martin Dürst) over 11 years ago
#8 [ruby-core:67246]

Related to Bug #10550: Resolv::DNS.getaddresses returns no IPs when nameserver returns in differing case than query added

Updated by akr (Akira Tanaka) over 11 years ago
#9 [ruby-core:67254]

The related issue, [Bug #10550] Resolv::DNS.getaddresses, needs ASCII-only case conversion.
Unicode-aware case conversion is not suitable for that issue.
See RFC 4343.
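As a rough illustration of the ASCII-only comparison this comment asks for, here is a sketch assuming Ruby 2.4 or later and its :ascii option. The helper dns_name_equal? is a hypothetical name, not something taken from Resolv.

```ruby
# Hypothetical helper: compare two DNS names case-insensitively,
# folding only the ASCII letters A-Z and leaving everything else as-is.
def dns_name_equal?(a, b)
  a.downcase(:ascii) == b.downcase(:ascii)
end

puts dns_name_equal?('WWW.Example.COM', 'www.example.com')  # true
puts dns_name_equal?('ÉCOLE.example', 'école.example')      # false
```

The second call returns false because non-ASCII letters are deliberately not folded, which a Unicode-aware downcase would do.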

Updated by duerst (Martin Dürst) about 11 years ago
#10

Has duplicate Bug #11284: String#upcase and String#downcase don't work for accented characters added

Updated by duerst (Martin Dürst) almost 9 years ago
#11 [ruby-core:83302]

Status changed from Open to Closed

Close way overdue, should have happened somewhere around r55281.

Updated by hsbt (Hiroshi SHIBATA) over 4 years ago
#12

Project changed from 14 to Ruby

Updated by duerst (Martin Dürst) over 3 years ago
#13

Related to Feature #19317: Unicode ICU Full case mapping added

Related to Ruby - Bug #3376: russian support | Closed | naruse (Yui NARUSE)
Related to Ruby - Feature #2034: Consider the ICU Library for Improving and Expanding Unicode Support | Rejected | naruse (Yui NARUSE)
Related to Ruby - Feature #10002: String swapcase | Closed
Related to Ruby - Feature #10152: String#strip doesn't remove non-breaking space | Open
Related to Ruby - Bug #10550: Resolv::DNS.getaddresses returns no IPs when nameserver returns in differing case than query | Closed
Related to Ruby - Feature #19317: Unicode ICU Full case mapping | Assigned | duerst (Martin Dürst)
Has duplicate Ruby - Bug #11284: String#upcase and String#downcase don't work for accented characters | Rejected
