Bug #14363

each_grapheme_cluster.size returns the wrong size

Added by sos4nt (Stefan Schüßler) over 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin15]
[ruby-core:84887]

Description

Ruby 2.5 adds String#each_grapheme_cluster to enumerate the string's grapheme clusters:

str = "a\u0300i\u0301"          #=> "àí"
str.each_grapheme_cluster.to_a  #=> ["à", "í"]

Unfortunately, the enumerator's size reports the number of characters (4) instead of the number of grapheme clusters (2):

str.each_grapheme_cluster.size  #=> 4
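
The expected value can be checked without relying on the enumerator's size, by counting the clusters that are actually yielded and comparing that to the character count. A minimal sketch using the string from this report (the variable names are only illustrative):

```ruby
str = "a\u0300i\u0301"   # two base letters, each followed by a combining accent

# Number of characters (code points) in the string.
char_count = str.each_char.size

# Number of grapheme clusters actually yielded by the enumerator.
cluster_count = str.each_grapheme_cluster.to_a.size

puts char_count     # prints 4
puts cluster_count  # prints 2
```

The value reported above for each_grapheme_cluster.size matches char_count rather than cluster_count.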

The source code reveals that it invokes rb_str_each_char_size, so it is equivalent to each_char.size:

static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}

If the grapheme enumerator's size cannot be calculated lazily, each_grapheme_cluster.size should return nil to indicate that.
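
The two possible behaviours (returning nil, or computing the real count when size is called) can be sketched in plain Ruby. This is only an illustration, not the C implementation and not the content of the attached patches. The helper names grapheme_enum_nil_size and grapheme_enum_real_size are hypothetical, and the sketch assumes the regular expression \X as a stand-in for the grapheme cluster segmentation:

```ruby
# Hypothetical helper: an enumerator with no size information.
# Enumerator.new without a size argument makes #size return nil.
def grapheme_enum_nil_size(str)
  Enumerator.new do |yielder|
    str.scan(/\X/) { |cluster| yielder << cluster }
  end
end

# Hypothetical helper: an enumerator whose size is computed on demand.
# The lambda runs only when #size is called, and it scans the whole string.
def grapheme_enum_real_size(str)
  size_fn = -> { str.scan(/\X/).size }
  Enumerator.new(size_fn) do |yielder|
    str.scan(/\X/) { |cluster| yielder << cluster }
  end
end

str = "a\u0300i\u0301"
p grapheme_enum_nil_size(str).size   # prints nil
p grapheme_enum_real_size(str).size  # prints 2
```

Under these assumptions, the nil variant stays cheap and honest about not knowing the count, while the computed variant gives the correct answer at the cost of a full pass over the string each time size is called.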


Files

each_grapheme_cluster_size_nil.patch (921 Bytes), hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM
each_grapheme_cluster_size_real.patch (3.03 KB), hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM