Bug #14363
Closed
each_grapheme_cluster.size returns the wrong size
Description
Ruby 2.5 adds String#each_grapheme_cluster to enumerate the string's grapheme clusters:
str = "a\u0300i\u0301" #=> "àí"
str.each_grapheme_cluster.to_a #=> ["à", "í"]
Unfortunately, the enumerator's size doesn't work as expected:
str.each_grapheme_cluster.size #=> 4
The source code reveals that it invokes rb_str_each_char_size, so it is equivalent to each_char.size:
static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}
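The mismatch can be reproduced in plain Ruby. This is only a minimal sketch, not the C implementation: GraphemeSizeDemo and with_char_size are hypothetical names, and the size block passed to enum_for stands in for the rb_str_each_char_size callback.

```ruby
module GraphemeSizeDemo
  # Hypothetical stand-in for the C function: it iterates grapheme clusters,
  # but its size block reports the character count instead.
  def self.with_char_size(str)
    return enum_for(:with_char_size, str) { str.length } unless block_given?
    str.each_grapheme_cluster { |cluster| yield cluster }
  end
end

str = "a\u0300i\u0301"
enum = GraphemeSizeDemo.with_char_size(str)
puts enum.size       # 4, taken from the size block (character count)
puts enum.to_a.size  # 2, obtained by actually iterating
```

The size block is never checked against what the method yields, which is why the two numbers can disagree silently.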
If the grapheme enumerator's size cannot be calculated lazily, each_grapheme_cluster.size should return nil to indicate that.
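For reference, this is how the nil convention looks with a hand-built enumerator. It is a small illustration of Enumerator#size in general and does not use each_grapheme_cluster itself:

```ruby
# An enumerator built without a size argument cannot know its length up front.
unsized = Enumerator.new { |y| y << :a << :b }
p unsized.size  # nil

# Passing a size makes it available without iterating.
sized = Enumerator.new(2) { |y| y << :a << :b }
p sized.size    # 2
```

Callers that receive nil can still fall back to count or to_a.size, which iterate explicitly.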
Files
Updated by hugopeixoto (Hugo Peixoto) almost 7 years ago
- File each_grapheme_cluster_size_nil.patch added
- File each_grapheme_cluster_size_real.patch added
Calculating the enumerator size here requires iterating through the whole string and running grapheme detection over all of its bytes, so I'm not sure what the right approach is.
I'm attaching two patches, one that makes it return nil and one that does the actual count. Both patches have tests attached.
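The "actual count" option can be sketched in Ruby as follows. This is an assumption-laden illustration, not the content of either attached patch: GraphemeCount and clusters are hypothetical names, and the size block simply walks the string.

```ruby
module GraphemeCount
  # Hypothetical sketch: the size block counts clusters by iterating,
  # so every call to #size scans the whole string.
  def self.clusters(str)
    unless block_given?
      return enum_for(:clusters, str) { str.each_grapheme_cluster.count }
    end
    str.each_grapheme_cluster { |cluster| yield cluster }
  end
end

enum = GraphemeCount.clusters("a\u0300i\u0301")
puts enum.size       # 2, matches the number of yielded clusters
puts enum.to_a.size  # 2
```

The trade-off is the one described above: the size is correct, but it is no longer a constant-time lookup.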
Updated by naruse (Yui NARUSE) almost 7 years ago
- Status changed from Open to Closed
Applied in changeset trunk|r62892.
fix each_grapheme_cluster's size [Bug #14363]
From: Hugo Peixoto <hugo.peixoto@gmail.com>
Updated by naruse (Yui NARUSE) almost 7 years ago
- Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED
Updated by naruse (Yui NARUSE) almost 7 years ago
- Backport changed from 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: DONE
ruby_2_5 r62896 merged revision(s) 62892,62893.