Bug #14363

each_grapheme_cluster.size returns the wrong size

Added by sos4nt (Stefan Schüßler) 11 months ago. Updated 9 months ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin15]
[ruby-core:84887]

Description

Ruby 2.5 adds String#each_grapheme_cluster to enumerate the string's grapheme clusters:

str = "a\u0300i\u0301"          #=> "àí"
str.each_grapheme_cluster.to_a  #=> ["à", "í"]

Unfortunately, the enumerator's size doesn't work as expected:

str.each_grapheme_cluster.size  #=> 4

The source code reveals that it invokes rb_str_each_char_size, so it is equivalent to each_char.size:

static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}
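
The mismatch can be reproduced by comparing the lazily reported character count with the number of clusters the enumerator actually yields. This is a minimal sketch, assuming Ruby 2.5 or later; the variable names are illustrative. The value of each_grapheme_cluster.size itself is deliberately not printed, since it depends on whether the interpreter includes the fix:

```ruby
str = "a\u0300i\u0301"   # two base letters, each followed by a combining accent

# each_char knows its size without iterating: one entry per code point.
char_size = str.each_char.size

# count walks the whole enumerator, so it reflects what is really yielded.
cluster_count = str.each_grapheme_cluster.count

puts char_size       # 4
puts cluster_count   # 2
```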

If the grapheme enumerator's size cannot be calculated lazily, each_grapheme_cluster.size should return nil to indicate that.
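
For comparison, other String enumerators already distinguish between a size that is known up front and one that is not. A small sketch, assuming a current Ruby; the variable names are illustrative:

```ruby
str = "a\u0300i\u0301"

# each_char can report its size without iterating.
char_enum_size = str.each_char.size            # 4

# each_line cannot know the number of lines cheaply,
# so its enumerator reports nil instead of a number.
line_enum_size = "one\ntwo\n".each_line.size   # nil

p char_enum_size
p line_enum_size
```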

each_grapheme_cluster_size_nil.patch (921 Bytes), added by hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM
each_grapheme_cluster_size_real.patch (3.03 KB), added by hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM

Associated revisions

Revision 613decd0
Added by naruse (Yui NARUSE) 9 months ago

each_grapheme_cluster shouldn't return size [Bug #14363]

From: Stefan Schüßler <mail@stefanschuessler.de>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62888 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 6e0f5b84
Added by naruse (Yui NARUSE) 9 months ago

Revert "each_grapheme_cluster shouldn't return size [Bug #14363]"

This reverts commit r62887.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62891 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 41b2ef46
Added by naruse (Yui NARUSE) 9 months ago

fix each_grapheme_cluster's size [Bug #14363]

From: Hugo Peixoto <hugo.peixoto@gmail.com>

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62892 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision c40df5a7
Added by naruse (Yui NARUSE) 9 months ago

merge revision(s) 62892,62893: [Backport #14363]

fix each_grapheme_cluster's size [Bug #14363]

From: Hugo Peixoto <hugo.peixoto@gmail.com>

Factor out get_reg_grapheme_cluster

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_5@62896 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

History

#1 Updated by hugopeixoto (Hugo Peixoto) 9 months ago

Calculating the enumerator size here requires iterating through the whole text and doing grapheme detection on all bytes, so I'm not sure what the right approach is.

I'm attaching two patches, one that makes it return nil and one that does the actual count. Both patches have tests attached.
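
The two behaviours can be sketched at the Ruby level as follows. This is only an illustration, not the attached C patches: clusters_nil_size and clusters_real_size are hypothetical helper names, and the sketch assumes that the regexp \X splits a UTF-8 string into the same clusters as each_grapheme_cluster.

```ruby
# Hypothetical helper: no size is given, so Enumerator#size returns nil.
def clusters_nil_size(str)
  Enumerator.new { |y| str.scan(/\X/) { |cluster| y << cluster } }
end

# Hypothetical helper: the size is a callable that counts the clusters.
# It scans the whole string, so size costs a full pass over the text.
def clusters_real_size(str)
  counter = proc { str.scan(/\X/).size }
  Enumerator.new(counter) { |y| str.scan(/\X/) { |cluster| y << cluster } }
end

str = "a\u0300i\u0301"
p clusters_nil_size(str).size    # nil
p clusters_real_size(str).size   # 2
```

Both helpers yield the same clusters; they differ only in what size reports and how much work it does.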

#2 Updated by naruse (Yui NARUSE) 9 months ago

  • Status changed from Open to Closed

Applied in changeset trunk|r62892.


fix each_grapheme_cluster's size [Bug #14363]

From: Hugo Peixoto <hugo.peixoto@gmail.com>

#3 Updated by naruse (Yui NARUSE) 9 months ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED

#4 [ruby-core:86254] Updated by naruse (Yui NARUSE) 9 months ago

  • Backport changed from 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: DONE

ruby_2_5 r62896 merged revision(s) 62892,62893.
