Bug #20305
Closed
commit 1d2d25dcadda0764f303183ac091d0c87b432566 breaks grapheme_clusters
Description
Given the script:
#script.rb
p "안녕".byteslice(0, 4).grapheme_clusters
Commit 1d2d25dcadda0764f303183ac091d0c87b432566 (https://github.com/ruby/ruby/commit/1d2d25dcadda0764f303183ac091d0c87b432566) breaks the grapheme_clusters method on a byte slice that ends in the middle of a character:
(commit 1d2d25dcadda0764f303183ac091d0c87b432566)
((HEAD detached at 1d2d25dcad)) $ ./ruby --disable=gems script.rb
["안", "\xEB"]
((HEAD detached at 1d2d25dcad)) $ git checkout HEAD^
(114e71d06280f9c57b9859ee4405ae89a989ddb6)
((HEAD detached at 114e71d062)) $ make -j
...
((HEAD detached at 114e71d062)) $ ./ruby --disable=gems script.rb
["안"]
((HEAD detached at 114e71d062)) $ cat script.rb
p "안녕".byteslice(0, 4).grapheme_clusters
The expected result here is almost certainly the latter output (["안"]), not the former.
Updated by fablestales (Fable Tales) 9 months ago
To clarify: grapheme_clusters used to ignore a partial character at the end of a byteslice; since that commit it no longer does, and returns the leftover byte as its own element.
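For readers following along, here is a minimal sketch of why the slice ends in a partial character, assuming UTF-8 source encoding. Each of the two Hangul syllables occupies three bytes, so byteslice(0, 4) keeps all of "안" plus only the lead byte (0xEB) of "녕". The scrub("") step is not part of the report; it is shown only as one version-independent way to drop the incomplete tail before splitting.

```ruby
s = "안녕"
slice = s.byteslice(0, 4)

# Both characters are 3 bytes long in UTF-8, so the string is 6 bytes.
p s.bytesize                                   # => 6

# The slice holds the full first character plus one stray lead byte.
hex = slice.bytes.map { |b| format("%02X", b) }
p hex                                          # => ["EC", "95", "88", "EB"]

# The trailing byte cannot be decoded on its own.
p slice.valid_encoding?                        # => false

# Assumption: dropping invalid bytes is acceptable for the caller.
# String#scrub replaces invalid byte sequences with the given string.
clean = slice.scrub("")
p clean == "안"                                # => true
p clean.grapheme_clusters == ["안"]            # => true
```

Because the scrubbed string contains only complete characters, the result of grapheme_clusters on it does not depend on how a given Ruby build treats the incomplete byte.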
Updated by fablestales (Fable Tales) 9 months ago
I added a failing test to reproduce this issue in this PR: https://github.com/ruby/ruby/pull/10103
Updated by nobu (Nobuyoshi Nakada) 9 months ago
- Status changed from Open to Closed
Applied in changeset git|3a04ea2d0379dd8c6623c2d5563e6b4e23986fae.
[Bug #20305] Fix matching against an incomplete character
When matching against an incomplete character, some enclen calls are
expected not to exceed the limit, and some are expected to return the
required length and then the results are checked if it exceeds.
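The two conventions described in the commit message can be illustrated with a small Ruby sketch. This is not the code from the changeset (which is in C); the helper names required_len, clamped_len and complete_char? are hypothetical and exist only for this illustration, and the lead-byte table is a simplified UTF-8 one.

```ruby
# Hypothetical helpers for illustration only; not taken from Ruby's source.

# Number of bytes a UTF-8 character needs, judged from its lead byte.
def required_len(lead)
  if lead < 0x80 then 1
  elsif lead.between?(0xC2, 0xDF) then 2
  elsif lead.between?(0xE0, 0xEF) then 3
  elsif lead.between?(0xF0, 0xF4) then 4
  else 1 # invalid lead byte: treat it as a single byte
  end
end

# Convention 1: the reported length never exceeds the limit
# (here, the end of the byte array).
def clamped_len(bytes, pos)
  [required_len(bytes[pos]), bytes.size - pos].min
end

# Convention 2: report the required length and let the caller
# check whether it runs past the limit.
def complete_char?(bytes, pos)
  pos + required_len(bytes[pos]) <= bytes.size
end

bytes = "안녕".byteslice(0, 4).bytes   # [0xEC, 0x95, 0x88, 0xEB]

p clamped_len(bytes, 0)      # => 3 (complete first character)
p clamped_len(bytes, 3)      # => 1 (clamped to the single remaining byte)
p required_len(bytes[3])     # => 3 (what the lead byte asks for)
p complete_char?(bytes, 3)   # => false (required length exceeds the limit)
```

A caller that expects one convention but receives the other will either read past the end of the string or treat the stray byte as a complete character, which is consistent with the extra "\xEB" element in the report.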
Updated by byroot (Jean Boussier) 8 months ago
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: REQUIRED
Updated by k0kubun (Takashi Kokubun) 6 months ago
- Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONE
ruby_3_3 72a45ac7a3cc9bbecf641ac505f8ee791c9da48c merged revision(s) 3a04ea2d0379dd8c6623c2d5563e6b4e23986fae.
Updated by nagachika (Tomoyuki Chikanaga) 5 months ago
- Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONE to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE, 3.3: DONE
ruby_3_2 a67b43d99e24dc7c2a9e134a65f28f968fe124c1 merged revision(s) 3a04ea2d0379dd8c6623c2d5563e6b4e23986fae.