Feature #14919: Add String#byteinsert - Ruby - Ruby Issue Tracking System

Added by aycabta (aycabta .) about 8 years ago. Updated about 2 years ago.

Status: Closed
Assignee:
Target version:
[ruby-core:87975]

Description

String#byteinsert is important for editing multibyte strings. A Unicode grapheme is sometimes made up of several code points, and text-editing software sometimes has to add a new code point to an existing grapheme. String#byteinsert would support that.

I implemented it in pure Ruby in my own code:
https://github.com/aycabta/reline/blob/b17e5fd61092adfd7e87d576301e4e19a4d9e6d8/lib/reline/line_editor.rb#L255-L260

Updated by aycabta (aycabta .) about 8 years ago
#1

Tracker changed from Bug to Feature
Backport deleted (~~2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN~~)

Updated by duerst (Martin Dürst) about 8 years ago
#2 [ruby-core:87981]

aycabta (aycabta .) wrote:

> String#byteinsert is important for editing multibyte strings. A Unicode grapheme is sometimes made up of several code points, and text-editing software sometimes has to add a new code point to an existing grapheme. String#byteinsert would support that.

Can you explain this a bit more? Editing of code points is easily possible with String#[]=; there is no need to use byteinsert.

Updated by aycabta (aycabta .) about 8 years ago
#3 [ruby-core:87983]

duerst (Martin Dürst) wrote:

> Editing of code points is easily possible with String#[]=; there is no need to use byteinsert.

Input from CLI

In a CLI tool, characters arrive one byte at a time, so every multibyte character is split. In the middle of a line, the software has to insert a new character, not replace an existing one.

Yank

In the middle of a line, a yank operation needs #byteinsert for multibyte editing.

Updated by duerst (Martin Dürst) about 8 years ago
#4 [ruby-core:87988]

aycabta (aycabta .) wrote:

> duerst (Martin Dürst) wrote:
>
> > Editing of code points is easily possible with String#[]=; there is no need to use byteinsert.
>
> Input from CLI
>
> In a CLI tool, characters arrive one byte at a time, so every multibyte character is split.

On the lowest level, characters indeed come in as a string of bytes. But it would be wrong to insert individual bytes into a string unless these bytes are also characters. It would just lead to mojibake.

The right thing to do is to collect a (small) number of bytes, check how many bytes are needed to form one or more characters, insert these characters into the string, and keep the remaining bytes for further processing (wait until more bytes arrive so that we get more complete codepoints/characters).
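
The buffering approach described above could be sketched roughly as follows. This is only an illustration: the class name `Utf8Collector` and its `feed` method are hypothetical, the input is assumed to be UTF-8, and a stray invalid byte would stay in the pending buffer indefinitely, which a real line editor would have to handle.

```ruby
# Hypothetical helper (not part of Ruby or reline): collects raw input bytes
# and only hands back complete, validly encoded UTF-8 characters.
class Utf8Collector
  def initialize
    @pending = String.new(encoding: Encoding::BINARY)
  end

  # bytes: an Array of Integers in 0..255.
  # Returns a UTF-8 String holding the characters completed by these bytes;
  # bytes of an unfinished character stay buffered for the next call.
  def feed(bytes)
    complete = String.new(encoding: Encoding::UTF_8)
    bytes.each do |byte|
      @pending << byte
      candidate = @pending.dup.force_encoding(Encoding::UTF_8)
      next unless candidate.valid_encoding?

      complete << candidate
      @pending = String.new(encoding: Encoding::BINARY)
    end
    complete
  end
end

collector = Utf8Collector.new
line = "I  Ruby!".dup

# U+2665 arrives split across two reads: E2 99, then A5.
[[0xE2, 0x99], [0xA5]].each do |chunk|
  chars = collector.feed(chunk)
  line.insert(2, chars) unless chars.empty? # character index, not byte index
end

puts line
```

With this shape the edited line never holds a partial character, so it stays validly encoded after every step.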

> In the middle of a line, the software has to insert a new character, not replace an existing one.

Insertion of characters can be done with String#[]=.
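
As a small sketch of character-level insertion: assigning to a zero-length slice with String#[]= inserts instead of replacing, and String#insert does the same. The sample string and the index are made up for illustration.

```ruby
line = "I \u2665 Ruby!".dup

# A zero-length slice at character index 3 selects nothing, so the
# assignment inserts U+FE0F (variation selector-16) right after the heart.
line[3, 0] = "\uFE0F"

# String#insert is the dedicated method for the same operation.
other = "I \u2665 Ruby!".dup
other.insert(3, "\uFE0F")

puts line == other        # true
puts line.valid_encoding? # true
```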

> Yank
>
> In the middle of a line, a yank operation needs #byteinsert for multibyte editing.

I still don't see why. You don't want to insert bytes, you want to insert characters, so that the String is correctly encoded at all times.

Updated by shevegen (Robert A. Heiler) about 8 years ago
#5 [ruby-core:87991]

I don't have a specific opinion on the suggestion itself; Martin raised some valid
points, in my opinion. But I wanted to comment on something else.

Some suggestions for the developer meeting were added as recently as 8 hours
ago, so probably only shortly before the developer meeting started:

https://bugs.ruby-lang.org/issues/14861

This is a very short time frame. I would like to suggest allowing a little more
time before the developer meeting, so that other people can also comment on the
suggestions. Something like +24 hours if an item has not yet been discussed; I feel
that ~8 hours without any real possibility for discussion is very, very short.

Updated by noraj (Alexandre ZANNI) over 3 years ago
#6 [ruby-core:111586]

Yes, a grapheme can be composed of several code points.

One example is a variation selector:

irb(main):001:0> a = "\u2665\n\u2764\n\u2665\ufe0f\n\u2764\ufe0f"
=> "♥\n❤\n♥️\n❤️"
irb(main):002:0> puts a
♥
❤
♥️
❤️
=> nil
irb(main):003:0> a.chars
=> ["♥", "\n", "❤", "\n", "♥", "️", "\n", "❤", "️"]
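
The three levels involved (bytes, code points, graphemes) can be compared directly. The following is a minimal sketch using String#grapheme_clusters, assuming Ruby 2.5 or later; the variable name is only illustrative.

```ruby
heart = "\u2665\uFE0F" # U+2665 followed by variation selector-16

puts heart.bytesize                 # 6 (two 3-byte UTF-8 sequences)
puts heart.chars.length             # 2 (two code points)
puts heart.grapheme_clusters.length # 1 (rendered as a single grapheme)
```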

Fortunately, in Ruby, string indices already address characters (code points), not graphemes. So, as Martin highlighted, String#[]= already covers all the use cases I can think of.

irb(main):007:0> r = "I \u2665 Ruby!"
=> "I ♥ Ruby!"
irb(main):009:0> r[2] = "\u2764\ufe0f"
=> "❤️"
irb(main):010:0> r
=> "I ❤️ Ruby!"

The only use I can think of for String#byteinsert would be to tamper with the UTF-8 encoding directly, to forge an invalid encoding on purpose. But such a use case is rare and advanced, so maybe it can be handled with pack and unpack rather than by creating a new byteinsert method?

irb(main):014:0> r.unpack1('a*')
=> "I \xE2\x9D\xA4\xEF\xB8\x8F Ruby!"
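
To make the pack/unpack idea concrete, here is one possible sketch of forging an invalid sequence on purpose. The byte offset and the inserted value 0xFF are arbitrary choices for illustration.

```ruby
r = "I \u2764\uFE0F Ruby!"

# Work on the raw byte values, not on characters.
bytes = r.unpack("C*")

# Byte offset 3 lies inside the 3-byte sequence E2 9D A4 of U+2764.
bytes.insert(3, 0xFF)

# pack returns a binary string; relabel it as UTF-8 without transcoding.
forged = bytes.pack("C*").force_encoding(Encoding::UTF_8)

puts forged.valid_encoding? # false
```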

@aycabta (aycabta .) Maybe you could give me a handy example of String#byteinsert usage that I can't think of?

Updated by ufuk (Ufuk Kayserilioglu) about 2 years ago
#7 [ruby-core:118673]

Given that we now have String#bytesplice since Ruby 3.2, these kinds of operations should be possible using "xxxxx".bytesplice(byte_pointer, 0, other) to insert bytes of other at byte_pointer and "xxxxx".bytesplice(byte_pointer, num, "") to remove num bytes at byte_pointer.
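
A byte-offset insert built on String#bytesplice might look like the sketch below. The helper name `byte_insert` is hypothetical and is not the method proposed in this issue. The fallback branch for Rubies older than 3.2 is an assumption added for portability, and unlike bytesplice it does not check that the offset falls on a character boundary.

```ruby
# Hypothetical helper: inserts `other` into `str` at a byte offset, in place.
def byte_insert(str, byte_index, other)
  if str.respond_to?(:bytesplice)
    # Ruby 3.2+: replacing a zero-length byte range is an insertion.
    # bytesplice raises IndexError if the offset is not on a character boundary.
    str.bytesplice(byte_index, 0, other)
  else
    # Fallback for older Rubies: rebuild the string from two byte slices.
    head = str.byteslice(0, byte_index)
    tail = str.byteslice(byte_index, str.bytesize - byte_index)
    str.replace(head + other + tail)
  end
  str
end

line = "I \u2665 Ruby!".dup

# "I " is 2 bytes and U+2665 is 3 bytes, so byte offset 5 is right after the heart.
byte_insert(line, 5, "\uFE0F")

puts line.valid_encoding? # true
```

The helper returns `str` explicitly so that callers do not depend on the return value of bytesplice itself.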

Updated by jeremyevans0 (Jeremy Evans) about 2 years ago
#8

Status changed from Open to Closed
