Feature #19314: String#bytesplice should support partial copy - Ruby - Ruby Issue Tracking System

Added by shugo (Shugo Maeda) over 3 years ago. Updated about 3 years ago.

Status: Closed

Assignee:

Target version:

[ruby-core:111674]

Description

String#bytesplice should support partial copy without temporary String objects.

For example, given x = "0123456789", either of the following replaces the contents of x with "0167856789":

x.bytesplice(2, 3, x, 6, 3)
x.bytesplice(2..4, x, 6..8)
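
To make the "without temporary String objects" point concrete, here is a minimal sketch. The helper names `partial_bytesplice_supported?` and `partial_copy!` are made up for illustration and are not part of Ruby. The sketch assumes the five-argument form is only available on interpreters that include this feature (Ruby 3.3 and later); elsewhere it falls back to `byteslice`, which allocates exactly the temporary String the proposal wants to avoid.

```ruby
# Probe whether this interpreter accepts the five-argument form.
# Older versions raise ArgumentError, or lack bytesplice entirely.
def partial_bytesplice_supported?
  (+"ab").bytesplice(0, 1, "cd", 1, 1)
  true
rescue ArgumentError, NoMethodError
  false
end

# Hypothetical helper: replace dst_len bytes of dst starting at dst_off
# with src_len bytes of src starting at src_off, modifying dst in place.
def partial_copy!(dst, dst_off, dst_len, src, src_off, src_len)
  if partial_bytesplice_supported?
    # Proposed form: no intermediate String is created.
    dst.bytesplice(dst_off, dst_len, src, src_off, src_len)
  else
    # Fallback: byteslice allocates a temporary String for the copied part.
    piece = src.byteslice(src_off, src_len)
    tail = dst.byteslice(dst_off + dst_len, dst.bytesize)
    dst.replace(dst.byteslice(0, dst_off) + piece + tail)
  end
  dst
end

x = +"0123456789"
partial_copy!(x, 2, 3, x, 6, 3)
puts x # prints 0167856789
```

Both branches produce the same string; only the first avoids the extra allocation.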

Considerations

What should be the return value?
- The return value should be the whole source string for performance and consistency with bytesplice(offset, len, s).
Can the source and destination ranges overlap?
- Yes.
Can the source and destination lengths be different?
- Yes.
Can range form and offset/length form be mixed in the source and destination?
- No.
What should happen when any offset doesn't land on a character boundary in text strings?
- IndexError should be raised.
Can the length be omitted in the destination?
- Maybe yes, but it may be confusing.
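
A few of the answers above can be sketched as small methods. This assumes an interpreter where the feature has landed (Ruby 3.3 or later); the method names are illustrative only.

```ruby
# Overlapping source and destination ranges of equal length in one string.
def overlap_demo
  x = +"0123456789"
  x.bytesplice(0..2, x, 1..3) # copy "123" over "012"
  x
end

# Source and destination lengths may differ, so the receiver grows or shrinks.
def resize_demo
  x = +"0123456789"
  x.bytesplice(0, 2, "abcdef", 1, 4) # replace "01" with "bcde"
  x
end

# An offset that falls inside a multibyte character raises IndexError.
def boundary_demo
  s = +"\u3042\u3044" # two characters, three bytes each in UTF-8
  s.bytesplice(1, 2, s, 3, 3) # byte offset 1 is in the middle of a character
  :no_error
rescue IndexError => e
  e.class
end
```

On such an interpreter, `overlap_demo` should return "1233456789", `resize_demo` should return "bcde23456789", and `boundary_demo` should return `IndexError`.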

Use cases

Gapped buffer implementation for text editors
NAT implementation
- https://twitter.com/kazuho/status/1611279616098070532
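
For the first use case, a rough sketch of a gap buffer may help show where the partial copy fits. Everything here (the `GapBuffer` class, its method names, the fixed capacity) is an assumption made for illustration, not something from the proposal. Moving the gap copies a run of bytes from one part of the buffer to another, which is the operation the five-argument form would perform without a temporary String; the sketch falls back to `byteslice` where that form is unavailable.

```ruby
# Fixed-capacity gap buffer over a binary String (illustrative sketch).
class GapBuffer
  def initialize(capacity = 32)
    @buf = ("\0" * capacity).b # binary, so every byte offset is a boundary
    @gap_start = 0             # first byte of the gap
    @gap_end = capacity        # first byte after the gap
  end

  # Feature probe for the five-argument bytesplice (assumed Ruby 3.3+).
  def self.partial_bytesplice?
    return @partial unless @partial.nil?
    @partial = begin
      (+"ab").bytesplice(0, 1, "cd", 1, 1)
      true
    rescue ArgumentError, NoMethodError
      false
    end
  end

  # Number of text bytes currently stored (excluding the gap).
  def length
    @gap_start + (@buf.bytesize - @gap_end)
  end

  # Insert text at byte position pos of the logical text.
  def insert(pos, text)
    bytes = text.b
    raise IndexError, "position out of range" unless pos.between?(0, length)
    raise ArgumentError, "gap too small" if bytes.bytesize > @gap_end - @gap_start
    move_gap(pos)
    @buf[@gap_start, bytes.bytesize] = bytes # binary string: index == byte offset
    @gap_start += bytes.bytesize
    self
  end

  def to_s
    tail = @buf.byteslice(@gap_end, @buf.bytesize)
    (@buf.byteslice(0, @gap_start) + tail).force_encoding(Encoding::UTF_8)
  end

  private

  # Slide the gap so that it starts at pos.
  def move_gap(pos)
    if pos < @gap_start
      n = @gap_start - pos
      copy_within(@gap_end - n, pos, n) # move text before the gap to after it
      @gap_start -= n
      @gap_end -= n
    elsif pos > @gap_start
      n = pos - @gap_start
      copy_within(@gap_start, @gap_end, n) # move text after the gap to before it
      @gap_start += n
      @gap_end += n
    end
  end

  # Copy n bytes from offset src to offset dst inside the buffer.
  def copy_within(dst, src, n)
    if self.class.partial_bytesplice?
      @buf.bytesplice(dst, n, @buf, src, n)  # no temporary String
    else
      @buf[dst, n] = @buf.byteslice(src, n)  # temporary String
    end
  end
end

editor = GapBuffer.new
editor.insert(0, "hello world").insert(5, ",").insert(12, "!")
puts editor.to_s # prints hello, world!
```

A real editor buffer would also grow the gap when it fills up; that is left out to keep the sketch short.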

Related issues 1 (1 open — 0 closed)

Related to Ruby - Feature #19315: Lazy substrings in CRuby (Open)

Updated by Eregon (Benoit Daloze) over 3 years ago
#1 [ruby-core:111677]

I think this is too hard to read and parse for a human and 5 arguments seems way too much for a core method.
It feels like a full memcpy/arraycopy which I don't think in general is a good idea for String.
The implementation complexity in []= and similar already hurts Ruby too much.

This is probably the 3rd or later workaround I have seen for the lack of proper lazy substrings in CRuby, i.e., "abcdef"[1..3] must not copy bytes.
That is what needs to be solved (it already works in TruffleRuby).
Yes, it means RSTRING_PTR() might need to allocate to \0-terminate, so be it, it's worth it.

So I am strongly against this: it's an nth workaround for something that is simpler to solve and much more helpful in general.

Updated by Eregon (Benoit Daloze) over 3 years ago
#2

Related to Feature #19315: Lazy substrings in CRuby added

Updated by naruse (Yui NARUSE) over 3 years ago
#3 [ruby-core:111680]

I agree that this is a workaround and a VM should solve this as an optimization.

But your proposal, lazy substrings, is not a solution, because it also creates an object, especially for small strings, which are embedded in RVALUE.

I agree that this is memcpy/arraycopy.
Therefore this proposal should describe how much this workaround improves performance in use cases that amount to memcpy in Ruby.

Updated by Eregon (Benoit Daloze) over 3 years ago
#4 [ruby-core:111683]

naruse (Yui NARUSE) wrote in #note-3:

> But your proposal, lazy substrings, is not a solution, because it also creates an object, especially for small strings, which are embedded in RVALUE.

Yes it creates a String instance reusing the same buffer.
That shouldn't cost much compared to copying many bytes.
It should be insignificant on a benchmark with a long string to copy/move; for a short string, performance shouldn't matter much anyway (it won't be the bottleneck of the program).

If it's still too much overhead, it sounds like allocations in CRuby need to be better optimized, or escape analysis should be implemented.
Again, those two are more general, and their benefits are much wider than this one method change, which would be used by very few Ruby programs and only handles one specific case.

Updated by Eregon (Benoit Daloze) over 3 years ago
#5 [ruby-core:111684]

Ah, something I missed though is that with lazy substrings, there would still need to be a copy of the bytes to "unshare" the string when writing to it.
That copy would also be needed if the string was shared before (e.g. with .dup), but that's unknown in our case.
This does depend on how sharing is implemented, maybe CRuby can see it's only String instances sharing that buffer, and actually both strings are involved in this operation and so there is only need to copy the bytes of the substring.

> It feels like a full memcpy/arraycopy which I don't think in general is a good idea for String.

To expand on that, I dislike that because it's using String as a byte array.
If anything, such operation should be supported on Array before String.

Now that we have IO::Buffer and there is https://docs.ruby-lang.org/en/master/IO/Buffer.html#method-i-copy, why not use that?

Updated by Eregon (Benoit Daloze) over 3 years ago
#6 [ruby-core:111685]

Eregon (Benoit Daloze) wrote in #note-5:

> Now that we have IO::Buffer and there is https://docs.ruby-lang.org/en/master/IO/Buffer.html#method-i-copy, why not use that?

So this does what you want I believe:

x = "0123456789"
IO::Buffer.for(x) do |buffer|
  buffer.copy(buffer, 2, 3, 6) # copy 3 bytes from offset 6 to offset 2
end
p x # => "0167856789"

I therefore think there is no need to change String#bytesplice (because of that there is not even a need for String#bytesplice, which I think we shouldn't have added).
And IO::Buffer seems better suited for byte-buffer-like operations.

Updated by naruse (Yui NARUSE) over 3 years ago
#7 [ruby-core:111720]

> That shouldn't cost much compared to copying many bytes.

This proposal shows two use cases, a text editor and NAT, neither of which copies many bytes.

Updated by shugo (Shugo Maeda) about 3 years ago
#8

Status changed from Open to Closed

Applied in changeset git|f7b72462aa27716370c6bea1f2c240983aca9a55.

String#bytesplice should return self

In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.

This change should be included in Ruby 3.2.1.
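
A small sketch of what this return-value change means in practice, assuming an interpreter that includes the change (the commit message targets Ruby 3.2.1). The method names are illustrative only.

```ruby
# True when bytesplice returns the receiver itself rather than the
# replacement string that was passed in.
def bytesplice_returns_receiver?
  x = +"0123456789"
  x.bytesplice(0, 2, "ab").equal?(x)
end

# Because the receiver is returned, calls can be chained.
def chained_example
  x = +"0123456789"
  x.bytesplice(0, 2, "ab").bytesplice(8, 2, "yz")
  x
end
```

With the change applied, `bytesplice_returns_receiver?` should be true and `chained_example` should return "ab234567yz".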

Updated by shugo (Shugo Maeda) about 3 years ago
#9

Status changed from Closed to Open

Updated by matz (Yukihiro Matsumoto) about 3 years ago
#10 [ruby-core:111904]

Accepted.

Matz.

Updated by naruse (Yui NARUSE) about 3 years ago
#11

Status changed from Open to Closed

Applied in changeset git|373e62248c9dceb660e95f1cf05fa2a4a469cd64.

merge revision(s) f7b72462aa27716370c6bea1f2c240983aca9a55: [Backport #19356]

    String#bytesplice should return self

    In Feature #19314, we concluded that the return value of String#bytesplice
    should be changed from the source string to the receiver, because the source
    string is useless and confusing when extra arguments are added.

    This change should be included in Ruby 3.2.1.
    ---
     string.c                 | 4 ++--
     test/ruby/test_string.rb | 2 +-
     2 files changed, 3 insertions(+), 3 deletions(-)
