Misc #17751

Do these instructions (<<,+,[0..n]) modify the original string without creating copies?

Added by stiuna (Juan Gregorio) about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
[ruby-core:103036]

Description

In my program a string grows considerably in size inside a loop. At the end of that loop a header is created that has to go at the beginning of that string.

During the whole loop:

str << "some data"

At the end:

header = "other data"
str = header + str

I understand that using (+) creates a copy and then reassigns the variable, which is not desirable. I would like to do something similar to (<<), which I understand does not create copies.

If I do this:

header << str

I would have two variables with a very large size.

I also have this other code and I don't know if it is an "in place" modifier:

str = "12345"
str[0..2] = ""
# str => "45"

In short, I want to know what instructions I should use to remove a given range from a string and how to concatenate to both the beginning and end of the target string without having to create copies.

Updated by xtkoba (Tee KOBAYASHI) about 1 year ago

I understand that in the last case a copy of the content of str is not created, although it can trigger memmove when the assignment changes the length of the string.

BTW is it really necessary to hold the whole data in one string? I would create a new class to hold them, or simply write data = [header, str] or data = {:header => header, :str => str} or something.

Updated by stiuna (Juan Gregorio) about 1 year ago

@xtkoba (Tee KOBAYASHI)

But if I use this:

#      [header, str]
data = ["", ""]
data[1] << "some data"

Or this:

data = {:header => "", :str => ""}
data[:str] << "some data"

Isn't this slower than just using a string directly?

str << "some data"

Besides, I would still have to concatenate afterwards, and doing that from an array or a hash requires creating copies.

Updated by xtkoba (Tee KOBAYASHI) about 1 year ago

I would not even concatenate any strings and would push them to an array, as if they were immutable (like in Go language for example).

If you really need to concatenate them and have to care about the speed and the memory efficiency, you can create a C extension and use char[] to manipulate them.

Updated by mame (Yusuke Endoh) about 1 year ago

I would like to do something similar to (<<), which I understand does not create copies.

There is String#prepend method.

str.prepend(header)

I also have this other code and I don't know if it is an "in place" modifier:

str[0..2] = "" is an in-place operation.

Note, however, that these operations (String#prepend and []=) may take a long time.
They do not create another huge string, but they copy the whole content in place, which may take O(n).
If you are worried about only memory consumption, they may work.
But if you want to make your code memory-efficient and fast, they will not solve your issue.
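
A minimal sketch of the distinction described above, assuming a small sample string stands in for the large one. It only checks object identity, so it shows that no second String object is created; it does not measure the O(n) copying cost. The variable names are illustrative.

```ruby
# Sketch: check which operations keep the same String object.
str = +"some data"           # unary plus returns a mutable (unfrozen) string
id  = str.object_id

str << " more"               # append in place
same_after_append = (str.object_id == id)

str.prepend("header:")       # insert at the front in place (content is shifted)
same_after_prepend = (str.object_id == id)

str[0..6] = ""               # delete the first 7 characters in place
same_after_delete = (str.object_id == id)

combined = "header:" + str   # String#+ returns a new String object
is_new_object = (combined.object_id != id)
```

All three in-place operations leave `str.object_id` unchanged, while `+` yields a different object.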

In short, I want to know what instructions I should use to remove a given range from a string and how to concatenate to both the beginning and end of the target string without having to create copies.

In general, it is difficult to do all operations efficiently. If you explain your real problem, we may propose a good solution.

Updated by stiuna (Juan Gregorio) about 1 year ago

xtkoba (Tee KOBAYASHI) wrote in #note-3:

I would not even concatenate any strings and would push them to an array, as if they were immutable (like in Go language for example).

But in the end that information has to be written to a file, so I would have to use 'join', and that makes copies.

mame (Yusuke Endoh) wrote in #note-4:

Note, however, that these operations (String#prepend and []=) may take a long time.
They do not create another huge string, but they copy the whole content in place, which may take O(n).

When you say "may take a long time", does it apply even if 'str' is 3GB and 'header' is only 5 bits? Or do you mean only when both variables are large?

Another solution to my problem would be to be able to write data to a file from any position I want (in bits) by replacing its content.

For example if the binary of a .txt is the following:

"00101111"

And I say:

file.writeSince(5, "00111100011")

The .txt file would change to this:

"00101001_11100011"

That way the last 3 bits of the .txt file are overwritten with the first 3 bits that 'writeSince' received, and the remaining bits are appended. With this I don't need to concatenate 'header' and 'str'.

Concatenation will only become mandatory later; in the short term, what I described above is my priority.

Updated by xtkoba (Tee KOBAYASHI) about 1 year ago

There is no need to join when you write strings to a file:

data = {:header => "header", :payload => ["foo", "bar"]}
File.open("datafile", "w") do |f|
  f.print data[:header]
  data[:payload].each do |s|
    f.print s
  end
end

Seeking to a bit offset that is not a multiple of 8 is beyond what I can help with, so I will leave that to others to answer.

Updated by stiuna (Juan Gregorio) about 1 year ago

@xtkoba (Tee KOBAYASHI)

Hey thanks, your code gave me a great idea, with some modifications I can do what I needed. Very helpful.

(Just in case the 'writeSince' thing is still an open question, hopefully someone can help with that).

Updated by mame (Yusuke Endoh) about 1 year ago

  • Status changed from Open to Closed

When you say "may take a long time", does it apply even if 'str' is 3GB and 'header' is only 5 bits?

Yes, it does. String#prepend needs to memmove the content of str, which may take time proportional to the length of str.

Another solution to my problem would be to be able to write data to a file from any position I want (in bits) by replacing its content.

Well, do you really want to deal with bits, not bytes? If so, you cannot simply use Strings because they are byte-oriented (or character-oriented).

Judging only from your statements, I think you need to implement a bit stream. String#getbyte and String#setbyte may be helpful for that.
(BTW, your question looks like https://en.wikipedia.org/wiki/XY_problem to me. I recommend stating what you ultimately want to do.)
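
As a rough sketch of that idea, assuming bits are numbered from the most significant bit of the first byte, a bit-level overwrite on an in-memory string could look like the following. `write_bits` is a hypothetical helper, not an existing Ruby method, and it operates on a String rather than on a file.

```ruby
# Hypothetical helper: overwrite bits in `buf` starting at bit offset `pos`
# (0 = most significant bit of the first byte), growing `buf` when needed.
# `bits` is a string of "0" and "1" characters.
def write_bits(buf, pos, bits)
  bits.each_char.with_index do |bit, i|
    byte_index, bit_index = (pos + i).divmod(8)
    # Pad with zero bytes until the target byte exists.
    buf << "\x00".b while buf.bytesize <= byte_index
    mask = 0x80 >> bit_index
    byte = buf.getbyte(byte_index)
    byte = bit == "1" ? (byte | mask) : (byte & ~mask)
    buf.setbyte(byte_index, byte)
  end
  buf
end

buf = ["00101111"].pack("B*")   # one byte, binary encoding
write_bits(buf, 5, "00111100011")
result = buf.unpack1("B*")      # => "0010100111100011"
```

This reproduces the "00101001_11100011" example from the earlier comment. Setting one bit per iteration is simple but slow; a real bit stream would likely work on whole bytes where possible.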

Anyway, this issue tracker is for bug reports and improvement proposals. If you want to ask a question, please use the mailing lists (see https://www.ruby-lang.org/en/community/mailing-lists/) or a Q&A site like Stack Overflow.

Updated by stiuna (Juan Gregorio) about 1 year ago

I was looking for a FAQ on the site to see whether it was allowed to open topics not related to bugs or improvements, but I didn't find anything.

As for manipulating bits in a file, I meant it literally, just as I said above. I think writing it like this would have avoided confusion:

file.writeSince(5, ["00111100011"].pack('B*'))

But I will go to the site you recommended (not StackOverflow), thanks for everything.

Updated by nobu (Nobuyoshi Nakada) about 1 year ago

xtkoba (Tee KOBAYASHI) wrote in #note-6:

  f.print data[:header]
  data[:payload].each do |s|
    f.print s
  end

IO#print accepts multiple arguments, so this code can be f.print data[:header], *data[:payload].
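
A small sketch of the multi-argument form, assuming the default output separators (`$,` and `$\` unset) and using a temporary file in place of "datafile":

```ruby
require "tempfile"

data = { header: "header", payload: ["foo", "bar"] }
written = nil

Tempfile.create("datafile") do |f|
  # The splat passes each payload string as its own argument,
  # so no joined intermediate string is built in Ruby code.
  f.print data[:header], *data[:payload]
  f.rewind
  written = f.read
end
```

Here `written` ends up as "headerfoobar", the same bytes the explicit loop would produce.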
