Project

General

Profile

Actions

Feature #14975

closed

String#append without changing receiver's encoding

Added by ioquatix (Samuel Williams) almost 4 years ago. Updated 6 months ago.

Status:
Rejected
Priority:
Normal
Target version:
-
[ruby-core:88336]

Description

I'm not sure where this fits in, but in order to avoid garbage and superfluous function calls, is it possible that String#<<, String#concat or the (proposed) String#append can avoid changing the encoding of the receiver?

Right now it's very tricky to do this in a way that doesn't require extra allocations. Here is what I do:

class Buffer < String
	BINARY = Encoding::BINARY
	
	def initialize
		super
		
		force_encoding(BINARY)
	end
	
	def << string
		if string.encoding == BINARY
			super(string)
		else
			super(string.b) # Requires extra allocation.
		end
		
		return self
	end
	
	alias concat <<
end

When the receiver is binary, but contains byte sequences, appending UTF_8 can fail:

"Foobar".b << "Føøbar"
=> "FoobarFøøbar"

> "Føøbar".b << "Føøbar"
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

So, it's not possible to append data, generally, and then call force_encoding(Encoding::BINARY). One must ensure the string is binary before appending it.

It would be nice if there was a solution which didn't require additional allocations/copies/linear scans for what should basically be a memcpy.

See also: https://bugs.ruby-lang.org/issues/14033 and https://bugs.ruby-lang.org/issues/13626#note-3

There are two options to fix this:

1/ Don't change receiver encoding in any case.
2/ Apply 1, but only when receiver is using Encoding::BINARY

Actions

Also available in: Atom PDF