Bug #18972

closed

String#byteslice should return BINARY (aka ASCII-8BIT) Strings

Added by byroot (Jean Boussier) over 1 year ago. Updated over 1 year ago.

Status:
Rejected
Assignee:
-
Target version:
-
[ruby-core:109641]

Description

While working on implementing https://bugs.ruby-lang.org/issues/13626, I noticed that byteslice assigns the receiver's encoding to the returned String.

I believe this is incorrect: since byteslice is a byte-based operation, you would expect a binary string in return. Otherwise, calling it on a UTF-8 string is likely to return a string with an invalid encoding.

I read the original feature request and there's no mention of what the returned encoding should be: https://bugs.ruby-lang.org/issues/4447

Current behavior

>> "fée".byteslice(1).valid_encoding?
=> false
>> "fée".byteslice(1).encoding
=> #<Encoding:UTF-8>

Expected behavior

>> "fée".byteslice(1).valid_encoding?
=> true
>> "fée".byteslice(1).encoding
=> #<Encoding:ASCII-8BIT>
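
For comparison, here is a minimal sketch of how the two results above can be reproduced on a current Ruby, assuming only String#b and String#byteslice. The variable names are illustrative, and the literal is written with a \u escape so that it is UTF-8 regardless of the source encoding. Converting the receiver with String#b before slicing gives the result described under "Expected behavior"; this is a workaround sketch, not the change proposed in this ticket.

```ruby
str = "f\u00E9e" # "fée": bytes 0x66, 0xC3, 0xA9, 0x65

# Current behavior: the slice keeps the receiver's encoding (UTF-8),
# so a lone lead byte 0xC3 is an invalid UTF-8 string.
slice = str.byteslice(1)
slice.encoding == Encoding::UTF_8   # => true
slice.valid_encoding?               # => false

# Workaround: take a binary copy first, then slice it.
binary_slice = str.b.byteslice(1)
binary_slice.encoding == Encoding::BINARY # => true
binary_slice.valid_encoding?              # => true
binary_slice.bytes                        # => [195]
```

Calling `str.byteslice(1).b` instead yields the same bytes with the same encoding; which order reads better is a matter of taste.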

Backward compatibility concerns

I'm honestly not quite sure what the backward-compatibility impact may be.

From my point of view, if you are calling byteslice, it is to use the result with other binary strings, but it is indeed
possible that existing code mixes UTF-8 and BINARY in a way that somewhat works today and would be broken by this change.

Especially since binary strings can silently be promoted from BINARY to UTF-8:

buffer = "".b 
buffer << "fée" # buffer was promoted to Encoding::UTF-8 silently
buffer << "fée".byteslice(1)

The above currently "works", but would raise Encoding::CompatibilityError with this change.
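
The following sketch makes both outcomes observable on a current Ruby. Since the proposed change does not exist, it is emulated here by calling String#b on the slice, which is an assumption made for illustration. The names `buffer`, `buffer2` and `error` are likewise illustrative.

```ruby
utf8 = "f\u00E9e" # "fée", written with an escape so the literal is always UTF-8

# An empty (or ASCII-only) binary buffer adopts the encoding of the
# non-ASCII string appended to it.
buffer = "".b
buffer << utf8
buffer.encoding == Encoding::UTF_8 # => true

# Current behavior: the slice is tagged UTF-8, so the append succeeds,
# but the buffer now contains an invalid byte sequence.
buffer << utf8.byteslice(1)
buffer.valid_encoding? # => false

# Emulated proposed behavior: a non-ASCII BINARY slice cannot be
# appended to a non-ASCII UTF-8 buffer.
buffer2 = "".b
buffer2 << utf8
error = begin
  buffer2 << utf8.byteslice(1).b
  nil
rescue Encoding::CompatibilityError => e
  e
end
error.class # => Encoding::CompatibilityError
```

Code that wants a buffer to stay binary regardless of what is appended could append `utf8.b` rather than `utf8`, which avoids the promotion in the first place.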


Related issues 2 (1 open, 1 closed)

Related to Ruby master - Feature #4447: add String#byteslice() method (Closed, matz (Yukihiro Matsumoto), 02/25/2011)
Related to Ruby master - Feature #13626: Add String#byteslice! (Open)