Bug #18972: String#byteslice should return BINARY (aka ASCII-8BIT) Strings - Ruby - Ruby Issue Tracking System


Bug #18972

closed

String#byteslice should return BINARY (aka ASCII-8BIT) Strings

Added by byroot (Jean Boussier) almost 3 years ago. Updated almost 3 years ago.

Status: Rejected

Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

[ruby-core:109641]

Description

While working on implementing https://bugs.ruby-lang.org/issues/13626, I noticed that byteslice assigns the receiver's encoding to the returned String.

I believe this is incorrect. Since you are doing a byte-based operation, you'd expect a binary string in return. Otherwise, if you call it on a UTF-8 string, you're likely to get a string with an invalid encoding.

I read the original feature request, and it doesn't mention what the returned encoding should be: https://bugs.ruby-lang.org/issues/4447

Current behavior

>> "fée".byteslice(1).valid_encoding?
=> false
>> "fée".byteslice(1).encoding
=> #<Encoding:UTF-8>
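Looking at the raw bytes shows why the result is invalid. In UTF-8, "é" is encoded as the two bytes 0xC3 0xA9, so byte index 1 falls in the middle of a character. A minimal sketch, assuming the source file is UTF-8 (Ruby's default source encoding):

```ruby
# "fée" in UTF-8: "f" is one byte, "é" is two bytes (0xC3 0xA9), "e" is one byte.
s = "fée"
p s.bytes     # => [102, 195, 169, 101]
p s.length    # => 3 (characters)
p s.bytesize  # => 4 (bytes)

# byteslice(1) returns one byte, the first half of "é",
# but tags it with the receiver's encoding (UTF-8).
piece = s.byteslice(1)
p piece                  # => "\xC3"
p piece.encoding         # => #<Encoding:UTF-8>
p piece.valid_encoding?  # => false
```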

Expected behavior

>> "fée".byteslice(1).valid_encoding?
=> true
>> "fée".byteslice(1).encoding
=> #<Encoding:ASCII-8BIT>
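With the current behavior, callers who want a BINARY result can already request it explicitly with String#b, which returns a copy of the string tagged ASCII-8BIT (aka BINARY). A minimal sketch; `binary_byteslice` is a hypothetical helper name, not part of Ruby's API:

```ruby
# Hypothetical helper (not a Ruby core method): byte-based slicing that
# always returns a BINARY string. String#b returns an ASCII-8BIT copy.
def binary_byteslice(str, *args)
  str.byteslice(*args)&.b  # keep nil for out-of-range offsets
end

piece = binary_byteslice("fée", 1)
p piece                              # => "\xC3"
p piece.encoding == Encoding::BINARY # => true
p piece.valid_encoding?              # => true

# Out-of-range offsets still yield nil, as with byteslice itself.
p binary_byteslice("fée", 10)        # => nil
```

Calling `.b` on the receiver first (`"fée".b.byteslice(1)`) returns the same bytes with the same BINARY encoding.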

Backward compatibility concerns

I'm honestly not quite sure what the backward-compatibility impact may be.

From my point of view, if you are calling byteslice, it's to use the result with other binary strings, but it's indeed
possible that existing code mixes UTF-8 and BINARY strings in a way that somewhat works and would be broken by this change.

Especially since binary strings can silently be promoted from BINARY to UTF-8:

buffer = "".b
buffer << "fée" # buffer was promoted to Encoding::UTF-8 silently
buffer << "fée".byteslice(1) # both sides are UTF-8, so no error is raised

The above currently "works", but would raise Encoding::CompatibilityError with this change.
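Both the silent promotion and the error can be reproduced today by simulating the proposed return encoding with an explicit String#b on the slice. A minimal sketch; the `.b` call only stands in for the proposed behavior and is not what byteslice currently does:

```ruby
# An empty BINARY buffer adopts the encoding of the first non-ASCII string appended to it.
buffer = "".b
buffer << "fée"
p buffer.encoding  # => #<Encoding:UTF-8>

# Today: byteslice keeps UTF-8, so the append succeeds but leaves an invalid string.
buffer << "fée".byteslice(1)
p buffer.valid_encoding?  # => false

# Simulated proposal: appending a non-ASCII BINARY string to a
# non-ASCII UTF-8 string is incompatible and raises.
raised = false
begin
  buffer << "fée".byteslice(1).b
rescue Encoding::CompatibilityError => e
  raised = true
  puts "#{e.class}: #{e.message}"
end
```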

Related issues: 2 (1 open, 1 closed)


Updated by byroot (Jean Boussier) almost 3 years ago

Related to Feature #4447: add String#byteslice() method added


Updated by byroot (Jean Boussier) almost 3 years ago

Related to Feature #13626: Add String#byteslice! added


#3 [ruby-core:109642]

Updated by Eregon (Benoit Daloze) almost 3 years ago

I think the current behavior is better: String#byteslice is not only used for BINARY strings.
In fact, for binary strings (and other fixed-width encodings), there is no point in using byteslice over slice/[].
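For BINARY strings specifically, this is easy to check: each character is exactly one byte, so byte offsets and character offsets coincide and byteslice returns the same result as String#[]. A small sketch that compares the two over every start offset and length of a sample string:

```ruby
# In a BINARY (ASCII-8BIT) string every character is exactly one byte,
# so byte offsets and character offsets coincide.
data = "fée".b
p data.length    # => 4
p data.bytesize  # => 4

# byteslice and String#[] agree for every start offset and length.
(0..data.bytesize).each do |start|
  (0..data.bytesize).each do |len|
    unless data.byteslice(start, len) == data[start, len]
      raise "mismatch at start=#{start}, len=#{len}"
    end
  end
end
puts "byteslice and [] agree on every slice of #{data.inspect}"
```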

For instance, one might work with UTF-8 and get a byte index (instead of a character index), e.g. from String#byteindex or MatchData#byteoffset, and then use byteslice to avoid two extra byte-offset/character-offset conversions, which are expensive for, e.g., (non-7-bit) UTF-8.
What I just described is close to the motivation for #13110, which added String#byteindex.
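The workflow described in this comment might look like the following sketch, which assumes Ruby 3.2 or later (where String#byteindex and MatchData#byteoffset are available). Because the byte offsets come from searches in the same UTF-8 string, they fall on character boundaries, so the slices keep the receiver's encoding and stay valid:

```ruby
# Assumes Ruby 3.2+ for String#byteindex and MatchData#byteoffset.
text = "café au lait"  # "é" occupies two bytes, so byte and character offsets differ

# Find a byte offset, then slice by bytes without converting to character offsets.
start = text.byteindex("au")
p start                    # => 6 (the character index would be 5)
p text.byteslice(start, 2) # => "au"

# The same with a regexp match.
m = /lait/.match(text)
b_start, b_end = m.byteoffset(0)
p [b_start, b_end]         # => [9, 13]
word = text.byteslice(b_start, b_end - b_start)
p word                     # => "lait"
p word.encoding            # => #<Encoding:UTF-8>
p word.valid_encoding?     # => true
```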

So I think we cannot change this, for compatibility reasons, and AFAIK the current behavior is intended.


Updated by byroot (Jean Boussier) almost 3 years ago

Status changed from Open to Rejected

OK, I suppose your point of view makes sense, and either way, the backward-compatibility concern is just too big.

Closing.


Related to Ruby - Feature #4447: add String#byteslice() method (Closed, matz (Yukihiro Matsumoto), 02/25/2011)
Related to Ruby - Feature #13626: Add String#byteslice! (Open)
