Feature #22118
openIntroduce Basic Bit Operations into String
Description
This PR implements a subset of the specification proposed in the parent feature ticket #22082
PR URL: https://github.com/ruby/ruby/pull/17353
Methods implemented with the same specification as the parent ticket¶
String#bit_at(offset, lsb_first: true) -> true | false | nilString#bitwise_not -> StringString#bitwise_not! -> selfString#bitwise_and(other) -> StringString#bitwise_and!(other) -> selfString#bitwise_or(other) -> StringString#bitwise_or!(other) -> selfString#bitwise_xor(other) -> StringString#bitwise_xor!(other) -> self
Methods with range-oriented forms omitted for now¶
The parent ticket also included range-oriented forms for some methods. In this PR, those range arguments are intentionally omitted, and the methods are limited to simpler forms. Range support can be considered as a future extension.
String#bit_count -> IntegerString#bit_set(offset, lsb_first: true) -> selfString#bit_clear(offset, lsb_first: true) -> selfString#bit_flip(offset, lsb_first: true) -> self
In particular, String#bit_count currently takes no arguments. It does not accept a range, offset/length pair, or lsb_first: keyword.
Remaining design questions even after narrowing the scope¶
The overall direction of adding these methods has already been approved by Matz. The remaining questions are mostly about naming and API details.
1. Why the bitwise_* prefix is used¶
In the parent ticket, all other methods are related to bit positions: they either take bit-position arguments, yield bit positions, or operate on a specified bit range. In contrast, the bitwise_* methods are whole-data operations. They apply to the entire string as a fixed-size bitmap and do not expose bit positions or bit ordering. The bitwise_* prefix is intended to make this distinction explicit.
2. Bit numbering¶
lsb_first: true (default)¶
Within each byte, offset = 0 is the LSB. Numbering proceeds upward through byte[0] and then continues at the LSB of byte[1]:
byte[0] byte[1]
offset: 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
bit: b7 b6 b5 b4 b3 b2 b1 b0 b7 b6 b5 b4 b3 b2 b1 b0
^ ^
LSB LSB
offset = byte_index * 8 + bit_in_byte
lsb_first: false for MSB-first¶
Byte order is preserved (byte[0] is still first), but within each byte numbering starts at the MSB:
byte[0] byte[1]
offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
bit: b7 b6 b5 b4 b3 b2 b1 b0 b7 b6 b5 b4 b3 b2 b1 b0
^ ^
MSB MSB
offset = byte_index * 8 + (7 - bit_in_byte)
- Whether
lsb_first:is needed at all, or whether an equivalent mechanism should exist under another name? - If such a mechanism is needed, whether
lsb_first:is the right keyword name? - If
lsb_first:is acceptable, should its default betrue?
My point is that lsb_first: is needed because both bit numbering conventions are used in real byte-buffer formats.
LSB-first is the more common convention for general in-memory bitmap use, including formats and APIs such as Apache Arrow validity bitmaps, ext4 block bitmaps, Roaring bitmaps, Linux/BSD bitmap APIs, and hardware register descriptions.
MSB-first is also needed for domains such as RFC-style network protocol diagrams, PNG low-bit-depth scanlines, BitTorrent bitfields, and some compressed bit streams.
The keyword only controls bit numbering within each byte. Byte order itself is unchanged.
3. Out-of-range behavior¶
String#bit_at returns nil when the bit offset is out of range.
On the other hand, mutating methods such as String#bit_set, String#bit_clear, and String#bit_flip raise IndexError for an out-of-range bit offset. This follows the same general distinction as read-like access versus write-like access: reading a missing position can return nil, but mutating a missing position should not silently do nothing.
4. Negative bit offsets¶
Negative bit offsets are rejected with IndexError.
Although Ruby often uses negative indices to count from the end, combining negative bit offsets with the lsb_first: true/false mechanism would make the numbering rule harder to explain and reason about. The API keeps bit offsets as non-negative flat positions from the beginning of the string.
5. Length mismatch for binary bitwise operations¶
String#bitwise_and, String#bitwise_or, and String#bitwise_xor require both operands to have the same byte size. If the byte sizes differ, they raise ArgumentError.
This avoids implicit truncation or zero-padding. Since these methods operate on strings as fixed-size bitmaps, requiring equal sizes makes the operation explicit and predictable.