Feature #22118: Introduce Basic Bit Operations into String

Added by hasumikin (hitoshi hasumi) 3 days ago. Updated 3 days ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:125775]

Description

This PR implements a subset of the specification proposed in the parent feature ticket #22082.

PR URL: https://github.com/ruby/ruby/pull/17353

Methods implemented with the same specification as the parent ticket

  • String#bit_at(offset, lsb_first: true) -> true | false | nil
  • String#bitwise_not -> String
  • String#bitwise_not! -> self
  • String#bitwise_and(other) -> String
  • String#bitwise_and!(other) -> self
  • String#bitwise_or(other) -> String
  • String#bitwise_or!(other) -> self
  • String#bitwise_xor(other) -> String
  • String#bitwise_xor!(other) -> self

Methods with range-oriented forms omitted for now

The parent ticket also included range-oriented forms for some methods. In this PR, those range arguments are intentionally omitted, and the methods are limited to simpler forms. Range support can be considered as a future extension.

  • String#bit_count -> Integer
  • String#bit_set(offset, lsb_first: true) -> self
  • String#bit_clear(offset, lsb_first: true) -> self
  • String#bit_flip(offset, lsb_first: true) -> self

In particular, String#bit_count currently takes no arguments. It does not accept a range, offset/length pair, or lsb_first: keyword.
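
As a rough illustration of the argument-free form, the following pure-Ruby sketch counts the set bits across the whole string. The helper name bit_count_sketch is hypothetical and exists only for this example; it is not the PR's implementation, which is written in C.

```ruby
# Hypothetical helper (not part of the PR): counts the set bits in every
# byte of the string, with no range, offset/length, or lsb_first: option.
def bit_count_sketch(str)
  # unpack1("B*") renders each byte as eight '0'/'1' characters.
  str.unpack1("B*").count("1")
end

bit_count_sketch("\xFF\x01".b)  # => 9
bit_count_sketch("".b)          # => 0
```

Because the count covers the whole string, bit ordering has no effect on the result, which is consistent with the method taking no lsb_first: keyword.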


Remaining design questions even after narrowing the scope

The overall direction of adding these methods has already been approved by Matz. The remaining questions are mostly about naming and API details.

1. Why the bitwise_* prefix is used

In the parent ticket, all other methods are related to bit positions: they either take bit-position arguments, yield bit positions, or operate on a specified bit range. In contrast, the bitwise_* methods are whole-data operations. They apply to the entire string as a fixed-size bitmap and do not expose bit positions or bit ordering. The bitwise_* prefix is intended to make this distinction explicit.
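
To show what "whole-data operation" means here, a minimal pure-Ruby sketch of the non-mutating NOT follows. The name bitwise_not_sketch is hypothetical, and the code only approximates the described behavior; it takes no offset and no lsb_first: keyword because every bit is treated the same way.

```ruby
# Hypothetical helper (not part of the PR): inverts every bit of the string
# and returns a new binary string of the same byte size.
def bitwise_not_sketch(str)
  # ~byte is negative in Ruby, so mask back down to one byte.
  str.bytes.map { |byte| ~byte & 0xFF }.pack("C*")
end

bitwise_not_sketch("\x0F\xF0".b)  # => "\xF0\x0F"
```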

2. Bit numbering

lsb_first: true (default)

Within each byte, offset = 0 is the LSB. Numbering proceeds upward through byte[0] and then continues at the LSB of byte[1]:

        byte[0]                    byte[1]
offset: 7  6  5  4  3  2  1  0     15 14 13 12 11 10 9  8
bit:    b7 b6 b5 b4 b3 b2 b1 b0    b7 b6 b5 b4 b3 b2 b1 b0
                             ^                          ^
                             LSB                        LSB

offset = byte_index * 8 + bit_in_byte

lsb_first: false (MSB-first)

Byte order is preserved (byte[0] is still first), but within each byte numbering starts at the MSB:

        byte[0]                    byte[1]
offset: 0  1  2  3  4  5  6  7     8  9  10 11 12 13 14 15
bit:    b7 b6 b5 b4 b3 b2 b1 b0    b7 b6 b5 b4 b3 b2 b1 b0
        ^                          ^
        MSB                        MSB

offset = byte_index * 8 + (7 - bit_in_byte)

Open questions about this keyword:

  • Is lsb_first: needed at all, or should an equivalent mechanism exist under another name?
  • If such a mechanism is needed, is lsb_first: the right keyword name?
  • If lsb_first: is acceptable, should its default be true?

My position is that lsb_first: is needed, because both bit-numbering conventions are used in real byte-buffer formats.

LSB-first is the more common convention for general in-memory bitmap use, including formats and APIs such as Apache Arrow validity bitmaps, ext4 block bitmaps, Roaring bitmaps, Linux/BSD bitmap APIs, and hardware register descriptions.

MSB-first is also needed for domains such as RFC-style network protocol diagrams, PNG low-bit-depth scanlines, BitTorrent bitfields, and some compressed bit streams.

The keyword only controls bit numbering within each byte. Byte order itself is unchanged.
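
The two numbering rules above can be sketched in pure Ruby as follows. The helper names bit_location_sketch and bit_at_sketch are hypothetical and only model the described mapping; the negative-offset check assumes the IndexError rule stated later in this ticket also applies to reads.

```ruby
# Hypothetical helper: maps a flat bit offset to [byte_index, mask].
def bit_location_sketch(offset, lsb_first: true)
  raise IndexError, "negative bit offset #{offset}" if offset < 0
  byte_index, pos = offset.divmod(8)
  # lsb_first: true  -> pos 0 is bit 0 (LSB) of the byte
  # lsb_first: false -> pos 0 is bit 7 (MSB) of the byte
  bit_in_byte = lsb_first ? pos : 7 - pos
  [byte_index, 1 << bit_in_byte]
end

# Hypothetical read accessor modeled on the described String#bit_at.
def bit_at_sketch(str, offset, lsb_first: true)
  byte_index, mask = bit_location_sketch(offset, lsb_first: lsb_first)
  byte = str.getbyte(byte_index)  # nil when byte_index is past the end
  return nil if byte.nil?
  (byte & mask) != 0
end

buf = "\x01\x80".b                     # byte[0] = 0b00000001, byte[1] = 0b10000000
bit_at_sketch(buf, 0)                  # => true  (LSB of byte[0])
bit_at_sketch(buf, 15)                 # => true  (MSB of byte[1])
bit_at_sketch(buf, 0, lsb_first: false)  # => false (MSB of byte[0])
bit_at_sketch(buf, 8, lsb_first: false)  # => true  (MSB of byte[1])
```

In both modes the byte index is offset / 8; only the position within the byte changes.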

3. Out-of-range behavior

String#bit_at returns nil when the bit offset is out of range.

Mutating methods such as String#bit_set, String#bit_clear, and String#bit_flip, on the other hand, raise IndexError for an out-of-range bit offset. This follows the general distinction between read-like and write-like access: reading a missing position can return nil, but mutating a missing position should not silently do nothing.

4. Negative bit offsets

Negative bit offsets are rejected with IndexError.

Although Ruby often uses negative indices to count from the end, combining negative bit offsets with the lsb_first: true/false mechanism would make the numbering rule harder to explain and reason about. The API keeps bit offsets as non-negative flat positions from the beginning of the string.
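
The offset rules for mutating methods in sections 3 and 4 can be sketched together. The helper bit_set_sketch below is hypothetical and written in pure Ruby for illustration only; the error messages are made up, and only the exception class comes from this ticket.

```ruby
# Hypothetical helper modeled on the described String#bit_set.
# Sets one bit in place and returns the same string object.
def bit_set_sketch(str, offset, lsb_first: true)
  # Negative offsets and offsets past the last bit are both rejected.
  if offset < 0 || offset >= str.bytesize * 8
    raise IndexError, "bit offset #{offset} out of range"
  end
  byte_index, pos = offset.divmod(8)
  mask = 1 << (lsb_first ? pos : 7 - pos)
  str.setbyte(byte_index, str.getbyte(byte_index) | mask)
  str
end

buf = "\x00\x00".b
bit_set_sketch(buf, 9)                    # buf is now "\x00\x02"
bit_set_sketch(buf, 0, lsb_first: false)  # buf is now "\x80\x02"

begin
  bit_set_sketch(buf, -1)
rescue IndexError => e
  e.message  # negative offsets never count from the end
end
```

Checking the offset before touching the buffer means a failed call leaves the string unchanged.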

5. Length mismatch for binary bitwise operations

String#bitwise_and, String#bitwise_or, and String#bitwise_xor require both operands to have the same byte size. If the byte sizes differ, they raise ArgumentError.

This avoids implicit truncation or zero-padding. Since these methods operate on strings as fixed-size bitmaps, requiring equal sizes makes the operation explicit and predictable.
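
A minimal pure-Ruby sketch of the equal-size rule, using XOR as the example. The name bitwise_xor_sketch and the error message are hypothetical; only the ArgumentError on differing byte sizes is taken from the description above.

```ruby
# Hypothetical helper modeled on the described String#bitwise_xor.
# Returns a new binary string; neither operand is modified.
def bitwise_xor_sketch(lhs, rhs)
  unless lhs.bytesize == rhs.bytesize
    raise ArgumentError,
          "byte sizes differ (#{lhs.bytesize} vs #{rhs.bytesize})"
  end
  lhs.bytes.zip(rhs.bytes).map { |x, y| x ^ y }.pack("C*")
end

bitwise_xor_sketch("\x0F\xFF".b, "\xFF\xFF".b)  # => "\xF0\x00"

begin
  bitwise_xor_sketch("\x00".b, "\x00\x00".b)
rescue ArgumentError
  # no implicit truncation or zero-padding takes place
end
```

A caller who wants padding or truncation has to do it explicitly before the call, for example with String#ljust or String#byteslice.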
