Project

General

Profile

Actions

Feature #22082

open

Introduce Bit Operations into String

Feature #22082: Introduce Bit Operations into String
1

Added by hasumikin (hitoshi hasumi) about 5 hours ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-dev:<unknown>]

Description

Relevant tickets

  • Introduce #bit_count method on Integer --- #20163

Abstract

Ruby's String is already a byte sequence, but it lacks high-level bit operations.
As a result, packed binary data must be handled with manual byte arithmetic.

For example, checking set-bit at offset 10 currently requires calculating the byte position and the bit's position within that byte:

data = "\xAA\xAA\xAA\xAA"

# The "classic" way
byte_offset = 10 / 8
byte = data.getbyte(byte_offset)
bit_offset = 10 % 8
((byte >> bit_offset) & 1) == 1 #=> false

# The "concise" way using Integer#[]
data.getbyte(10 / 8)[10 % 8] == 1 #=> false

The concise form is a single line, but it still leaks implementation details into every call site: the caller writes 10 / 8 and 10 % 8 by hand (a recurring source of off-by-one errors at byte boundaries), and the Integer result must be compared against 1 to be used as a boolean. With a bit-addressed API, data.bit_at(10) takes a bit position and returns true/false directly. The cost compounds for iteration, run-length scanning, or splicing, where each operation otherwise needs its own byte/bit arithmetic.

I propose native bit-level APIs, treating the String class as a first-class bit sequence:

data = "\xAA\xAA\xAA\xAA"
data.bit_at(10) #=> false

By providing a flat bit-addressing model that handles the underlying bit-to-byte mapping, we allow developers to focus on the logical layout of their data (e.g., an Apache Arrow bitmap or a pixel buffer), making the code more readable and less error-prone.

This proposal presents a family of methods for bit-oriented use of String, organized into groups below.
The immediate goal is agreement on the overall direction and feedback on which subset should be pursued first.

Presenting the full menu matters because some design questions only become clear at that level:

  • bit numbering keyword (lsb_first:) and its consistent application across methods
  • naming symmetry such as bits / each_bit
  • behavior for out-of-range bit indices

Implementation (Prototype)

You can try an actual working implementation with gem install string_bits.
The source code can be seen in https://github.com/hasumikin/string_bits

The reason I want to include this feature in the Ruby core rather than a third-party gem is that I want this API to be common across Ruby implementations such as mruby and Spinel.

Proposed Methods

Full prototype and documentation:
https://github.com/hasumikin/string_bits/blob/0.2.0/docs/ProposedMethods.md

Read

  • bit_at(bit_offset, lsb_first: true) -> true | false | nil -- read a single bit
  • bit_count -> Integer -- count of set-bits (popcount)
    bit_count(bit_offset, bit_length, lsb_first: true) -> Integer
    bit_count(bit_range, lsb_first: true) -> Integer
  • bit_run_count(bit, bit_offset, lsb_first: true) -> Integer | nil -- length of the run of bit starting at bit_offset

Iterator

  • each_bit(start_offset=0, lsb_first: true) { |bool| ... } -> self -- yield each bit as true/false
    each_bit(start_offset=0, lsb_first: true) -> Enumerator
  • bits(start_offset=0, lsb_first: true) -> Array -- Array form of each_bit
    bits(start_offset=0, lsb_first: true) { |bool| ... } -> self
  • each_bit_run(start_offset=0, lsb_first: true) { |bool, offset, len| } -> self -- yield (bool, offset, run_length) pairs
    each_bit_run(start_offset=0, lsb_first: true) -> Enumerator
  • bit_runs(start_offset=0, lsb_first: true) -> Array -- Array form of each_bit_run
    bit_runs(start_offset=0, lsb_first: true) { |bool, len| } -> self
  • each_bit_offset(bit, start_offset=0, lsb_first: true) { |offset| ... } -> self -- yield position of each bit equal to bit
    each_bit_offset(bit, start_offset=0, lsb_first: true) -> Enumerator
  • bit_offsets(bit, start_offset=0, lsb_first: true) -> Array -- Array form of each_bit_offset
    bit_offsets(bit, start_offset=0, lsb_first: true) { |offset| ... } -> self

Mutation

  • bit_set(bit_offset, bit_length=1, lsb_first: true) -> self -- set one bit or a logical bit bit_range to 1
    bit_set(bit_range, lsb_first: true) -> self
  • bit_clear(bit_offset, bit_length=1, lsb_first: true) -> self -- set one bit or a logical bit bit_range to 0
    bit_clear(bit_range, lsb_first: true) -> self
  • bit_flip(bit_offset, bit_length=1, lsb_first: true) -> self -- toggle one bit or a logical bit bit_range
    bit_flip(bit_range, lsb_first: true) -> self
  • bit_splice(bit_offset, bit_length, str, str_bit_offset=0, lsb_first: true) -> self -- write a sub-sequence of bits in place (bit-granularity bytesplice)
    bit_splice(bit_range, str, str_bit_offset=0, lsb_first: true) -> self

Slice

  • bit_slice(bit_offset, bit_length, lsb_first: true) -> String | nil -- extract a sub-sequence of bits (bit-granularity byteslice)
    bit_slice(bit_range, lsb_first: true) -> String | nil

Bitwise

  • bitwise_not -> String / bitwise_not! -> self -- invert every bit
  • bitwise_and(other) -> String / bitwise_and!(other) -> self -- bitwise AND
  • bitwise_or(other) -> String / bitwise_or!(other) -> self -- bitwise OR
  • bitwise_xor(other) -> String / bitwise_xor!(other) -> self -- bitwise XOR

Performance

This is not only about convenience.
In a prototype implementation (string_bits gem), bulk operations such as bitwise_and, bitwise_or, and bit_count are also substantially faster than Ruby-level loops over bytes (see the Benchmark link below).
I do not think performance alone is the reason to add the feature, but it is a practical benefit.

Notes

Benchmarks, discussion, and prior art:

No data to display

Actions

Also available in: PDF Atom