Feature #21982
open
Add `Decimal` as a core numeric class
Description
Feature: Add Decimal as a core numeric class¶
Abstract¶
Add Decimal < Numeric to Ruby core: exact base-10 arithmetic using a tagged immediate VALUE (like Fixnum) for small values, promoting to a 128-bit heap object for larger values.
Background¶
Ruby apps that handle money, tax rates or measurements often use Integers with extra business logic since Float can't represent most base-10 fractions exactly:
0.1 + 0.2 == 0.3 #=> false
0.1 + 0.2 #=> 0.30000000000000004
0.1d + 0.2d == 0.3d #=> true
0.1d + 0.2d #=> 0.3d
Alternatives have tradeoffs:
- BigDecimal: correct but 8x slower than Float on compound interest.
- Rational: correct but equally slow, and `Rational("19.99").to_s` gives `"1999/100"`.
- Integer cents: correct and fast but pushes formatting and decimal-point tracking into application code.
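The tradeoffs above can be reproduced in today's Ruby. This is a small sketch using only existing classes; the variable names are illustrative and the Integer-cents formatting is one possible approach, not something the proposal prescribes.

```ruby
require "bigdecimal"

# Float: binary fractions, so the sum is slightly off.
float_sum = 0.1 + 0.2

# Rational: exact, but prints as a fraction.
rational_sum = Rational("0.1") + Rational("0.2")

# BigDecimal: exact, but needs an explicit format to print plainly.
big_sum = BigDecimal("0.1") + BigDecimal("0.2")

# Integer cents: exact and fast, but the application tracks the decimal point.
cents = 1999 + 1
formatted = format("%d.%02d", *cents.divmod(100))

puts float_sum == 0.3
puts rational_sum
puts big_sum.to_s("F")
puts formatted
```

Each alternative gets the arithmetic right except Float; what differs is how much formatting work is left to the caller.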
Proposal¶
# Literal syntax
price = 19.99d
tax_rate = 0.0875d
total = (price * (1d + tax_rate)).round(2) #=> 21.74d
# Kernel converter (like Integer(), Float())
Decimal("29.99") #=> 29.99d
Decimal(42) #=> 42.0d
# Value semantics: frozen, Ractor-shareable
19.99d.frozen? #=> true
# Full numeric protocol
19.99d + 1 #=> 20.99d
19.99d <=> 20.0d #=> -1
19.99d.round(1) #=> 20.0d
# Human-focused string interpolation
"$#{19.99d}" #=> "$19.99"
"$#{BigDecimal("19.99").to_s("F")}" #=> "$19.99"
Features¶
- `d` literal suffix: `42d`, `3.14d`, `0.1d` (matching `r` for Rational, `i` for Complex)
- Frozen and Ractor-shareable: value semantics like Rational
- 18 decimal places: fixed precision, full signed 128-bit range
- `Kernel#Decimal()` converter: with `exception: false` support
- Full numeric protocol: arithmetic, comparison, coercion, rounding, `to_i`/`to_f`/`to_r`/`to_s`, pattern matching
Performance¶
Apple M4 with YJIT. All values pre-allocated outside the measurement loop.
Compound interest: 360 monthly iterations¶
balance = (balance * (1 + rate)).round(2) repeated 360 times. A tight loop of multiply, add, round. Both Decimal and Float produce $60,225.61.
| Type | YJIT | No JIT |
|---|---|---|
| Decimal (BID) | 93K i/s | 55K i/s |
| Float | 83K i/s | 60K i/s |
| Rational | 10.7K i/s | 9.4K i/s |
| BigDecimal | 10.1K i/s | 9.3K i/s |
With YJIT, Decimal is 1.12x faster than Float on compound interest. Without YJIT, Float is 1.1x faster. YJIT helps Decimal more (1.7x speedup vs Float's 1.4x) because BOPs (the VM's basic-operator fast paths) and unchecked entry points skip the per-call type checks. Rational and BigDecimal are ~9x slower than Decimal with YJIT and ~6x slower without.
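For readers who want to see the shape of the workload, here is a sketch of the compound-interest loop using Rational, which runs on current Ruby. The principal and monthly rate are hypothetical (the proposal does not state its inputs), so this does not reproduce the $60,225.61 figure or any timing.

```ruby
# Hypothetical inputs: the proposal does not state the principal or the rate.
balance = Rational("10000.00")
rate    = Rational("0.005") # assumed monthly rate

# The measured loop: multiply, add, round to cents, 360 times.
360.times do
  balance = (balance * (1 + rate)).round(2)
end

puts format("%.2f", balance)
```

Because the balance is rounded to two places on every iteration, it stays a whole number of cents throughout.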
Per-operation (benchmark-driver, YJIT)¶
| Operation | Decimal (BID) | Float | Ratio |
|---|---|---|---|
| add | 147M i/s | 160M i/s | 1.09x slower |
| mul | 159M i/s | 158M i/s | ~parity |
| round(2) | 118M i/s | 78M i/s | 1.5x faster |
| div (inexact) | 49M i/s | 140M i/s | 2.9x slower |
| parse | 34M i/s | 34M i/s | parity |
| to_s | 32M i/s | 9M i/s | 3.4x faster |
| sum(1000) | 1.27M i/s | 794K i/s | 1.6x faster |
Add and mul are near Float parity. Round is 1.5x faster, to_s 3.4x faster. Division is 2.9x slower (inexact results need wide arithmetic).
Design¶
Two-tier storage, mirroring Fixnum/Bignum:
Significand <= 2^51 - 1: 8 bytes, no allocation
 63  62                    12 11  8 7     0
+---+------------------------+-----+-------+
| 0 |          1999          |  2  | 0x84  |
+---+------------------------+-----+-------+
sign  significand (51 bits)   scale   tag
64 bits encode sign, significand, decimal position and type tag.
The value IS the VALUE, like Fixnum. All 15-digit significands fit.
Some 16-digit significands fit (up to 2,251,799,813,685,247).
Significand > 2^51 - 1: heap allocated
+--------+     +----------------+----------------+
|  ptr   | --> | flags + klass  | value * 10**18 |
+--------+     |    16 bytes    |    16 bytes    |
  VALUE        +----------------+----------------+
 8 bytes         object header    full i128 range, 18 decimal places
Standard Ruby object header with embedded i128 payload.
Decimal("12.34") is an immediate. No object, no allocation, no GC.
Decimal("9_999_999_999_999_999.99") promotes to heap (significand exceeds 51 bits).
Decimal("123_456_789_012_345_678_901_234_567_890_123_456.78") raises RangeError (exceeds 128-bit range).
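The immediate layout can be modelled with plain Integer bit operations. The sketch below follows the field positions in the diagram (sign at bit 63, significand in bits 62..12, scale in bits 11..8, tag `0x84` in the low byte). The helper names are hypothetical, and the 4-bit scale width is read off the diagram rather than confirmed by the prototype.

```ruby
TAG       = 0x84                # tag byte shown in the diagram
SIG_BITS  = 51
SIG_MAX   = (1 << SIG_BITS) - 1 # largest immediate significand
SCALE_MAX = 0xF                 # assumed: 4-bit field, bits 11..8

# True when the significand fits the immediate tier.
def fits_immediate?(significand)
  significand.abs <= SIG_MAX
end

# Pack a signed significand and a scale into one 64-bit word.
def pack_immediate(significand, scale)
  raise RangeError, "needs the heap tier" unless fits_immediate?(significand)
  raise RangeError, "scale out of range" unless scale.between?(0, SCALE_MAX)
  sign = significand.negative? ? 1 : 0
  (sign << 63) | (significand.abs << 12) | (scale << 8) | TAG
end

# Recover [signed significand, scale] from a packed word.
def unpack_immediate(word)
  magnitude = (word >> 12) & SIG_MAX
  [word[63].zero? ? magnitude : -magnitude, (word >> 8) & SCALE_MAX]
end

word = pack_immediate(1999, 2) # 19.99
puts format("0x%016x", word)
p unpack_immediate(word)
p fits_immediate?(999_999_999_999_999_999) # significand of 9_999_999_999_999_999.99
```

The last line shows why the 18-digit example promotes to the heap: its significand is larger than 2^51 - 1.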
Optimization layers¶
The prototype implements layers analogous to those for Float and Integer:
- 13 BOPs with `DECIMAL_REDEFINED_OP_FLAG`
- Interpreter fast paths in `vm_opt_plus/minus/mult/div/mod`, `vm_opt_lt/le/gt/ge`, `opt_equality_specialized`
- YJIT `Type::Decimal` with inline BID add/sub and BOP guard paths
- ZJIT `types::Decimal` with profiler support and method annotations
- Unchecked `_dd` entry points for YJIT and interpreter
- Reciprocal lookup tables for division-free scale reduction
Heap arithmetic¶
Heap multiply and divide use 256-bit widening (same algorithm as Roc). Optional fast paths exploit the fact that SCALE (10^18) fits in u64: schoolbook two-division wide_div, single-operand wide_mul_64 and Barrett reduction for the u128 case. These improve heap multiply by ~25% and heap division by ~50%. All are removable without affecting correctness or BID performance.
Type coercion¶
When Decimal interacts with other numeric types:
1.5d + 1 #=> 2.5d (Integer promotes to Decimal)
1.5d + 0.5 #=> 2.0 (Decimal demotes to Float)
1.5d + 1/4r #=> 1.75d (Rational promotes to Decimal)
1.5d + 1/3r # ArgumentError (1/3 exceeds 18 decimal places)
1.5d == 1.5 #=> true (compared via Rational)
1.5d == 3/2r #=> true (Rational comparison via <=>)
Decimal + Integer returns Decimal (lossless). Decimal + Float returns Float (caller chose approximate arithmetic). Decimal + Rational returns Decimal when the Rational is exactly representable in 18 decimal places, raises ArgumentError otherwise. Conversion is exact. Only arithmetic results (*, /) truncate.
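The Rational rule can be stated as a one-line test: a Rational is exactly representable when multiplying it by 10^18 leaves an integer. The sketch below models that check on current Ruby; `rational_to_scaled` is a hypothetical helper name, not part of the prototype.

```ruby
SCALE = 10**18 # 18 decimal places

# Return the scaled integer for r, or raise if r needs more than 18 places.
def rational_to_scaled(r)
  scaled = r * SCALE
  unless scaled.denominator == 1
    raise ArgumentError, "#{r} exceeds 18 decimal places"
  end
  scaled.numerator
end

p rational_to_scaled(Rational(1, 4)) # 0.25 scaled by 10**18

begin
  rational_to_scaled(Rational(1, 3))
rescue ArgumentError => e
  puts e.message
end
```

Under this model, `1.5d + 1/4r` is an addition of two scaled integers, while `1/3r` is rejected before any arithmetic happens.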
Relationship to BigDecimal¶
Decimal and BigDecimal serve different needs:
- Decimal: fixed precision (18 places), core type, immediate encoding, JIT-optimized. For the common case: prices, percentages, measurements.
- BigDecimal: arbitrary precision, bundled gem, heap-allocated. For when you need more than 18 decimal places or unbounded digit counts.
They can coexist. The Decimal conversion method is to_dec to avoid conflict with bigdecimal/util, which defines to_d. If to_d can be shared, or if BigDecimal's to_d is deprecated, to_d would be the more natural name.
Why two Decimal tiers¶
Intel's BID64 gives a 64-bit immediate with 16 digits. Roc's Dec gives a 128-bit fixed-point value with 39 digits. Both are proven designs.
Ruby's approach combines them, the same way Integer combines Fixnum and Bignum. Small values are immediates, large ones promote to heap. Transparent to the programmer.
Two simpler alternatives are also viable:
- BID-only: 15-16 digits (51-bit significand), zero allocation. Operations exceeding the BID range would raise. Half the code.
- i128-only: 39 digits, one allocation per decimal. No dual paths. Simpler but slower with GC churn.
Design details¶
- Fixed 18 decimal places: 10^18 fits in a 64-bit integer, keeping the SCALE factor cheap for multiplication and division. 18 places cover all ISO 4217 currency subdivisions.
- Truncation toward zero for `*` and `/`: consistent with C integer division. Floored division for `%`, `div`, `divmod` (matching Ruby's Integer).
- Exact input conversion: `Decimal("1e-19")` raises `ArgumentError` because the value cannot be represented in 18 decimal places. `Decimal("1e-19", exception: false)` returns `nil`. Trailing zeros beyond 18 places are accepted: `Decimal("1.10000000000000000000")` is `1.1d`. Arithmetic truncation is separate and expected.
- Float conversion via `Float#to_s` then parse: `Decimal(0.1)` gives `0.1d`, not `0.1000000000000000055...d`.
- `0d` is a Decimal literal: `0d` produces `Decimal(0)`. `0d42` remains `Integer(42)` (the existing decimal-integer prefix). `0D42` also remains `Integer(42)` (only lowercase `d` produces Decimal).
- Frozen and Ractor-shareable: like Rational. No mutable state.
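Two of the rules above are easy to model with scaled Integers on current Ruby. The sketch below is an illustration under that assumption, with hypothetical helper names. It contrasts truncation toward zero (used for `*` and `/`) with Ruby's floored Integer division, then shows why converting a Float through its string form yields the short decimal.

```ruby
SCALE = 10**18 # values are stored as value * 10**18

# Integer division that truncates toward zero, as C does.
def trunc_div(a, b)
  q = a.abs / b.abs
  (a.negative? ^ b.negative?) ? -q : q
end

# Fixed-point multiply and divide on scaled integers.
def fixed_mul(x, y)
  trunc_div(x * y, SCALE)
end

def fixed_div(x, y)
  trunc_div(x * SCALE, y)
end

minus_one = -1 * SCALE
three     = 3 * SCALE

truncated = fixed_div(minus_one, three)  # rounds toward zero
floored   = (minus_one * SCALE) / three  # Ruby's Integer#/ rounds down

p truncated
p floored

# Float conversion: parsing Float#to_s gives the short decimal,
# while the exact binary value of 0.1 is a different fraction.
via_string = Rational(0.1.to_s)
exact      = 0.1.to_r
p via_string
p exact == via_string
```

For negative quotients the two division rules differ by one unit in the last place, which is why the proposal specifies them separately.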
Portability¶
The prototype requires __int128 (GCC and Clang). For the heap variant, MSVC would need a two-word i128 emulation or a pure-C fallback using int64_t hi, lo fields along with appropriate operations. The BID immediate tier (64-bit only) works everywhere.
Scope¶
Implementation (decimal.c, decimal.rb), VM fast paths, YJIT and ZJIT type tracking and codegen, serialization, Kernel converters and prism_compile.c.
The d literal suffix requires a small Prism upstream change (~60 lines in prism.c plus regenerated sources). Psych would need a separate patch for YAML serialization. Both would be submitted as upstream PRs if this proposal is accepted.
Gem¶
A gem version provides the same semantics as a C extension with pure Ruby fallback. It gets 14.2K i/s on compound interest with YJIT, versus core Decimal's 93K (6.5x slower). A gem cannot add VALUE tag bytes, register BOPs or teach YJIT new types, so it must heap-allocate every result and go through full method dispatch.
Related work¶
| Language / library | Type | Encoding | Precision | Normalized |
|---|---|---|---|---|
| Intel libbid | BID64 | 1+13+50 combination field | 16 digits | no (cohorts) |
| Intel libbid | BID128 | 1+17+110 floating-point | 34 digits | no (cohorts) |
| Roc | Dec | i128 fixed-point (* 10^18) | 39 digits | n/a (fixed scale) |
| C# | System.Decimal | 96-bit sig + 5-bit scale | 28-29 digits | no |
| Ruby | Decimal (this) | 51-bit immediate + i128 heap | 15-16 digits (BID), full i128 (heap) | yes (canonical) |
Immediate tier vs Intel BID64: Intel's combination-field encoding gets 16 digits from 64 bits (vs our 15-16) by implicitly encoding the leading significand digit. The cost is decoder complexity and unnormalized cohorts. 1.0 and 1.00 have different bit patterns, requiring rescaling for equality. Our BID encoding strips trailing zeros for a canonical form: equal immediates are always identical bit patterns, so equality is a single-word comparison. Heap decimals use i128 value comparison.
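The canonical form described here amounts to stripping trailing zeros from the significand while lowering the scale. A minimal sketch, assuming a (significand, scale) pair where the value is significand / 10^scale; `canonical` is a hypothetical name, not the prototype's function.

```ruby
# Reduce (significand, scale) so the significand has no trailing zeros
# unless the scale is already 0.
def canonical(significand, scale)
  while scale > 0 && (significand % 10).zero?
    significand /= 10
    scale -= 1
  end
  [significand, scale]
end

p canonical(10, 1)   # 1.0
p canonical(100, 2)  # 1.00
p canonical(1999, 2) # 19.99
```

Because 1.0 and 1.00 reduce to the same pair, they would pack to the same word, which is what makes single-word equality possible.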
Heap tier vs Roc Dec: nearly identical design. Same i128 scaled by 10^18, same 256-bit widening for multiply and divide. Our additions: reciprocal lookup tables for division-free scale reduction, Barrett reduction for the SCALE division and promotion to the immediate tier when results fit.
Both tiers vs C# System.Decimal: C# uses a 96-bit significand with variable scale 0-28 in a single 128-bit value type. More precision (28-29 digits) than our immediate but no fast path. All arithmetic operates on three 32-bit words. Not normalized. Ruby doesn't have value types, so C#'s stack-allocation advantage doesn't apply. The tagged immediate achieves the same effect.
Like Integer, the two tiers are invisible in Rubyland. A Decimal is a Decimal.