Bug #20662
Closed
pack("g") completely discards any actual NaN value and always packs the same single-precision bytes for a NaN
Description
pack("G")/unpack("G") works great with NaN values. However,
- pack("g") completely discards any actual NaN value and always packs the same bytes for a NaN ("bug as implemented" in VALUE_to_float)
Also:
- unpack("g") always sets the quiet bit to 1 in the Float result (location of bug not obvious to me)
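To observe the behaviour described above, the bytes produced by each directive can be compared against a known bit pattern. The following is a minimal sketch, not part of the original report: the helper names `float_from_bits64` and `bits64` are made up for illustration, and what the `"g"` line prints depends on the Ruby version and platform.

```ruby
# Hypothetical helpers, for illustration only.

# Build a Float from a raw 64-bit IEEE 754 pattern via unpack("G").
def float_from_bits64(bits)
  [bits].pack("Q>").unpack1("G")
end

# Read back the raw 64-bit pattern of a Float via pack("G").
def bits64(float)
  [float].pack("G").unpack1("Q>")
end

# A quiet NaN carrying a non-zero payload.
nan = float_from_bits64(0x7ff8_0000_0000_1234)

puts nan.nan?                                      # is the value a NaN?
puts format("%016x", bits64(nan))                  # bits as seen through "G"
puts format("%08x", [nan].pack("g").unpack1("N"))  # bits as seen through "g"
```

Comparing the last two lines shows whether the payload that survives the `"G"` round trip also survives `"g"`.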
Updated by nobu (Nobuyoshi Nakada) 5 months ago · Edited
- Description updated (diff)
cabo (Carsten Bormann) wrote:
> pack("G")/unpack("G") works great with NaN values. However,
First, Ruby provides only Float::NAN, and does not consider payloads of NaN values.
> pack("g") completely discards any actual NaN value and always packs the same bytes for a NaN ("bug as implemented" in VALUE_to_float)
Since "G" preserves the payload, it may be better to preserve it for "g" as well, IMO.
> unpack("g") always sets the quiet bit to 1 in the Float result (location of bug not obvious to me)
Ruby's Float always uses double.
Although I don't remember the details of NaN conversion in IEEE-754 well, clang on macOS appears not to preserve the NaN across casts between float and double, for instance:
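#include <float.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    union {
        uint32_t b;
        float f;
    } x;

    x.b = 0x7fbff000;       /* NaN with the quiet bit (0x00400000) clear */
    printf("%x\n", x.b);
    double f = x.f;         /* widen float to double */
    printf("%f\n", f);
    x.f = (float)f;         /* narrow back to float */
    printf("%x\n", x.b);    /* compare with the first line printed */
    return 0;
}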
$ clang nan-dbl.c && ./a.out
7fbff000
nan
7ffff000
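The two patterns printed above differ only in bit 22 (0x00400000), the most significant fraction bit, which the common binary32 encoding uses as the quiet bit. A small sketch of that decoding follows, written in Ruby on plain integers so that no floating-point conversion is involved. The method name `decode_f32` and the field names are assumptions made for illustration.

```ruby
# Split a raw binary32 pattern into its fields (hypothetical helper).
def decode_f32(bits)
  quiet_bit = 0x0040_0000               # most significant fraction bit (bit 22)
  exponent  = (bits >> 23) & 0xff
  fraction  = bits & 0x007f_ffff
  {
    sign:     bits >> 31,
    exponent: exponent,
    nan:      exponent == 0xff && fraction != 0,
    quiet:    (bits & quiet_bit) != 0,
    payload:  fraction & ~quiet_bit     # fraction bits below the quiet bit
  }
end

[0x7fbf_f000, 0x7fff_f000].each do |bits|
  puts format("%08x -> %p", bits, decode_f32(bits))
end
```

Both patterns decode to the same payload (0x3ff000); only the `quiet` field differs, which is the change the round trip through double introduced.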
Updated by cabo (Carsten Bormann) 5 months ago
> cabo (Carsten Bormann) wrote:
>> pack("G")/unpack("G") works great with NaN values. However,
> First, Ruby provides only Float::NAN, and does not consider payloads of NaN values.
Right. The only interface Ruby provides to NaN values is via pack/unpack.
>> pack("g") completely discards any actual NaN value and always packs the same bytes for a NaN ("bug as implemented" in VALUE_to_float)
> Since "G" preserves the payload, it may be better to preserve it for "g" as well, IMO.
Indeed.
The code for NaNs in VALUE_to_float (pack.c) uses "return NAN", which discards the value.
This needs to be replaced with something like the "return d" further down in pack.c. That implies a conversion from Ruby's double to the float needed for pack('g'), which also works for NaNs, subject to the detail below.
However, this simple fix is complicated by the fact that float⬌double conversions of NaN values always seem to set the quiet bit.
I slightly rewrote your demo program to demonstrate how the quiet bit can be restored after a float⬌double conversion in either direction.
#include <float.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define F32_QBIT 0x00400000
#define F64_QBIT_OFFSET 29 /* 32 more bits, minus 3 for more exponent */
#define F64_QBIT ((uint64_t)F32_QBIT << F64_QBIT_OFFSET)

uint32_t examples[2] = {0x7fbff000, 0x7ffff000};

int main(void)
{
    union {
        uint32_t b;
        float f;
    } f32;
    union {
        uint64_t b;
        double d;
    } f64;
    uint32_t qbit;

    for (int i = 0; i < 2; i++) {
        f32.b = examples[i]; /* note quiet bit not set in first example */
        printf("setup f32 with %" PRIx32 " == %f\n", f32.b, f32.f);

        /* (1) Expand f32 to f64, as needed in unpack('g') */
        f64.d = f32.f; /* quiet bit gets set here */
        printf("C expands this to %" PRIx64 " == %f\n", f64.b, f64.d);
        /* fix up f64 by copying the lost quiet bit from f32.b
           Obviously, do this in NaN branch only.
        */
        qbit = f32.b & F32_QBIT;
        printf("qbit: %" PRIx32 "\n", qbit);
        f64.b = (f64.b & ~F64_QBIT) | ((uint64_t)qbit << F64_QBIT_OFFSET);
        printf("qbit fixed to %" PRIx64 " == %f\n", f64.b, f64.d);
        printf("\n");

        /* (2) Contract f64 to f32, as needed in pack('g') */
        f32.f = (float)f64.d;
        printf("convert back to f32: %" PRIx32 " == %f\n", f32.b, f32.f);
        /* fix up f32 by copying the lost quiet bit from f64.b
           Obviously, do this in NaN branch only.
        */
        qbit = (f64.b >> F64_QBIT_OFFSET) & F32_QBIT;
        printf("qbit: %" PRIx32 "\n", qbit);
        f32.b = (f32.b & ~F32_QBIT) | qbit;
        printf("fixed this to %" PRIx32 " == %f\n", f32.b, f32.f);
        printf("\n\n");
    }
    return 0;
}
$ make non-sgl-dbl && ./non-sgl-dbl
...
setup f32 with 7fbff000 == nan
C expands this to 7ffffe0000000000 == nan
qbit: 0
qbit fixed to 7ff7fe0000000000 == nan

convert back to f32: 7ffff000 == nan
qbit: 0
fixed this to 7fbff000 == nan


setup f32 with 7ffff000 == nan
C expands this to 7ffffe0000000000 == nan
qbit: 400000
qbit fixed to 7ffffe0000000000 == nan

convert back to f32: 7ffff000 == nan
qbit: 400000
fixed this to 7ffff000 == nan
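The fix-up above amounts to moving the quiet bit and payload between bit positions that differ by 29. The same arithmetic can be sketched on raw integers, which avoids the C float⬌double conversion altogether. This is a hedged illustration in Ruby, not the change proposed for pack.c: `widen_nan_bits` and `narrow_nan_bits` are hypothetical names, the shift constant mirrors `F64_QBIT_OFFSET` from the C program, and narrowing necessarily drops the low 29 payload bits of a binary64 NaN.

```ruby
F64_QBIT_OFFSET = 29 # 52 fraction bits in binary64 minus 23 in binary32

# binary32 NaN pattern -> binary64 NaN pattern.
# Sign, quiet bit and payload are carried over unchanged.
def widen_nan_bits(b32)
  sign     = (b32 >> 31) << 63
  fraction = (b32 & 0x007f_ffff) << F64_QBIT_OFFSET
  sign | 0x7ff0_0000_0000_0000 | fraction
end

# binary64 NaN pattern -> binary32 NaN pattern.
# The low 29 payload bits do not fit and are dropped.
def narrow_nan_bits(b64)
  sign     = (b64 >> 63) << 31
  fraction = (b64 & 0x000f_ffff_ffff_ffff) >> F64_QBIT_OFFSET
  # An all-zero fraction would encode infinity, so keep the result a NaN
  # by setting the lowest payload bit (an arbitrary choice in this sketch).
  fraction = 1 if fraction.zero?
  sign | 0x7f80_0000 | fraction
end

[0x7fbf_f000, 0x7fff_f000].each do |b32|
  b64 = widen_nan_bits(b32)
  puts format("%08x -> %016x -> %08x", b32, b64, narrow_nan_bits(b64))
end
```

For the two patterns used in this thread, the widened values match the "qbit fixed to" lines in the output above (7ff7fe0000000000 and 7ffffe0000000000), and narrowing returns the original binary32 patterns.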
Updated by nobu (Nobuyoshi Nakada) 4 months ago
https://github.com/ruby/ruby/pull/11352
I'm not sure it is a good idea to expose such internals of Float, though.
Updated by nobu (Nobuyoshi Nakada) 4 months ago
- Status changed from Open to Feedback
I'm curious about your use case.
For what purpose and how do you want to use it?