## Bug #5227

### Float#round fails on corner cases

**Description**

Float#round fails on some corner cases:

42.0.round(300) # => 42.0

42.0.round(308) # => Infinity, should be 42.0

42.0.round(309) # => 42.0

1.0e307.round(1) # => 1.0e307

1.0e307.round(2) # => Infinity, should be 1.0e307

These occur when the exponent of the intermediate value (number * 10 ** ndigits) overflows, so that the intermediate value becomes Infinity.
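To make the overflow concrete, here is a small sketch that computes only the intermediate value for the cases listed above. The helper name `scaled` is hypothetical, and this is the same arithmetic done in Ruby, not the C code path itself.

```ruby
# Hypothetical helper: the intermediate value that a scale-then-round
# approach computes before rounding (not the actual C implementation).
def scaled(number, ndigits)
  number * 10.0**ndigits
end

scaled(42.0, 300).finite?     # => true, 4.2e301 still fits in a double
scaled(42.0, 308).infinite?   # => 1, the product exceeds Float::MAX
scaled(1.0e307, 1).finite?    # => true
scaled(1.0e307, 2).infinite?  # => 1

# For ndigits = 309 the scale factor itself is already Infinity,
# which is the case the old isinf(f) check in the patch below caught.
(10.0**309).infinite?         # => 1
```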

The original code already had checks for extreme values, but we can find much tighter ones, as explained in the patch below. This fixes the bugs above and also short-circuits most trivial cases, where the result is either the original number or 0.

I'd be grateful if someone could look it over before I commit it, thanks.

diff --git a/numeric.c b/numeric.c

index 272bbd1..22608c9 100644

--- a/numeric.c

+++ b/numeric.c

@@ -1491,18 +1491,37 @@ flo_round(int argc, VALUE *argv, VALUE num)

VALUE nd;

double number, f;

int ndigits = 0;

int binexp;

long val;

if (argc > 0 && rb_scan_args(argc, argv, "01", &nd) == 1) {

ndigits = NUM2INT(nd);

}

number = RFLOAT_VALUE(num);

- f = pow(10, abs(ndigits));
-
- if (isinf(f)) {
- if (ndigits < 0) number = 0;
- }
- else {
+ frexp (number , &binexp);
+
+/* Let `exp` be such that `number` is written as: "0.#{digits}e#{exp}",
+   i.e. such that 10 ** (exp - 1) <= |number| < 10 ** exp
+   Recall that up to 17 digits can be needed to represent a double,
+   so if ndigits + exp >= 17, the intermediate value (number * 10 ** ndigits)
+   will be an integer and thus the result is the original number.
+   If ndigits + exp <= 0, the result is 0 or "1e#{exp}", so
+   if ndigits + exp < 0, the result is 0.
+   We have:
+     2 ** (binexp-1) <= |number| < 2 ** binexp
+     10 ** ((binexp-1)/log_2(10)) <= |number| < 10 ** (binexp/log_2(10))
+     If binexp >= 0, and since log_2(10) = 3.322259:
+       10 ** (binexp/4 - 1) < |number| < 10 ** (binexp/3)
+       binexp/4 <= exp <= binexp/3
+     If binexp <= 0, swap the /4 and the /3
+     So if ndigits + binexp/(3 or 4) >= 17, the result is number
+     If ndigits + binexp/(4 or 3) < 0 the result is 0
+*/
+ if ((long)ndigits * (4 - (binexp < 0)) + binexp < 0) {
+ number = 0;
+ }
+ else if ((long)(ndigits - 17) * (3 + (binexp < 0)) + binexp < 0) {
+ f = pow(10, abs(ndigits));
if (ndigits < 0) {
double absnum = fabs(number);
if (absnum < f) return INT2FIX(0);
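The bounds described in the patch comment can be tried out from Ruby. The following is a minimal sketch, not the patch itself: `decimal_exponent_bounds` and `classify_round` are hypothetical names. The bounds use explicit floor and ceil, which is an assumption about how `binexp/4 <= exp <= binexp/3` should be read for an integer `exp`. The argument only needs log_2(10), which is about 3.3219, to lie between 3 and 4.

```ruby
# Returns [lo, hi] such that lo <= exp <= hi, where exp is the decimal
# exponent with 10**(exp - 1) <= |number| < 10**exp.
# Assumption: floor/ceil are applied so the bounds are safe for integer exp.
def decimal_exponent_bounds(number)
  # Math.frexp gives binexp with 2**(binexp - 1) <= |number| < 2**binexp
  _fraction, binexp = Math.frexp(number)
  if binexp >= 0
    [binexp.fdiv(4).floor, binexp.fdiv(3).ceil]
  else
    # for negative binexp the /4 and /3 swap roles
    [binexp.fdiv(3).floor, binexp.fdiv(4).ceil]
  end
end

# Decides which shortcut, if any, applies before any scaling is done.
# max_digits = 17 is the digit count quoted in the patch comment.
def classify_round(number, ndigits, max_digits = 17)
  return :unchanged unless number.finite?
  lo, hi = decimal_exponent_bounds(number)
  return :unchanged if ndigits + lo >= max_digits  # scaled value is already an integer
  return :zero      if ndigits + hi < 0            # number is too small to survive rounding
  :compute                                         # fall through to the real rounding
end

decimal_exponent_bounds(42.0)   # => [1, 2]   (42.0 is 0.42e2, so exp = 2)
decimal_exponent_bounds(0.001)  # => [-3, -2]
classify_round(42.0, 308)       # => :unchanged
classify_round(1.0e307, 2)      # => :unchanged
classify_round(7.0, -2)         # => :zero
classify_round(7.0, -1)         # => :compute
```

With these bounds, `42.0.round(308)` and `1.0e307.round(2)` are recognised as unchanged without ever forming the overflowing product.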