Bug #14729 (closed)
Float("long_invalid_string") fails to throw an exception
Description
When Float() is used to convert a string into a float, invalid characters in the string cause an error to be thrown.
But when a very long string is passed to Float(), invalid characters that fall beyond the size of the internal C buffer are ignored and no error is thrown.
This behavior is inconsistent: underscores are verified throughout the entire string, so why not check for other invalid characters as well?
I have a weak patch but would prefer to see what the developers think of this bug before I post it. Should Float() accept any size string or limit it?
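Until this is settled, a caller can validate the whole string before handing it to Float(). The following is only a minimal sketch of that idea, not the patch mentioned above: strict_float and DECIMAL_FLOAT are hypothetical names, and the pattern is deliberately narrower than what Float() accepts (plain decimal notation only, no surrounding whitespace, no hex literals).

```ruby
# Hypothetical helper, not part of Ruby: checks every character of the
# string against a decimal-float pattern before delegating to Kernel#Float.
# Underscores are allowed only between digits, as Float() requires.
DECIMAL_FLOAT = /\A[+-]?\d+(?:_\d+)*(?:\.\d+(?:_\d+)*)?(?:[eE][+-]?\d+(?:_\d+)*)?\z/

def strict_float(str)
  unless str =~ DECIMAL_FLOAT
    raise ArgumentError, "invalid value for Float(): #{str.inspect}"
  end
  Float(str)
end

begin
  strict_float('1' * 69 + 'a')
rescue ArgumentError => e
  puts "rejected: #{e.class}"
end
```

Because the regular expression is anchored at both ends, the length of the input does not affect which characters are checked.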
Code details:
The code in question is object.c:rb_cstr_to_dbl_raise().
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/object.c#L3232
Specifically, the buffer limit is usually 70-1 characters. For reference, 2^64 has 20 digits, so this may be an academic exercise.
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/object.c#L3271
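The two figures above can be checked from Ruby itself. This is a small sketch that assumes Float::DIG reflects the same DBL_DIG value the C code uses; the variable names are illustrative only.

```ruby
# Float::DIG is Ruby's view of C's DBL_DIG (15 for IEEE 754 doubles).
buffer_chars = Float::DIG * 4 + 10 - 1   # usable characters in the C buffer
digits_2_64  = (2**64).to_s.length       # number of decimal digits in 2^64

puts "usable buffer characters: #{buffer_chars}"
puts "digits in 2**64: #{digits_2_64}"
```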
As an aside, I believe the last check on errno in the function is unnecessary. errno should be examined immediately after the call that sets it (here strtod), which it is, so it's unclear why it is checked again at the end of the function.
https://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/object.c#L3307
The following code demonstrates the issue with some additional comments.
#!/usr/bin/env ruby
require 'test/unit'
require 'test/unit/assertions'
include Test::Unit::Assertions

class TestFloat < Test::Unit::TestCase
  # https://bugs.ruby-lang.org/projects/ruby-trunk/repository/entry/object.c#L3271
  # BUF_SIZE = 69 on most machines
  # -1 leaves room for the terminating NUL
  # Bonus points if you can explain the constants 4 and 10?
  BUF_SIZE = Float::DIG * 4 + 10 - 1

  # case 1: invalid char 'a' is within buffer size
  # Result: strtod sees the invalid char and an error is raised
  def test_strtod_ok
    assert_raise(ArgumentError) { Float('1' * (BUF_SIZE - 1) + 'a') }
  end

  # case 2: invalid char 'a' is outside buffer size
  # Result: no error is raised because the buffer doesn't contain the invalid char.
  # It is confusing why Ruby's behavior differs between case 1 and 2 until you look at the C code.
  def test_strtod_no_error
    assert_equal(1.1111111111111112e+68, Float('1' * BUF_SIZE + 'a is ignored'))
  end

  # case 3: entire string is scanned for underscores
  # Result: when '_' is found in the string, the prev char is checked and MUST be ISDIGIT,
  # or an error is raised by rb_cstr_to_dbl_raise, not strtod.
  def test_underscores_checked_whole_string
    assert_raise(ArgumentError) { Float('1' * BUF_SIZE + '234_56a_890') }
  end

  # case 4: the bug - should Ruby scan the entire string and detect invalid chars,
  # just like it does for invalid underscores, so that this test passes?
  # Result: no exception raised (currently)
  def test_check_whole_string_for_invalid_chars
    assert_raise(ArgumentError) { Float('1' * BUF_SIZE + 'a') }
  end
end