argf_read
pseudocodized:
/* 1 */ argf_read(VALUE length)
/* 2 */ {
/* 3 */ long len = NUM2LONG(length);
/* 4 */ VALUE str = StringValue();
/* 5 */ retry:
/* 6 */ VALUE tmp = io_read(length, ARGF.current_file);
/* 7 */ if (tmp != NIL_P)
/* 8 */ rb_str_append(str, tmp);
/* 9 */ if (tmp == NIL_P) {
/* 10 */ ARGF.close();
/* 11 */ ARGF.next();
/* 12 */ goto retry;
/* 13 */ } else {
/* 14 */ slen = RSTRING_LEN(str);
/* 15 */ if (slen < len) {
/* 16 */ len -= slen;
/* 17 */ length = LONG2NUM(len);
/* 18 */ goto retry;
/* 19 */ }
/* 20 */ }
/* 21 */ return str;
/* 22 */ }
The bug resides in line 16, when len
is reduced.
The joint effect of lines 16-17 that both len
and length
are reduced by the length of the string being assembled (str
). Here len
is a C long that represents the required total read length (according to the role it fulfils in the conditional guard expression of line 15), and length
is a Ruby numeric value representing the remaining length (after the read attempts, that occur in line 6, that took place up to that point). So modifying len
is incorrect.
Why do we need three files for the bug to manifest?
- The bug is triggered in the conditional guard of line 15, with certain corrupt values of
len
.
- That requires two passings through line 15, as initial
len
is not yet corrupt, it only becomes such in the next line (line 16).
Assuming ARGC
contains three files, and argf_read
is invoked with a length
that exceeds the combined length of the first two files, the control flow goes as follows:
- The first I/O activity is reading in the first file, upon hitting line 6. From this the code flow passes on to line 15, where the condition shall be fulfilled, as we read less than the required length, so we get to line 16, where the corruption occurs.
- We are still at the first file, I/O is attempted on it again, but as we reached its end already,
io_read()
returns nil (stored in tmp
).
- A nil
tmp
lands us in the block of lines 9-13, where we step to the second file.
- This also means avoiding the critical line 15.
- We move on to do I/O on the second file, reading it in, too.
- As we still have data to read (residing in the third file), we should pass into the code block between lines 15-19 from where we would go to doing I/O again.
- However, the conditional guard of line 15, that could give us pass into this code block, can bogusly fail due to the corrupt, incorrectly reduced
len
.
The exact condition for length
to trigger the bug: if the three files are of size s1, s2, s3, then, as said before:
-
s1, s2, s3 > 0
-
s1 + s2 <
length
and also:
Checking these criteria is left to the reader.