Bug #18245
CSV Parser, buffer overflow issue with very specific data
Status: Closed
Description
This may not fall within the guidelines since it's a very specific issue, but I have exhausted every avenue for this being a problem with the file itself.
Ruby's (2.6.6) native CSV parser crashes on a specific file. I have tried to reproduce the exact set of bytes that causes the issue, but haven't been able to do so.
I then replicated the file, replacing all letters with 'a' and all digits with '0'. The resulting file also crashes on one specific line (1165), with an error claiming my quotes aren't balanced (which they are).
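The anonymization step above could be scripted roughly as follows. This is a hypothetical Ruby helper, not something taken from the report: it only rewrites ASCII letters and ASCII digits, so every byte keeps its original position. Replacing a multibyte letter such as "é" with a single "a" would shift all later bytes, which could move the data away from whatever chunk boundary triggers the problem.

```ruby
# Hypothetical helper (not part of the csv gem): anonymize CSV data while
# keeping its exact byte layout, so chunk boundaries fall in the same places.
def anonymize_csv_bytes(data)
  # Work on a binary copy so "\r\n" and any non-ASCII bytes pass through unchanged.
  bytes = data.b
  # ASCII letters become "a" and ASCII digits become "0"; each replacement is
  # one byte for one byte, so the total size is unchanged.
  bytes.gsub(/[A-Za-z]/, "a").gsub(/[0-9]/, "0")
end

# Read src as raw bytes, anonymize, and write the result to dest.
def anonymize_csv_file(src, dest)
  File.binwrite(dest, anonymize_csv_bytes(File.binread(src)))
end
```

For example, `anonymize_csv_bytes(%Q{Name,"Qty"\r\nBob,42\r\n})` returns `aaaa,"aaa"\r\naaa,00\r\n` with the same byte size as the input.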
Code that crashes:
require "csv"

# skip_lines ignores rows made up only of commas and whitespace
CSV.foreach(File.expand_path("~/Downloads/illegal_quoting_case.csv"), skip_lines: /^(?:,\s*)+$/) { |r| puts "\n\n#{r.inspect}\n\n" }
Interesting observations:
If you change any byte (adding or removing a character) in ANY line above 1165, it works fine. Even a single space, added to or removed from ANY line above it, is enough.
It also works fine if you remove the skip_lines option.
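To see which lines the skip_lines pattern from the reproduction matches, the standalone check below applies the same regular expression to a few made-up sample lines. Note that `\s` also matches "\r", so a comma-only line still matches when it carries the first half of a "\r\n" separator. This only illustrates the regex; it is not a diagnosis of the crash, and the helper name `skipped?` is hypothetical.

```ruby
# Pattern passed as skip_lines: in the reproduction above.
SKIP_LINES = /^(?:,\s*)+$/

# Hypothetical helper: does a single line match the skip_lines pattern?
def skipped?(line)
  SKIP_LINES.match?(line)
end

# Made-up sample lines, not taken from the reporter's file.
[",,,", ", , ,", ",,,\r", "a,,", ""].each do |line|
  puts "#{line.inspect} skipped: #{skipped?(line)}"
end
```

An empty line does not match, because `(?:,\s*)+` requires at least one comma.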
I then attempted to debug the main codebase. The issue seems to occur when the scanner is near the end of the 8192-byte buffer chunk: THIS line then somehow reads extra bytes, splitting the first column of the next line and causing the error.
This is a bizarre one to reproduce, but the issue DOES lie somewhere in CSV::Parser::Scanner::StringScanner's method of reading bytes.
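A minimal standalone sketch of the chunk-boundary effect described above, assuming "\r\n" row separators (the case identified in the maintainers' reply below) and the 8192-byte chunk size. It does not touch the csv gem's internals; it only reads data in fixed-size chunks with `StringIO#read` and checks whether a "\r\n" straddles two chunks. The helper names are hypothetical.

```ruby
require "stringio"

CHUNK_SIZE = 8192 # chunk size mentioned in the report; assumed here

# Hypothetical helper: read an IO in fixed-size chunks, as a buffered scanner might.
def read_chunks(io, chunk_size = CHUNK_SIZE)
  chunks = []
  while (chunk = io.read(chunk_size)) # returns nil at EOF
    chunks << chunk
  end
  chunks
end

# Hypothetical helper: CSV text whose first row is shifted by `padding` bytes.
# With padding 0, the "\r" of the first "\r\n" is the last byte of chunk 1.
def boundary_csv(padding)
  ("a" * (CHUNK_SIZE - 1 + padding)) + "\r\n" + "0,0\r\n"
end

# True if some chunk ends with "\r" and the next one starts with "\n".
def separator_split?(chunks)
  chunks.each_cons(2).any? { |a, b| a.end_with?("\r") && b.start_with?("\n") }
end
```

`separator_split?(read_chunks(StringIO.new(boundary_csv(0))))` reports a split, while padding of 1 or -1 does not. This mirrors the observation that adding or removing a single byte earlier in the file makes the error disappear.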
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
This bug happens when a multibyte row separator (such as \r\n) is split when reading a chunk. I've submitted a pull request to fix the handling of \r\n row separators: https://github.com/ruby/csv/pull/221. It's a suboptimal fix, as it doesn't handle other multibyte row separators.
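For contrast, one general way to keep any multibyte separator intact across chunk boundaries is to hold unterminated bytes in a buffer until the next read. The sketch below is hypothetical and is not the code in the pull request: it ignores CSV quoting entirely and only shows the boundary handling. The method name `split_rows_chunked` and its keyword parameters are made up for illustration.

```ruby
require "stringio"

# Hypothetical helper, not the csv gem's code: split the data read from `io`
# into rows on `separator`, reading `chunk_size` bytes at a time.
# CSV quoting is ignored; only the chunk-boundary handling is illustrated.
def split_rows_chunked(io, separator: "\r\n", chunk_size: 8192)
  sep = separator.b   # compare raw bytes so multibyte separators work
  buffer = String.new # binary (ASCII-8BIT) buffer of unconsumed bytes
  rows = []
  while (chunk = io.read(chunk_size))
    buffer << chunk.b
    # Only cut at separators that are complete in the buffer. If a chunk ends
    # with just part of the separator (e.g. the "\r" of "\r\n"), those bytes
    # stay in the buffer and are matched once the next chunk is appended.
    while (idx = buffer.index(sep))
      rows << buffer[0, idx]
      buffer = buffer[(idx + sep.bytesize)..-1]
    end
  end
  rows << buffer unless buffer.empty? # last row without a trailing separator
  rows
end
```

Because leftover bytes are carried over rather than inspected per chunk, the same logic covers a three-byte separator: `split_rows_chunked(StringIO.new("a\u2028b"), separator: "\u2028", chunk_size: 2)` returns `["a", "b"]` (as binary strings) even though the separator is cut across two reads.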
Updated by kou (Kouhei Sutou) about 3 years ago
- Status changed from Open to Third Party's Issue
Could you open an issue on https://github.com/ruby/csv? We want to track csv gem problems there.