Bug #18245

closed

CSV Parser, buffer overflow issue with very specific data

Added by sagii (Hassan Abdul Rehman) over 2 years ago. Updated over 2 years ago.

Status:
Third Party's Issue
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-darwin19]
[ruby-core:105587]

Description

This may not fall within the guidelines since it's a very specific issue, but I have exhausted every way I could think of to show that this is a problem with the file itself.

Ruby (2.6.6) native CSV parser crashes on a specific file. I have tried reproducing the exact set of bytes that cause the issue, but haven't been able to do so.

What I did then was to replicate the file, replacing all letters with 'a' and all digits with '0'. The resulting file also crashes on a very specific line (1165), claiming my quotes aren't balanced (which they are).
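The masking step described above can be sketched roughly as follows. This is not the script actually used for the attached file; `mask_line` and `mask_file` are hypothetical helper names, and the sketch assumes the data is ASCII so that every replacement keeps the byte length unchanged (which matters here, since the failure appears to depend on exact byte offsets).

```ruby
# Hypothetical helpers, not part of the csv library.

# Replace every ASCII letter with 'a' and every ASCII digit with '0'.
# Quotes, commas, spaces and line endings are left untouched, so the
# masked line has exactly the same number of bytes as the original.
def mask_line(line)
  line.tr("A-Za-z", "a").tr("0-9", "0")
end

# Write a masked copy of src to dst, reading and writing in binary mode
# so that line endings are not translated.
def mask_file(src, dst)
  File.open(dst, "wb") do |out|
    File.foreach(src, mode: "rb") { |line| out.write(mask_line(line)) }
  end
end
```

Non-ASCII letters would pass through unmasked with this approach, so it is only a starting point for files that contain them.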

Code that crashes:

require "csv"
CSV.foreach(File.expand_path("~/Downloads/illegal_quoting_case.csv"), skip_lines: /^(?:,\s*)+$/) { |r| puts "\n\n#{r.inspect}\n\n" }
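To see exactly where parsing stops, the same call can be wrapped so that the parse error is caught and its line number returned. This is a minimal sketch: `first_malformed_line` is a hypothetical helper, and it assumes a csv version in which `CSV::MalformedCSVError` exposes `line_number`.

```ruby
require "csv"

# Hypothetical helper, not part of the csv library.
# Parses the whole file with the given CSV options and returns nil if
# parsing succeeds, or the line number reported by the parser if it
# raises CSV::MalformedCSVError.
def first_malformed_line(path, **options)
  CSV.foreach(path, **options) { |_row| } # rows are discarded; only errors matter
  nil
rescue CSV::MalformedCSVError => e
  e.line_number
end

# Possible usage with the file from this report:
#   path = File.expand_path("~/Downloads/illegal_quoting_case.csv")
#   first_malformed_line(path, skip_lines: /^(?:,\s*)+$/)
#   first_malformed_line(path)
```

Calling it once with `skip_lines` and once without gives a direct comparison of the two code paths on the same file.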

Interesting observations:

If you add or remove a single character (even a space) on ANY line above 1165, it works fine.
It works fine if you take away skip_lines.

I have also attempted to debug the csv library itself. The issue seems to occur when the scanner is near the end of a buffer chunk (chunk size 8192): THIS line then somehow reads extra bytes, splitting the first column of the next line, which causes the error.
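One way to check the chunk-boundary theory is to list which lines of the file straddle a multiple of 8192 bytes and see whether line 1165 is among them. The sketch below is only a diagnostic aid: `lines_crossing_chunks` is a hypothetical helper, and it assumes the parser fills its buffer in fixed 8192-byte steps counted from the start of the file, which may not match the library's actual reading strategy.

```ruby
# Hypothetical helper, not part of the csv library.
# Returns the 1-based numbers of all lines whose bytes span more than one
# chunk, i.e. lines that contain a chunk boundary somewhere inside them.
def lines_crossing_chunks(path, chunk_size = 8192)
  crossing = []
  offset = 0 # byte offset of the current line's first byte
  File.foreach(path, mode: "rb").with_index(1) do |line, lineno|
    first_chunk = offset / chunk_size
    last_chunk = (offset + line.bytesize - 1) / chunk_size
    crossing << lineno if last_chunk > first_chunk
    offset += line.bytesize
  end
  crossing
end
```

If adding or removing a byte on an earlier line changes whether the failing line appears in this list, that would be consistent with the behaviour described above.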

This is a bizarre one to reproduce, but the issue DOES lie somewhere in the way CSV::Parser::Scanner::StringScanner reads bytes.


Files

illegal_quoting_case.csv (1.03 MB), added by sagii (Hassan Abdul Rehman), 10/07/2021 08:22 AM