Bug #18245

closed

CSV Parser, buffer overflow issue with very specific data

Added by sagii (Hassan Abdul Rehman) 8 months ago. Updated 8 months ago.

Status:
Third Party's Issue
Priority:
Normal
Assignee:
-
Target version:
-
ruby -v:
ruby 2.6.6p146 (2020-03-31 revision 67876) [x86_64-darwin19]
[ruby-core:105587]

Description

This may not fall within the guidelines since it's a very specific issue, but I have ruled out every way I can think of that this could be a problem with the file itself.

Ruby (2.6.6) native CSV parser crashes on a specific file. I have tried reproducing the exact set of bytes that cause the issue, but haven't been able to do so.

What I did then was to replicate the file, replacing all letters with 'a' and all digits with '0'. The resulting file also crashes on a very specific line (1165), with an error claiming my quotes aren't balanced (which they are).

Code that crashes:

CSV.foreach(File.expand_path("~/Downloads/illegal_quoting_case.csv"), skip_lines: /^(?:,\s*)+$/) { |r| puts "\n\n#{r.inspect}\n\n" }

Interesting observations:

  • If you change any byte count (add or remove a single character) on ANY line above 1165, it works fine. Even a space will do.
  • It works fine if you take away skip_lines.
  • I have attempted to debug the main codebase. The issue seems to occur when the scanner is near the end of a buffer chunk (chunk size 8192): THIS line then somehow reads extra bytes, splitting the first column of the next line and causing the error.
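Since the parse succeeds without skip_lines, one possible workaround is to drop the option and filter rows after parsing. The sketch below is only an approximation of the skip_lines regex above: `blank_row?` is a hypothetical helper, the input is synthetic, and unlike the regex it also drops completely empty lines.

```ruby
require "csv"

# Hypothetical helper (not part of the csv gem): true when every field
# in the parsed row is nil or contains only whitespace.
def blank_row?(row)
  row.all? { |field| field.nil? || field.strip.empty? }
end

# Synthetic stand-in for the real file: the middle line holds only
# commas and whitespace, which the original skip_lines regex would skip.
data = "a,b\r\n, ,\r\n1,2\r\n"

# Parse without skip_lines, then reject comma-only rows afterwards.
rows = CSV.parse(data).reject { |row| blank_row?(row) }

rows.each { |r| puts r.inspect }
```

With a file on disk, the same filter could be applied inside the CSV.foreach block with `next if blank_row?(r)`.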

This is a bizarre one to reproduce, but the issue DOES lie somewhere in the way CSV::Parser::Scanner::StringScanner reads bytes.


Files

illegal_quoting_case.csv (1.03 MB) sagii (Hassan Abdul Rehman), 10/07/2021 08:22 AM

Updated by jeremyevans0 (Jeremy Evans) 8 months ago

This bug happens when a multibyte row separator (such as \r\n) is split when reading a chunk. I've submitted a pull request to fix the handling of \r\n row separators: https://github.com/ruby/csv/pull/221. It's a suboptimal fix as it doesn't handle other multibyte row separators.
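To illustrate the condition described here, the following sketch reads synthetic data in fixed-size chunks and checks whether a "\r\n" pair lands across a chunk boundary. It does not reproduce the csv gem's internals: `separator_split?` is a hypothetical helper, and the chunk size of 8192 is taken from the issue description.

```ruby
require "stringio"

CHUNK_SIZE = 8192 # chunk size reported in the issue description

# Hypothetical helper: true when some chunk ends with "\r" and the
# following chunk begins with "\n", i.e. a "\r\n" separator was split.
def separator_split?(data, chunk_size = CHUNK_SIZE)
  io = StringIO.new(data)
  previous = nil
  while (chunk = io.read(chunk_size))
    return true if previous&.end_with?("\r") && chunk.start_with?("\n")
    previous = chunk
  end
  false
end

# Synthetic input: the "\r" is the 8192nd byte, so the first chunk ends
# with "\r" and the second chunk starts with "\n".
data = ("a" * (CHUNK_SIZE - 1)) + "\r\n" + "b,c\r\n"

puts separator_split?(data)       # separator straddles the boundary
puts separator_split?("x" + data) # one extra byte shifts it off the boundary
```

The second call shows why adding or removing a single byte on an earlier line makes the problem disappear: every later row separator moves relative to the chunk boundary.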

Updated by kou (Kouhei Sutou) 8 months ago

  • Status changed from Open to Third Party's Issue

Could you open an issue on https://github.com/ruby/csv ? We want to track csv gem problems there.
