Project

General

Profile

Actions

Bug #8560

closed

CSV, skip_lines option causes error when passing a string

Added by kstevens715 (Kyle Stevens) almost 11 years ago. Updated over 10 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
[ruby-core:55590]

Description

There seems to be a bug in the CSV class when using the skip_lines option.This option is currently undocumented, but according to the GitHub PR, it accepts any object that responds to match. String responds to match, so one would imagine a string can be used. However, String#match can take either a Regexp or another String. If the argument passed is a string, it will first be converted to a Regexp.

So if you pass a string to the skip_lines option, it will attempt to convert the row in the csv file to a Regexp. This doesn't make sense, and it can also lead to exceptions. For example, if the csv contains a row with the data, "# )", you will get an error, "RegexpError: unmatched close parenthesis: /# )/".

My particular use case when I found this problem was I was trying to ignore lines beginning with a "#".

This was my first, unsuccessful attempt:
csv = CSV.open(FILE_NAME, skip_lines: "#", encoding: "ISO8859-1")

What I ended up having to do was:
csv = CSV.open(FILE_NAME, skip_lines: /\A#/, encoding: "ISO8859-1")

This isn't a huge problem, since there's a perfectly acceptable work-around. However, it would be a very easy mistake to make and it could be a difficult problem for someone to debug. It could lead to quite strange behavior if each row in the csv is converted to a Regexp.

I think the skip_lines option should be converted to a Regexp if it's a string, because the alternative is the CSV row is going to be converted to a Regexp.

My proposal is if a string is passed to skip_lines, it should be converted to a regular expression to match the beginning of a line, excluding whitespace:
"#" => /\A\s+#/

I'd be willing to work on a pull request to implement a fix, but I'd love to hear some feedback first. I definitely think this should be fixed, but I'm not positive my solution is the best option...

Here is the original pull request that implemented this option:
https://github.com/ruby/ruby/pull/161

Thank you


Files

csv-string-skip-lines.patch (3.96 KB) csv-string-skip-lines.patch Patch that fixes the problem kstevens715 (Kyle Stevens), 11/24/2013 04:08 AM

Updated by Anonymous almost 11 years ago

  • Assignee set to JEG2 (James Gray)

Updated by zzak (zzak _) almost 11 years ago

@Kyle you might want to try to ping JEG2 on that ticket as well. He
may not see this.

Updated by kstevens715 (Kyle Stevens) over 10 years ago

Attached is a patch that converts skip_lines to a Regexp if it's a string. Also see pull request on GitHub: https://github.com/ruby/ruby/pull/455

Updated by JEG2 (James Gray) over 10 years ago

  • Category set to lib
  • Status changed from Open to Closed
  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0