Feature #1981: [PATCH] CSV Parsing Speedup - Ruby - Ruby Issue Tracking System

Actions

Copy link

Feature #1981

closed

[PATCH] CSV Parsing Speedup

Feature #1981: [PATCH] CSV Parsing Speedup

Added by ender672 (Timothy Elliott) almost 17 years ago. Updated about 15 years ago.

Status:

Closed

Assignee:

JEG2 (James Gray)

Target version:

2.0.0

[ruby-core:25038]

Description

=begin
This patch replaces the regex used in the Ruby 1.9 CSV parser with ruby code.

Running all CSV tests (ts_all.rb) is much faster (36% on my machine). Probably because they don't have to rebuild the regex over and over again. I tested on large CSV files and got an average speedup of 23% on my machine.

(James, this patch is improved & faster from the one I emailed to you on 8/21. Putting into this ticket to make it easier to track any changes.)
=end

Files

Download all files

ruby_19_csv_speedup_01.patch (7.71 KB) ruby_19_csv_speedup_01.patch	Patch against current svn (24629)	ender672 (Timothy Elliott), 08/23/2009 04:44 AM
csv.rb (80.6 KB) csv.rb	Patched csv.rb	ender672 (Timothy Elliott), 08/23/2009 04:44 AM
ruby_19_csv_speedup_02.patch (7.71 KB) ruby_19_csv_speedup_02.patch	Same patch as before, against r26974	ender672 (Timothy Elliott), 03/20/2010 03:36 AM
csv.rb (80.8 KB) csv.rb	Patched lib/csv.rb, r26974	ender672 (Timothy Elliott), 03/20/2010 03:36 AM
check_for_stray_quotes_19.patch (1.1 KB) check_for_stray_quotes_19.patch		ender672 (Timothy Elliott), 03/23/2010 04:16 PM

Updated by JEG2 (James Gray) almost 17 years ago Actions
Copy link
#1

Due date set to 12/31/2009
Target version set to 2.0.0

=begin
I just wanted to state that I am aware of this patch and do plan to apply something like it. I'm purposefully waiting a bit to see how the new FasterCSV release, which uses a similar non-regex approach, fairs in the wild. When I fill confident we've made the right choice to switch parsing strategies, I will commit some version of this patch.
=end

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#2

=begin
Timothy, is it possible for you to redo this patch against the current trunk, so I can look at what it would take to get it applied?

You mention that it is faster which is great. Are you also pretty confident that it maintains CSV's m17n savvy implementation? I don't want to lose the ability to parse in any encoding. I have some tests for this and I'll try to look over the code as we change it, but I know I'm not perfect and could have missed something.
=end

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link Download all files
#3

File ruby_19_csv_speedup_02.patch ruby_19_csv_speedup_02.patch added
File csv.rb csv.rb added

=begin
I updated the patch to against trunk. Attaching to this ticket update.

I took care to make it pass all tests, including the m17n tests in test/csv/test_encodings.rb . I have not tested it against m17n CSV files outside of the tests. It would be a good idea to test against a few large m17n CSV files to make sure that the new operations used by this parser (String.gsub!, String#count & String#last) perform well in that scenario.

=end

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#4

=begin
Thanks Timothy, but this patch doesn't include your latest fix from the FasterCSV side, to restore the strict parser behavior. This tests is currently failing:

def test_non_regex_edge_cases
# An early version of the non-regex parser fails this test
[ [ "foo,"foo,bar,baz,foo","foo"",
["foo", "foo,bar,baz,foo", "foo"] ] ].each do |edge_case|
assert_equal(edge_case.last, CSV.parse_line(edge_case.first))
end

 assert_raise(CSV::MalformedCSVError) do
   CSV.parse_line("1,\"23\"4\"5\", 6")
 end

end

Can I bother you for one more patch that includes this fix as well? Thanks!

(I'm applying all of the new FasterCSV fixes and am now missing only this in my local checkout, so a patch adding this after the latest patch you gave me would be best. You also don't need to add the test, since I already have.)

James Edward Gray II

P.S. It is quite a bit faster and does seem to play well with m17n, from what I can see. Great work!

=end

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link
#5

File check_for_stray_quotes_19.patch check_for_stray_quotes_19.patch added

=begin
Patch attached.

Thanks,
Tim Elliott
=end

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#6

Status changed from Open to Closed

=begin
Thanks for all the help Tim. We're all caught up now with all of your patches applied.
=end

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Feature #1981

[PATCH] CSV Parsing Speedup

Updated by JEG2 (James Gray) almost 17 years ago Actions
Copy link
#1

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#2

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link Download all files
#3

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#4

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link
#5

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#6

Project

General

Profile

Ruby

Custom queries

Feature #1981

[PATCH] CSV Parsing Speedup

Updated by JEG2 (James Gray) almost 17 years ago ActionsCopy link #1

Updated by JEG2 (James Gray) over 16 years ago ActionsCopy link #2

Updated by ender672 (Timothy Elliott) over 16 years ago ActionsCopy link Download all files #3

Updated by JEG2 (James Gray) over 16 years ago ActionsCopy link #4

Updated by ender672 (Timothy Elliott) over 16 years ago ActionsCopy link #5

Updated by JEG2 (James Gray) over 16 years ago ActionsCopy link #6

Updated by JEG2 (James Gray) almost 17 years ago Actions
Copy link
#1

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#2

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link Download all files
#3

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#4

Updated by ender672 (Timothy Elliott) over 16 years ago Actions
Copy link
#5

Updated by JEG2 (James Gray) over 16 years ago Actions
Copy link
#6