https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112009-08-23T05:44:25ZRuby Issue Tracking SystemRuby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=53742009-08-23T05:44:25ZJEG2 (James Gray)jeg2@ruby-lang.org
<ul><li><strong>Due date</strong> set to <i>12/31/2009</i></li><li><strong>Target version</strong> set to <i>2.0.0</i></li></ul><p>=begin<br>
I just wanted to state that I am aware of this patch and do plan to apply something like it. I'm purposefully waiting a bit to see how the new FasterCSV release, which uses a similar non-regex approach, fairs in the wild. When I fill confident we've made the right choice to switch parsing strategies, I will commit some version of this patch.<br>
=end</p> Ruby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=91282010-03-20T00:32:06ZJEG2 (James Gray)jeg2@ruby-lang.org
<ul></ul><p>=begin<br>
Timothy, is it possible for you to redo this patch against the current trunk, so I can look at what it would take to get it applied?</p>
<p>You mention that it is faster which is great. Are you also pretty confident that it maintains CSV's m17n savvy implementation? I don't want to lose the ability to parse in any encoding. I have some tests for this and I'll try to look over the code as we change it, but I know I'm not perfect and could have missed something.<br>
=end</p> Ruby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=91392010-03-20T03:36:49Zender672 (Timothy Elliott)tle@holymonkey.com
<ul><li><strong>File</strong> <a href="/attachments/898">ruby_19_csv_speedup_02.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/898/ruby_19_csv_speedup_02.patch">ruby_19_csv_speedup_02.patch</a> added</li><li><strong>File</strong> <a href="/attachments/899">csv.rb</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/899/csv.rb">csv.rb</a> added</li></ul><p>=begin<br>
I updated the patch to against trunk. Attaching to this ticket update.</p>
<p>I took care to make it pass all tests, including the m17n tests in test/csv/test_encodings.rb . I have not tested it against m17n CSV files outside of the tests. It would be a good idea to test against a few large m17n CSV files to make sure that the new operations used by this parser (String.gsub!, String#count & String#last) perform well in that scenario.</p>
<p>=end</p> Ruby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=91862010-03-22T04:23:57ZJEG2 (James Gray)jeg2@ruby-lang.org
<ul></ul><p>=begin<br>
Thanks Timothy, but this patch doesn't include your latest fix from the FasterCSV side, to restore the strict parser behavior. This tests is currently failing:</p>
<p>def test_non_regex_edge_cases<br>
# An early version of the non-regex parser fails this test<br>
[ [ "foo,"foo,bar,baz,foo","foo"",<br>
["foo", "foo,bar,baz,foo", "foo"] ] ].each do |edge_case|<br>
assert_equal(edge_case.last, CSV.parse_line(edge_case.first))<br>
end</p>
<pre><code> assert_raise(CSV::MalformedCSVError) do
CSV.parse_line("1,\"23\"4\"5\", 6")
end
</code></pre>
<p>end</p>
<p>Can I bother you for one more patch that includes this fix as well? Thanks!</p>
<p>(I'm applying all of the new FasterCSV fixes and am now missing only this in my local checkout, so a patch adding this after the latest patch you gave me would be best. You also don't need to add the test, since I already have.)</p>
<p>James Edward Gray II</p>
<p>P.S. It is quite a bit faster and does seem to play well with m17n, from what I can see. Great work!</p>
<p>=end</p> Ruby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=92252010-03-23T16:16:35Zender672 (Timothy Elliott)tle@holymonkey.com
<ul><li><strong>File</strong> <a href="/attachments/904">check_for_stray_quotes_19.patch</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/904/check_for_stray_quotes_19.patch">check_for_stray_quotes_19.patch</a> added</li></ul><p>=begin<br>
Patch attached.</p>
<p>Thanks,<br>
Tim Elliott<br>
=end</p> Ruby master - Feature #1981: [PATCH] CSV Parsing Speeduphttps://bugs.ruby-lang.org/issues/1981?journal_id=92312010-03-24T00:03:02ZJEG2 (James Gray)jeg2@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>=begin<br>
Thanks for all the help Tim. We're all caught up now with all of your patches applied.<br>
=end</p>