Bug #4069

closed

String#parse_csv fails to parse a string with an embedded "\r" character

Added by phasis68 (Heesob Park) over 13 years ago. Updated almost 13 years ago.

Status: Rejected
Target version:
ruby -v: ruby 1.9.3dev (2010-11-18 trunk 29823) [i386-mswin32_90]
Backport:
[ruby-core:33247]

Description

=begin
C:\work>ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv'
ruby 1.9.3dev (2010-11-18 trunk 29823) [i386-mswin32_90]
c:/usr/lib/ruby/1.9.1/csv.rb:1914:in `block in shift': Unclosed quoted field on line 1. (CSV::MalformedCSVError)
        from c:/usr/lib/ruby/1.9.1/csv.rb:1831:in `loop'
        from c:/usr/lib/ruby/1.9.1/csv.rb:1831:in `shift'
        from c:/usr/lib/ruby/1.9.1/csv.rb:1390:in `parse_line'
        from c:/usr/lib/ruby/1.9.1/csv.rb:2341:in `parse_csv'
        from -e:1:in `<main>'
=end

Actions #1

Updated by ender672 (Timothy Elliott) over 13 years ago

=begin
["aa\rbb"].to_csv results in the string "\"aa\rbb\"\n".

When you don't specify a row separator, the Ruby CSV library guesses one by searching for the first occurrence of \r or \n.

In the case of "\"aa\rbb\"\n", it encounters the \r first and assumes that it is your row separator. To point it to the correct row separator, supply the option :row_sep => "\n":

$ ruby -rcsv -ve 'p ["aa\rbb"].to_csv.parse_csv(:row_sep => "\n")'
ruby 1.9.3dev (2010-11-19 trunk 29830) [x86_64-linux]
["aa\rbb"]

=end
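
For anyone landing here later, the workaround above can be written as a small script. This is only a sketch: the variable names are mine, and the outcome of the auto-discovery call is reported rather than assumed, since it may differ between CSV versions.

```ruby
require "csv"

# Field containing a bare carriage return, as in the original report.
line = ["aa\rbb"].to_csv   # => "\"aa\rbb\"\n"

# Naming the row separator explicitly avoids auto-discovery altogether.
fields = line.parse_csv(row_sep: "\n")
p fields

# With the default (:row_sep => :auto) the result may depend on the CSV
# version, so this only reports what happens instead of assuming it.
begin
  p line.parse_csv
rescue CSV::MalformedCSVError => e
  puts "auto-discovery failed: #{e.message}"
end
```

Passing :row_sep explicitly is the safer habit whenever fields may contain "\r" or "\n" that differ from the real line ending.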

Actions #2

Updated by JEG2 (James Gray) over 13 years ago

  • Status changed from Open to Rejected
  • Assignee set to JEG2 (James Gray)

=begin
Sorry, not sure how I missed this ticket. As Timothy says, this is intended, documented behavior:

 # <b><tt>:row_sep</tt></b>::            The String appended to the end of each
 #                                       row.  This can be set to the special
 #                                       <tt>:auto</tt> setting, which requests
 #                                       that CSV automatically discover this
 #                                       from the data.  Auto-discovery reads
 #                                       ahead in the data looking for the next
 #                                       <tt>"\r\n"</tt>, <tt>"\n"</tt>, or
 #                                       <tt>"\r"</tt> sequence.  A sequence
 #                                       will be selected even if it occurs in
 #                                       a quoted field, assuming that you
 #                                       would have the same line endings
 #                                       there.  If none of those sequences is
 #                                       found, +data+ is <tt>ARGF</tt>,
 #                                       <tt>STDIN</tt>, <tt>STDOUT</tt>, or
 #                                       <tt>STDERR</tt>, or the stream is only
 #                                       available for output, the default
 #                                       <tt>$INPUT_RECORD_SEPARATOR</tt>
 #                                       (<tt>$/</tt>) is used.  Obviously,
 #                                       discovery takes a little time.  Set
 #                                       manually if speed is important.  Also
 #                                       note that IO objects should be opened
 #                                       in binary mode on Windows if this
 #                                       feature will be used as the
 #                                       line-ending translation can cause
 #                                       problems with resetting the document
 #                                       position to where it was before the
 #                                       read ahead. This String will be
 #                                       transcoded into the data's Encoding
 #                                       before parsing.

=end
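
To illustrate the rule quoted above, here is a rough sketch of that kind of guess. It is not the code in csv.rb; guess_row_sep is a hypothetical helper that takes the first "\r\n", "\n", or "\r" it finds, whether or not it sits inside a quoted field, and falls back to $/ when none is present.

```ruby
# Hypothetical helper, NOT the library's implementation: return the first
# "\r\n", "\n", or "\r" found in the data, or $/ if there is none.
# "\r\n" is listed first so a CRLF pair is not reported as a lone "\r".
def guess_row_sep(data)
  data[/\r\n|\n|\r/] || $/
end

p guess_row_sep("\"aa\rbb\"\n")    # the "\r" inside the quoted field wins
p guess_row_sep("a,b\r\nc,d\r\n")  # CRLF line endings
p guess_row_sep("a,b")             # no line ending at all: falls back to $/
```

The first call shows why the report fails: the "\r" inside the quotes comes before the trailing "\n", so it is picked as the row separator.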
