Project

General

Profile

Actions

Bug #4603

closed

lib/csv.rb: when the :encoding parameter is not provided, the encoding of CSV data is treated as ASCII-8BIT

Added by nobuoka (yu nobuoka) almost 13 years ago. Updated almost 12 years ago.

Status:
Closed
Target version:
ruby -v:
-
Backport:
[ruby-core:35866]

Description

=begin
This issue is involved in three methods, CSV::open, CSV::read and CSV::foreach.

The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.

I think the actual behaviour of these methods doesn't conform to the document of these.
=end


Files

csv_test.rb (814 Bytes) csv_test.rb nobuoka (yu nobuoka), 04/24/2011 03:33 PM

Updated by naruse (Yui NARUSE) almost 13 years ago

  • Status changed from Open to Assigned
  • Assignee set to JEG2 (James Gray)
  • Target version set to 1.9.2

=begin

=end

Updated by naruse (Yui NARUSE) almost 13 years ago

  • ruby -v changed from ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux] to -

=begin
2011/4/25 James Gray :

On Sun, Apr 24, 2011 at 1:33 AM, yu nobuoka
wrote:

The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the
CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.

I think the actual behaviour of these methods doesn't conform to the
document of these.

It seems this was an intentional change not made by me:
r25362 | naruse | 2009-10-15 22:04:38 -0500 (Thu, 15 Oct 2009) | 2 lines

  • lib/csv.rb (CSV#raw_encoding): returns ASCII-8BIT when the io doesn't have encoding.
    This seems like a wrong choice. Why would we not want to support the
    default encodings? Can someone please explain to me why this was done?

Ah, sorry, that commit message doesn't explain the intention.
It is for IO-like object which doesn't have encoding method, for example Zlib::GzipReader
test_gzip_reader_bug_fix in test/csv/test_features.rb.

Anyway, even if I applied following patch, the problem is still reproduced.

diff --git a/lib/csv.rb b/lib/csv.rb
index 45273f9..ee35ccc 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -2296,7 +2296,7 @@ class CSV
elsif @io.respond_to? :encoding
@io.encoding
else

  •  default
    
  •  Encoding.default_internal || Encoding.default_external
    
    end
    end
    end

--
NARUSE, Yui 
=end

Updated by naruse (Yui NARUSE) almost 13 years ago

  • ruby -v changed from - to ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux]

=begin

=end

Updated by Anonymous almost 13 years ago

=begin
On Sun, Apr 24, 2011 at 1:33 AM, yu nobuoka wrote:

The document of CSV::read says "This method also understands an additional
:encoding parameter that you can use to specify the Encoding of the data
in the file to be read. You must provide this unless your data is in
Encoding::default_external()."
However, when the :encoding parameter is not provided, the encoding of the
CSV data
is treated as ASCII-8BIT. Not as Encoding.default_external.
CSV::open and CSV::foreach are also similar.

I think the actual behaviour of these methods doesn't conform to the
document of these.

It seems this was an intentional change not made by me:

r25362 | naruse | 2009-10-15 22:04:38 -0500 (Thu, 15 Oct 2009) | 2 lines

  • lib/csv.rb (CSV#raw_encoding): returns ASCII-8BIT when the io
    doesn't have encoding.

This seems like a wrong choice. Why would we not want to support the
default encodings? Can someone please explain to me why this was done?

James Edward Gray II
=end

Updated by Anonymous almost 13 years ago

=begin
On Sun, Apr 24, 2011 at 11:29 PM, NARUSE, Yui wrote:

Ah, sorry, that commit message doesn't explain the intention.
It is for IO-like object which doesn't have encoding method, for
example Zlib::GzipReader
test_gzip_reader_bug_fix in test/csv/test_features.rb.

Anyway, even if I applied following patch, the problem is still reproduced.

diff --git a/lib/csv.rb b/lib/csv.rb
index 45273f9..ee35ccc 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -2296,7 +2296,7 @@ class CSV
elsif @io.respond_to? :encoding
@io.encoding
else

  •  default
    
  •  Encoding.default_internal || Encoding.default_external
    
    end
    end
    end

OK, I see the problem.

The issue is that a mode of "rb" is being used to suppress newline
translation on Windows. However, that's also switching my Encoding to
ASCII-8BIT. Usually I love that feature, but here it's not what I want. Is
there anyway to shut off the translation and not get the encoding change?

James Edward Gray II
=end

Updated by nobu (Nobuyoshi Nakada) almost 13 years ago

  • ruby -v changed from ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-linux] to -

=begin
Hi,

At Mon, 25 Apr 2011 22:57:52 +0900,
James Gray wrote in [ruby-core:35878]:

The issue is that a mode of "rb" is being used to suppress newline
translation on Windows. However, that's also switching my Encoding to
ASCII-8BIT. Usually I love that feature, but here it's not what I want. Is
there anyway to shut off the translation and not get the encoding change?

Now fixed so that universal_newline: false can work. What's
about the following patch?


diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end

  • default to a binary open mode

  • args << "rb" if args.size == 1 and !options.key?(:mode)
  • wrap a File opened with the remaining +args+

  • csv = new(File.open(*args, options), options)
  • wrap a File opened with the remaining +args+ with no newline

  • decorator

  • file_opts = {universal_newline: false}.merge(options)

  • begin

  •  f = File.open(*args, file_opts)
    
  • rescue ArgumentError => e

  •  throw unless /needs binmode/ =~ e.message and args.size == 1
    
  •  args << "rb"
    
  •  file_opts = {encoding: Encoding.default_external}.merge(file_opts)
    
  •  retry
    
  • end

  • csv = new(f, options)

    handle blocks like Ruby's open(), not like the CSV library

    if block_given?
    @@ -1398,11 +1406,8 @@ class CSV
    # encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file
    # but transcode it to UTF-8 before CSV parses it.
    #

  • def self.read(path, options = Hash.new)
  • encoding = options.delete(:encoding)
  • mode = "rb"
  • mode << ":#{encoding}" if encoding
  • open(path, mode, options) { |csv| csv.read }
  • def self.read(path, *options)
  • open(path, *options) { |csv| csv.read }
    end
# Alias for CSV::read().

diff --git i/test/csv/test_encodings.rb w/test/csv/test_encodings.rb
index 3880f3a..54c34f3 100755
--- i/test/csv/test_encodings.rb
+++ w/test/csv/test_encodings.rb
@@ -79,6 +79,21 @@ class TestCSV::Encodings < TestCSV
end
end

  • def test_read_with_default_encoding
  • data = "abc"
  • default_external = Encoding.default_external
  • each_encoding do |encoding|
  •  File.open(@temp_csv_path, "wb", encoding: encoding) {|f| f << data}
    
  •  begin
    
  •    Encoding.default_external = encoding
    
  •    result = CSV.read(@temp_csv_path)[0][0]
    
  •  ensure
    
  •    Encoding.default_external = default_external
    
  •  end
    
  •  assert_equal(encoding, result.encoding)
    
  • end
  • end
  • #######################################################################

    Stress Test ASCII Compatible and Non-ASCII Compatible Encodings

    #######################################################################

--
Nobu Nakada
=end

Actions #7

Updated by nobuoka (yu nobuoka) almost 13 years ago

=begin
Hi,

Nobuyoshi Nakada wrote:

Now fixed so that universal_newline: false can work. What's
about the following patch?

diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end

  • default to a binary open mode

  • args << "rb" if args.size == 1 and !options.key?(:mode)
  • wrap a File opened with the remaining +args+

  • csv = new(File.open(*args, options), options)
  • wrap a File opened with the remaining +args+ with no newline

  • decorator

  • file_opts = {universal_newline: false}.merge(options)
  • begin
  •  f = File.open(*args, file_opts)
    
  • rescue ArgumentError => e
  •  throw unless /needs binmode/ =~ e.message and args.size == 1
    
  •  args << "rb"
    
  •  file_opts = {encoding: Encoding.default_external}.merge(file_opts)
    
  •  retry
    
  • end
  • csv = new(f, options)

In +rescue+ clause, a +throw+ expression is used. Is it correct?
I think a +raise+ expression should be used instead. Or my idea is wrong...?
=end

Updated by Anonymous almost 13 years ago

=begin
On Tue, Apr 26, 2011 at 3:13 PM, yu nobuoka wrote:

Issue #4603 has been updated by yu nobuoka.

Hi,

Nobuyoshi Nakada wrote:

Now fixed so that universal_newline: false can work. What's
about the following patch?

diff --git i/lib/csv.rb w/lib/csv.rb
index 1aad2f3..d51cb55 100644
--- i/lib/csv.rb
+++ w/lib/csv.rb
@@ -1334,10 +1334,18 @@ class CSV
def self.open(*args)
# find the +options+ Hash
options = if args.last.is_a? Hash then args.pop else Hash.new end

  • default to a binary open mode

  • args << "rb" if args.size == 1 and !options.key?(:mode)
  • wrap a File opened with the remaining +args+

  • csv = new(File.open(*args, options), options)
  • wrap a File opened with the remaining +args+ with no newline

  • decorator

  • file_opts = {universal_newline: false}.merge(options)
  • begin
  •  f = File.open(*args, file_opts)
    
  • rescue ArgumentError => e
  •  throw unless /needs binmode/ =~ e.message and args.size == 1
    
  •  args << "rb"
    
  •  file_opts = {encoding:
    

Encoding.default_external}.merge(file_opts)

  •  retry
    
  • end
  • csv = new(f, options)

In +rescue+ clause, a +throw+ expression is used. Is it correct?
I think a +raise+ expression should be used instead. Or my idea is
wrong...?

Good point. I think raise() is correct.

James Edward Gray II
=end

Actions #9

Updated by nobu (Nobuyoshi Nakada) almost 13 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r31370.
yu, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • lib/csv.rb (CSV::open): suppress universal newline decorator.
    fixes #4603
    =end

Updated by theirishpenguin (Declan McGrath) almost 12 years ago

Hi,

I seem to be still getting this problem on Ruby 1.9.2p290 revision 32553. This issue should be fixed in revision 32553, correct? (as 32553 > 31370)

See below for test case.

Regards,
Declan

Test case

  1. Contents of test.csv are:
    á,b
    1,2

  2. Steps to reproduce issue:
    declan@foo:~$ ruby -v
    ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]
    declan@foo:~$ irb
    irb(main):004:0> Encoding::default_internal = 'UTF-8'
    => "UTF-8"
    irb(main):007:0> Encoding::default_external = 'UTF-8'
    => "UTF-8"
    irb(main):009:0> require 'csv'
    => true
    irb(main):010:0> CSV.read('test.csv')
    => [["\xC3\xA1", "b"], ["1", "2"]]

Updated by naruse (Yui NARUSE) almost 12 years ago

  • Description updated (diff)

theirishpenguin (Declan McGrath) wrote:

I seem to be still getting this problem on Ruby 1.9.2p290 revision 32553.
This issue should be fixed in revision 32553, correct? (as 32553 > 31370)

No, revision numbers are repository global number.
r32553 > r31370 doesn't mean it because they are different branch.
r32553 changes trunk and ruby_1_9_3 branch but doesn't change ruby_1_9_2 branch.
r31370 is not merged to ruby_1_9_2.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0