Bug #6192

Integer() doesn't handle UTF-16 input

Added by John Firebaugh about 2 years ago. Updated about 2 years ago.

[ruby-core:43566]
Status:Closed
Priority:Normal
Assignee:-
Category:-
Target version:-
ruby -v:ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
Backport:

Description

Integer("2007".encode("UTF-16le"))
ArgumentError: string contains null byte
from (irb):209:in `Integer'
from (irb):209
from /Users/john/.rvm/rubies/ruby-1.9.3-p125/bin/irb:16:in `<main>'
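
The null-byte error above follows from how UTF-16LE lays out ASCII digits: each digit occupies two bytes, and the second byte is zero. A small sketch, using only core String methods, makes the bytes visible (the variable names are illustrative):

```ruby
# Inspect the raw bytes of "2007" after transcoding to UTF-16LE.
s = "2007".encode("UTF-16LE")

utf16_bytes = s.bytes
# In UTF-16LE every ASCII digit is followed by a 0x00 byte.
null_count = utf16_bytes.count(0)

puts utf16_bytes.inspect
puts "null bytes: #{null_count}"
puts "ASCII compatible? #{s.encoding.ascii_compatible?}"
```

Any parser that treats the string as a NUL-terminated run of ASCII bytes will therefore see an embedded null right after the first digit.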

Associated revisions

Revision 35120
Added by Nobuyoshi Nakada about 2 years ago

  • bignum.c (rb_str_to_inum): must be ASCII compatible encoding as well as String#hex and String#oct. [Bug #6192]
  • string.c (rb_must_asciicompat): check if ASCII compatible.
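
The check described in this revision can be approximated at the Ruby level. The sketch below is not the C implementation: `must_asciicompat` and `parse_integer` are hypothetical helper names used only for illustration, and the error class and message are assumptions about what such a guard would reasonably raise.

```ruby
# Hypothetical Ruby-level analog of an "ASCII-compatible or fail" guard.
def must_asciicompat(str)
  enc = str.encoding
  unless enc.ascii_compatible?
    # Assumed error class and message, chosen for illustration.
    raise Encoding::CompatibilityError, "ASCII incompatible encoding: #{enc.name}"
  end
  str
end

# Hypothetical wrapper: validate the encoding, then defer to Integer().
def parse_integer(str)
  Integer(must_asciicompat(str))
end

utf16 = "2007".encode("UTF-16LE")

begin
  parse_integer(utf16)
rescue Encoding::CompatibilityError => e
  puts "rejected: #{e.message}"
end

# Transcoding to an ASCII-compatible encoding first avoids the rejection.
puts parse_integer(utf16.encode("UTF-8"))
```

The design choice here is to reject ASCII-incompatible input outright rather than transcode it silently, leaving the caller to decide how to convert.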

History

#1 Updated by John Firebaugh about 2 years ago

Related, String#to_i:

"2007".encode("UTF-16le").to_i
=> 2
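
The result 2 is consistent with a parser that reads the string byte by byte and stops at the first non-digit: the first byte is `2`, and the next is the 0x00 half of the UTF-16LE code unit. The sketch below imitates that behaviour. `leading_digits_to_i` is a hypothetical helper, not the actual `String#to_i` implementation, and it ignores signs, whitespace, and underscores.

```ruby
# Hypothetical byte-wise parser: consume ASCII digits until the first
# byte that is not in '0'..'9', then stop.
def leading_digits_to_i(str)
  digits = str.bytes.take_while { |b| b.between?(0x30, 0x39) }
  digits.inject(0) { |acc, b| acc * 10 + (b - 0x30) }
end

puts leading_digits_to_i("2007".encode("UTF-16LE"))
puts leading_digits_to_i("2007")
```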

#2 Updated by Eric Hodel about 2 years ago

=begin
I made this patch:

Index: bignum.c
===================================================================
--- bignum.c (revision 35117)
+++ bignum.c (working copy)
@@ -11,6 +11,7 @@

 #include "ruby/ruby.h"
 #include "ruby/util.h"
+#include "ruby/encoding.h"
 #include "internal.h"

 #include
@@ -24,6 +25,7 @@
 VALUE rb_cBignum;

 static VALUE big_three = Qnil;
+static VALUE sym_replace = Qnil;

 #if defined __MINGW32__
 #define USHORT _USHORT
@@ -773,8 +775,21 @@ rb_str_to_inum(VALUE str, int base, int
     long len;
     VALUE v = 0;
     VALUE ret;
+    VALUE encopts;
+    rb_encoding *enc;

     StringValue(str);
+
+    enc = rb_enc_from_index(ENCODING_GET((str)));
+
+    if (enc != rb_usascii_encoding()) {
+        encopts = rb_hash_new();
+        rb_hash_aset(encopts, sym_replace, rb_str_new2(" "));
+        rb_obj_freeze(encopts);
+
+        str = rb_str_conv_enc_opts(str, enc, rb_usascii_encoding(), 0, encopts);
+    }
+
     if (badcheck) {
         s = StringValueCStr(str);
     }
@@ -3809,5 +3824,6 @@ Init_Bignum(void)
     power_cache_init();

     big_three = rb_uint2big(3);
+    sym_replace = ID2SYM(rb_intern("replace"));
     rb_gc_register_mark_object(big_three);
 }

Index: test/ruby/test_literal.rb
===================================================================
--- test/ruby/test_literal.rb (revision 35117)
+++ test/ruby/test_literal.rb (working copy)
@@ -261,6 +261,23 @@ class TestRubyLiteral < Test::Unit::Test
     }
   end

+  def test_integer_encoding
+    bug6192 = '[bug#6192]'
+
+    s = "2007".encode(Encoding::UTF_16LE)
+
+    assert_equal(2007, Integer(s), bug6192)
+
+    s = "3.14 is \xCF\x80"
+    s.force_encoding Encoding::UTF_8
+
+    e = assert_raises(ArgumentError, bug6192) do
+      Integer(s)
+    end
+
+    assert_equal("Invalid value for Integer(): \"#{s}\"", e.message)
+  end
+
   def test_float
     head = ['', '-', '+']
     chars = ['0', '1', '_', '9', 'f', '.']

But there is a problem:

1) Failure:

test_integer_utf16(TestRubyLiteral) [/Users/drbrain/Work/svn/ruby/trunk/test/ruby/test_literal.rb:278]:
<"Invalid value for Integer(): \"3.14 is π\""> expected but was
<"invalid value for Integer(): \"3.14 is \xCF\x80\"">.

I'm not sure if this output is acceptable or not.
=end
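
For comparison, the transcoding idea in the patch above has a rough Ruby-level counterpart in `String#encode` with replacement options. This is only an analogy, assuming that invalid and undefined characters should become spaces; it does not claim to reproduce what `rb_str_conv_enc_opts` does with the flags passed in the patch. `to_ascii_lossy` is a hypothetical helper name.

```ruby
# Hypothetical helper: transcode to US-ASCII, replacing anything that
# cannot be represented with a single space.
def to_ascii_lossy(str)
  str.encode("US-ASCII", invalid: :replace, undef: :replace, replace: " ")
end

utf16 = "2007".encode("UTF-16LE")
ascii = to_ascii_lossy(utf16)
puts Integer(ascii)

# A non-ASCII character is replaced, so the error message no longer
# shows the original character.
pi_text = "3.14 is \u03C0"
lossy = to_ascii_lossy(pi_text)
begin
  Integer(lossy)
rescue ArgumentError => e
  puts e.message
end
```

One consequence of converting before parsing is visible here: any error message is built from the converted string, not from the caller's original input.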

#3 Updated by Nobuyoshi Nakada about 2 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r35120.
John, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • bignum.c (rb_str_to_inum): must be ASCII compatible encoding as well as String#hex and String#oct. [Bug #6192]
  • string.c (rb_must_asciicompat): check if ASCII compatible.
