Bug #8940

printing UTF-32 crashes ruby

Added by Hanmac (Hans Mackowiak) over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
ruby -v: ruby 2.1.0dev (2013-09-23) [x86_64-darwin12.5.0]
[ruby-core:57318]

Description

Running

p "äöü".encode("UTF-32")

causes a segmentation fault:

-- C level backtrace information -------------------------------------------
0 libruby.2.1.0.dylib 0x00000001023f6679 rb_vm_bugreport + 137
1 libruby.2.1.0.dylib 0x00000001022bab1b report_bug + 283
2 libruby.2.1.0.dylib 0x00000001022ba9f4 rb_bug + 180
3 libruby.2.1.0.dylib 0x000000010237cc80 sigsegv + 144
4 libsystem_c.dylib 0x00007fff91d7d90a _sigtramp + 26
5 ??? 0x0000000000000000 0x0 + 0
6 libruby.2.1.0.dylib 0x00000001022b0045 rb_enc_precise_mbclen + 21
7 libruby.2.1.0.dylib 0x0000000102391cc8 rb_str_inspect + 968
8 libruby.2.1.0.dylib 0x00000001023f1e74 vm_call0_body + 2116
9 libruby.2.1.0.dylib 0x00000001023f1264 rb_call0 + 404
10 libruby.2.1.0.dylib 0x00000001023e7f15 rb_funcall + 261
11 libruby.2.1.0.dylib 0x0000000102312777 rb_inspect + 23
12 libruby.2.1.0.dylib 0x00000001022e663b rb_p + 11
13 libruby.2.1.0.dylib 0x00000001022f5b29 rb_f_p_internal + 57
14 libruby.2.1.0.dylib 0x00000001022c0b56 rb_ensure + 118
15 libruby.2.1.0.dylib 0x00000001022e9c9f rb_f_p + 31
16 libruby.2.1.0.dylib 0x00000001023f4baf vm_call_cfunc + 1007
17 libruby.2.1.0.dylib 0x00000001023f4528 vm_call_method + 840
18 libruby.2.1.0.dylib 0x00000001023deca7 vm_exec_core + 11591
19 libruby.2.1.0.dylib 0x00000001023eb4cd vm_exec + 109
20 libruby.2.1.0.dylib 0x00000001023ec2d8 rb_iseq_eval_main + 392
21 libruby.2.1.0.dylib 0x00000001022bfd69 ruby_exec_internal + 121
22 libruby.2.1.0.dylib 0x00000001022bfcae ruby_run_node + 78
23 ruby 0x0000000102274eef main + 79


Related issues

Related to Ruby master - Bug #9415: Strings#codepoints doesn't respect BOM on UTF-{16,32} pseudo encodings (Closed, 01/15/2014, naruse (Yui NARUSE))

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

This is probably related to UTF-32 being a pseudo encoding.
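For reference, Ruby reports which encodings are pseudo ("dummy") encodings through Encoding#dummy?. A minimal check, assuming any reasonably recent Ruby (this was not run on the 2.1.0dev build from the report):

```ruby
# UTF-16 and UTF-32 without a BE/LE suffix are dummy encodings in Ruby:
# they name the Unicode form but do not fix a byte order on their own.
%w[UTF-16 UTF-32 UTF-16BE UTF-16LE UTF-32BE UTF-32LE].each do |name|
  enc = Encoding.find(name)
  puts "#{name.ljust(8)} dummy? #{enc.dummy?}"
end
```

Only the two suffix-less names should report `true`.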

Updated by Hanmac (Hans Mackowiak) over 6 years ago

Hm, it may be...
Funny thing:

This works:
"äöü".encode("UTF-32BE") #=> "\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32") #=> "\uFEFF\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32LE") #=> "\u00E4\u00F6\u00FC" # << IMO this looks wrong; isn't there a difference between BE and LE?
This does not:
"äöü".encode("UTF-32LE") #=> "\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32") # crash

PS: It also happens with UTF-16.
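The identical #inspect output for UTF-32BE and UTF-32LE above is expected: #inspect shows characters, while BE and LE differ only in how each character's bytes are ordered. A small sketch that makes the bytes visible (the string is written with \u escapes so the example does not depend on the source file's encoding):

```ruby
s  = "\u00E4\u00F6\u00FC" # "äöü"
be = s.encode("UTF-32BE")
le = s.encode("UTF-32LE")

# Same code points, so #inspect renders both strings the same way...
puts be.codepoints == le.codepoints # prints true

# ...but each 4-byte unit is stored in the opposite order.
p be.bytes # [0, 0, 0, 228, 0, 0, 0, 246, 0, 0, 0, 252]
p le.bytes # [228, 0, 0, 0, 246, 0, 0, 0, 252, 0, 0, 0]
```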

#3

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r43023.
Hans, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


encdb.c, utf_16_32.h: Unicode with BOM

  • enc/encdb.c, enc/utf_16_32.h (ENC_DUMMY_UNICODE): Unicode with BOM must be based on big endian variants, so that actual encodings would work. [ruby-core:57318] [Bug #8940]
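The commit bases the BOM-carrying pseudo encodings on their big-endian variants. Ruby's transcoder output matches that choice: encoding to "UTF-32" yields a big-endian BOM followed by ordinary UTF-32BE data. A quick check, working on binary copies so the dummy-encoded string itself is never inspected:

```ruby
s   = "\u00E4\u00F6\u00FC"
u32 = s.encode("UTF-32")   # pseudo encoding: output starts with a BOM
be  = s.encode("UTF-32BE")

raw  = u32.b                              # binary (ASCII-8BIT) copy of the bytes
bom  = raw.byteslice(0, 4)
body = raw.byteslice(4, raw.bytesize - 4)

p bom.bytes    # [0, 0, 254, 255], i.e. U+FEFF in big-endian order
p body == be.b # true: the rest is plain UTF-32BE
```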

Updated by nobu (Nobuyoshi Nakada) over 6 years ago

  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED

Updated by naruse (Yui NARUSE) over 6 years ago

  • Status changed from Closed to Assigned
  • Priority changed from 6 to Normal

r43033, r43034, and r43035 also look related.

Note that although the Unicode specification says an encoding without an explicit byte order should be big-endian, real-world data is often little-endian.
Therefore, don't guess the encoding if the string has no BOM.
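To see why guessing is risky, take two bytes with no BOM. Both byte orders decode them to a valid, but different, character:

```ruby
raw = "\x00\xE4".b # two bytes, no BOM

as_be = raw.dup.force_encoding("UTF-16BE")
as_le = raw.dup.force_encoding("UTF-16LE")

p as_be.codepoints # [228]   U+00E4, "ä"
p as_le.codepoints # [58368] U+E400, a private-use code point
```

Neither reading is invalid, so nothing in the data itself tells the two apart.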

#6

Updated by Hanmac (Hans Mackowiak) about 6 years ago

The bug is still present in 2.2 trunk with UTF-16 and #inspect:

s="\xFF\xFE\"\x00i\x00d\x00\"\x00|\x00\"\x00s\x00y\x00s\x00t\x00e\x00m\x00_\x00c\x00o\x00d\x00e\x00\"\x00|\x00\"\x00a\x00s\x00s\x00e\x00m\x00b\x00l\x00y\x00_\x00c\x00o\x00d\x00e\x00\"\x00|\x00\"\x00d\x00e\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00\"\x00|\x00\"\x00c\x00r\x00e\x00a\x00t\x00e\x00d\x00_\x00a\x00t\x00\"\x00|\x00\"\x00u\x00p\x00d\x00a\x00t\x00e\x00d\x00_\x00a\x00t\x00\"\x00\r\x00\n"
s.force_encoding("UTF-16")
/usr/local/lib/ruby/2.2.0/irb/inspector.rb:122: [BUG] Segmentation fault at 0x00000000000000
ruby 2.2.0dev (2014-01-12 trunk 44563) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0020 p:---- s:0072 e:000071 CFUNC :inspect
c:0019 p:0010 s:0069 e:000068 BLOCK /usr/local/lib/ruby/2.2.0/irb/inspector.rb:122 [FINISH]
c:0018 p:---- s:0066 e:000065 CFUNC :call
c:0017 p:0011 s:0062 e:000061 METHOD /usr/local/lib/ruby/2.2.0/irb/inspector.rb:115
c:0016 p:0012 s:0058 e:000057 METHOD /usr/local/lib/ruby/2.2.0/irb/context.rb:386
c:0015 p:0015 s:0055 e:000052 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:662
c:0014 p:0035 s:0050 e:000049 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:493
c:0013 p:0040 s:0042 e:000041 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:624
c:0012 p:0009 s:0037 e:000036 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:489
c:0011 p:0118 s:0033 e:000032 BLOCK /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:247 [FINISH]
c:0010 p:---- s:0030 e:000029 CFUNC :loop
c:0009 p:0007 s:0027 e:000026 BLOCK /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:233 [FINISH]
c:0008 p:---- s:0025 e:000024 CFUNC :catch
c:0007 p:0015 s:0021 e:000020 METHOD /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:232
c:0006 p:0030 s:0018 E:001858 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:488
c:0005 p:0008 s:0015 e:000014 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:397 [FINISH]
c:0004 p:---- s:0013 e:000012 CFUNC :catch
c:0003 p:0143 s:0009 E:000c58 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:396
c:0002 p:0021 s:0004 E:001608 EVAL /usr/local/bin/irb:15 [FINISH]
c:0001 p:0000 s:0002 E:001358 TOP [FINISH]
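On an affected build, one possible workaround (a sketch, assuming the data is known to be UTF-16 and to start with a BOM) is to strip the BOM and tag the remaining bytes with a concrete encoding, so the UTF-16 pseudo encoding never reaches #inspect. The sample is a shortened version of the string above:

```ruby
data = "\xFF\xFE\"\x00i\x00d\x00\"\x00".b # BOM FF FE, then "\"id\"" in UTF-16LE

# Choose the concrete encoding from the BOM and drop the BOM bytes.
text =
  if data.start_with?("\xFF\xFE".b)
    data.byteslice(2, data.bytesize - 2).force_encoding(Encoding::UTF_16LE)
  elsif data.start_with?("\xFE\xFF".b)
    data.byteslice(2, data.bytesize - 2).force_encoding(Encoding::UTF_16BE)
  end

puts text.encode(Encoding::UTF_8) if text # prints "id" (including the quotes)
```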

#7

Updated by Hanmac (Hans Mackowiak) about 6 years ago

Issue #8940 has been updated by Hans Mackowiak.


#8

Updated by bideshstr (bidesh mondal) about 6 years ago

By Mistake

#9

Updated by bideshstr (bidesh mondal) about 6 years ago

By Mistake

#10

Updated by nobu (Nobuyoshi Nakada) about 6 years ago

  • Status changed from Assigned to Closed

Applied in changeset r44605.


string.c: use actual encodings

  • string.c (get_actual_encoding): get actual encoding according to the BOM if exists.
  • string.c (rb_str_inspect): use according encoding, instead of pseudo encodings, UTF-{16,32}. [ruby-core:59757] [Bug #8940]
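A Ruby-level sketch of the idea behind get_actual_encoding, for code that needs the same behaviour itself. It is not the C implementation in string.c, and the helper name actual_encoding is hypothetical. Following naruse's earlier advice, it returns nil instead of guessing when a pseudo-encoded string has no BOM:

```ruby
# Hypothetical helper (not part of Ruby's API): resolve the UTF-16/UTF-32
# pseudo encodings to a concrete encoding using the BOM, without guessing.
def actual_encoding(str)
  bytes = str.b # binary copy, so the prefix checks compare raw bytes
  case str.encoding
  when Encoding::UTF_16
    return Encoding::UTF_16BE if bytes.start_with?("\xFE\xFF".b)
    return Encoding::UTF_16LE if bytes.start_with?("\xFF\xFE".b)
  when Encoding::UTF_32
    return Encoding::UTF_32BE if bytes.start_with?("\x00\x00\xFE\xFF".b)
    return Encoding::UTF_32LE if bytes.start_with?("\xFF\xFE\x00\x00".b)
  else
    return str.encoding # already a concrete encoding
  end
  nil # pseudo encoding without a BOM: byte order unknown, so don't guess
end

s = "\u00E4".encode("UTF-16") # Ruby writes a big-endian BOM here
p actual_encoding(s)          # #<Encoding:UTF-16BE>
```

Checking the 4-byte BOMs only for strings tagged UTF-32 avoids mistaking a UTF-16LE BOM followed by U+0000 (bytes FF FE 00 00) for a UTF-32LE BOM.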
#11

Updated by nobu (Nobuyoshi Nakada) about 6 years ago

  • Related to Bug #9415: Strings#codepoints doesn't respect BOM on UTF-{16,32} pseudo encodings added
