Backport #6206

encoding of empty string from String#split

Added by no6v (Nobuhiro IMAI) over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
[ruby-dev:45441]

Description

When String#split returns an empty string, that string's encoding is sometimes ASCII-8BIT.
Is this intentional?

a = "a:".split(":", 2) # => ["a", ""]
a.map(&:encoding) # => [#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>]

I don't know whether it is related, but partition behaves as follows:

a = "a:".partition(":") # => ["a", ":", ""]
a.map(&:encoding) # => [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]

However, when the string does not contain the pattern, the empty strings are ASCII-8BIT:

a = "a:".partition("|") # => ["a:", "", ""]
a.map(&:encoding) # => [#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>, #<Encoding:ASCII-8BIT>]

I came across a problem where, when an ASCII-8BIT string is inserted into a text column,
SQLite3 silently stores it as a blob, so it can no longer be found by searching.

http://www.mew.org/pipermail/mew-dist/2012-March/029160.html
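
The report above can be turned into a small check that lists any piece whose encoding differs from that of the string it was cut from. This is only a sketch: `encoding_mismatches` is a hypothetical helper, not part of Ruby, and the source string is forced to UTF-8 explicitly so the result does not depend on the script's own encoding.

```ruby
# Hypothetical helper (not part of Ruby): returns the pieces whose encoding
# differs from the encoding of the string they were cut from.
def encoding_mismatches(source, pieces)
  pieces.reject { |piece| piece.encoding == source.encoding }
end

# Force UTF-8 explicitly so the check does not rely on the source encoding.
source = "a:".dup.force_encoding(Encoding::UTF_8)

split_mismatches          = encoding_mismatches(source, source.split(":", 2))
partition_hit_mismatches  = encoding_mismatches(source, source.partition(":"))
partition_miss_mismatches = encoding_mismatches(source, source.partition("|"))

# An empty array means every piece kept the source encoding. On a build
# affected by this ticket, the empty strings would be listed instead.
p split_mismatches
p partition_hit_mismatches
p partition_miss_mismatches
```

On a Ruby that includes the fix for this ticket, all three arrays should be empty.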

Associated revisions

Revision ab9c982c
Added by nobu (Nobuyoshi Nakada) over 7 years ago

  • string.c (str_new_empty): should copy also the encoding as an empty substring. [ruby-dev:45441][Bug #6206]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@35146 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 35146
Added by nobu (Nobuyoshi Nakada) over 7 years ago

  • string.c (str_new_empty): should copy also the encoding as an empty substring. [ruby-dev:45441][Bug #6206]

Revision 3e89498b
Added by naruse (Yui NARUSE) over 7 years ago

merge revision(s) 35146:

    * string.c (str_new_empty): should copy also the encoding as an
      empty substring.  [ruby-dev:45441][Bug #6206]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_9_3@35178 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 35178
Added by naruse (Yui NARUSE) over 7 years ago

merge revision(s) 35146:

* string.c (str_new_empty): should copy also the encoding as an
  empty substring.  [ruby-dev:45441][Bug #6206]
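
The idea in the commit message, that an empty result should carry the same encoding as an empty substring of the receiver would, can be pictured in Ruby terms. This is a rough sketch only: `new_empty_like` is a hypothetical name, and the real change is in the C function `str_new_empty` in string.c, whose exact body is not shown in this ticket.

```ruby
# Hypothetical Ruby rendering of the idea behind the fix; the actual change
# lives in C (string.c, str_new_empty) and may differ in detail.
def new_empty_like(str)
  # Build a fresh empty string and give it the receiver's encoding,
  # which is what taking an empty substring such as str[0, 0] also yields.
  "".dup.force_encoding(str.encoding)
end

utf8  = "a:".dup.force_encoding(Encoding::UTF_8)
eucjp = "a:".dup.force_encoding(Encoding::EUC_JP)

p new_empty_like(utf8).encoding == utf8[0, 0].encoding
p new_empty_like(eucjp).encoding == eucjp[0, 0].encoding
```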

History

Updated by no6v (Nobuhiro IMAI) over 7 years ago

I think the problem is not so much that "" itself is ASCII-8BIT, but that in some cases
an ASCII-8BIT "" created partway through processing has something added to it with + or <<,
and the result ends up being an ASCII-8BIT string.
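
The propagation described in this comment can be reproduced on any build by creating the ASCII-8BIT empty string by hand. This is a minimal sketch, assuming the text that gets appended is ASCII-only; the variable names are illustrative.

```ruby
# Stand-in for the empty string that split returned before the fix.
binary_empty = "".dup.force_encoding(Encoding::ASCII_8BIT)
utf8_text    = "abc".dup.force_encoding(Encoding::UTF_8)

# String#+ : the empty left operand decides the encoding of the result.
plus_result = binary_empty + utf8_text

# String#<< : appending ASCII-only text keeps the receiver's encoding.
append_result = binary_empty.dup
append_result << utf8_text

p plus_result.encoding == Encoding::ASCII_8BIT
p append_result.encoding == Encoding::ASCII_8BIT
```

Both results are non-empty strings tagged ASCII-8BIT even though every character came from a UTF-8 string, which is the kind of value that then reaches a library such as SQLite3.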

Updated by nobu (Nobuyoshi Nakada) over 7 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r35146.
Nobuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • string.c (str_new_empty): should copy also the encoding as an empty substring. [ruby-dev:45441][Bug #6206]

Updated by no6v (Nobuhiro IMAI) over 7 years ago

I would like r35146 to be backported to 1.9.3.
It seems I cannot move tickets between projects myself,
so I would appreciate it if someone could do that.

$ ruby -ve 'p "a:".split(":", 2).map(&:encoding)'
ruby 1.9.3p168 (2012-03-29 revision 35166) [x86_64-linux]
[#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>]

Updated by naruse (Yui NARUSE) over 7 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport193
  • Status changed from Closed to Assigned
  • Assignee set to naruse (Yui NARUSE)

Updated by naruse (Yui NARUSE) over 7 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r35178.
Nobuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 35146:

* string.c (str_new_empty): should copy also the encoding as an
  empty substring.  [ruby-dev:45441][Bug #6206]
