Bug #8698

String#each_line and #lines behave differently depending on whether an argument is given when the receiver contains an invalid byte sequence

Added by Masaki Matsushita over 1 year ago. Updated over 1 year ago.

[ruby-dev:<unknown>]
Status:Closed
Priority:Normal
Assignee:Masaki Matsushita
ruby -v:ruby 2.1.0dev (2013-07-28 trunk 42211) [x86_64-linux]
Backport:1.9.3: UNKNOWN, 2.0.0: UNKNOWN

Description

When the receiver contains an invalid byte sequence, String#each_line and #lines do not raise an exception if called without an argument,
but they do raise one if an argument is given.

invalid_str = "\x80" * 3
invalid_str.each_line {} # no error
invalid_str.each_line("foo") {} # invalid byte sequence in UTF-8 (ArgumentError)

invalid_str.lines # no error
invalid_str.lines("foo") # invalid byte sequence in UTF-8 (ArgumentError)

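Isn't it a bug in the specification that the behavior for a receiver containing an invalid byte sequence changes depending on whether an argument is given?
Since String#each_line and #lines are probably most often called without an argument, I propose aligning with that behavior
and unifying the specification as: "String#each_line and #lines do not raise an exception even if the receiver contains an invalid byte sequence."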
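The proposed behavior can be illustrated with a short script. This is only a sketch, assuming a Ruby build that includes r42966; `mixed_str` and `lines_strict` are hypothetical names introduced here and do not appear in the report or the patch.

```ruby
invalid_str = "\x80" * 3            # same receiver as in the report
mixed_str   = "\x80foo\x80foo\x80"  # invalid bytes around a real separator

# Under the proposed behavior neither call raises.
whole  = invalid_str.lines("foo")
pieces = mixed_str.lines("foo")

puts whole.length    # the separator never matches, so the whole string comes back
puts pieces.length

# Hypothetical helper: callers that still want an error on broken input
# can check validity themselves before splitting.
def lines_strict(str, separator)
  unless str.valid_encoding?
    raise ArgumentError, "invalid byte sequence in #{str.encoding}"
  end
  str.lines(separator)
end
```

With the check moved into the caller, code that relied on the old exception can keep that guarantee explicitly instead of depending on whether a separator happens to be passed.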

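The attached patch unifies the behavior to the specification above by searching with rb_memsearch() regardless of whether an argument is given. It is a slightly modified version of
patch3.diff proposed in [Feature #7368]. In addition, test/ruby/test_m17n_comb.rb contains tests that expect an exception when #each_line is called with an argument,
so I removed that part.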
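The idea of searching for the separator at the byte level, so that the validity of the receiver is never checked, can be sketched in Ruby. This is not the C patch: `lines_bytewise` is a hypothetical helper, it uses String#b and String#index in place of rb_memsearch(), and it does not check that a match starts on a character boundary, which real multibyte handling has to consider.

```ruby
# Hypothetical helper, not part of Ruby or of the patch: split `str` after
# each occurrence of `separator` using a plain byte search.
def lines_bytewise(str, separator)
  needle = separator.b                # binary copy of the separator
  raise ArgumentError, "separator must not be empty" if needle.empty?

  haystack = str.b                    # binary copy: index returns byte offsets
  lines = []
  start = 0

  while (hit = haystack.index(needle, start))
    stop = hit + needle.bytesize
    # Slice the original string so each line keeps the receiver's encoding.
    lines << str.byteslice(start, stop - start)
    start = stop
  end

  # Trailing bytes after the last separator form the final line.
  lines << str.byteslice(start, str.bytesize - start) if start < str.bytesize
  lines
end

p lines_bytewise("\x80" * 3, "foo").length
p lines_bytewise("\x80foo\x80foo\x80", "foo").length
```

Because the search runs on binary copies, invalid bytes are treated like any other bytes and no encoding error can be raised during the scan.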

patch.diff (4.55 KB) Masaki Matsushita, 07/28/2013 05:14 PM


Related issues

Related to Ruby trunk - Feature #7368: Performance improvement and refactoring of rb_str_each_line() Closed 11/16/2012

Associated revisions

Revision 42966
Added by glass over 1 year ago

  • string.c (rb_str_enumerate_lines): make String#each_line and
    #lines not raise invalid byte sequence error when it is called
    with an argument. The patch also causes performance improvement.
    [Bug #8698]

  • test/ruby/test_m17n_comb.rb (test_str_each_line): remove
    assertions which check that String#each_line and #lines will
    raise an error if the receiver includes invalid byte sequence.

History

#1 Updated by Koichi Sasada over 1 year ago

  • Assignee set to Yui NARUSE

#2 Updated by Yui NARUSE over 1 year ago

  • Status changed from Open to Assigned

#3 Updated by Yui NARUSE over 1 year ago

  • Assignee changed from Yui NARUSE to Masaki Matsushita

Please commit it.

#4 Updated by Anonymous over 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r42966.
Masaki, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

