Backport #5635
String#unpack("M") behavior on invalid data
Description
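When String#unpack("M") encounters invalid data such as "=hoge", it currently
stops processing at that point. Discarding all of the data after it seems a
pity, so I think it would be better to advance the pointer by one and continue
processing. What do you think?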
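As a rough illustration of the behavior reported above, the following pure-Ruby sketch models a decoder that stops at the first invalid "=" sequence. The method names (`hex2num_sketch`, `qp_decode_stop_at_invalid`) are hypothetical and exist only for this example; it is a simplified model of the loop being discussed, not the actual pack.c code.

```ruby
# Hypothetical helper: value of a hex digit byte, or -1 if it is not one.
def hex2num_sketch(byte)
  return -1 if byte.nil?
  case byte
  when 0x30..0x39 then byte - 0x30        # '0'..'9'
  when 0x41..0x46 then byte - 0x41 + 10   # 'A'..'F'
  when 0x61..0x66 then byte - 0x61 + 10   # 'a'..'f'
  else -1
  end
end

# Hypothetical model of a quoted-printable decoder that gives up at the
# first invalid "=" sequence and drops everything after it.
def qp_decode_stop_at_invalid(str)
  s = str.bytes
  out = []
  i = 0
  while i < s.size
    if s[i] == 0x3D                       # '='
      i += 1
      break if i == s.size                # "=" at end of input
      i += 1 if i + 1 < s.size && s[i] == 0x0D && s[i + 1] == 0x0A
      if s[i] != 0x0A                     # not a soft line break
        c1 = hex2num_sketch(s[i])
        break if c1 == -1                 # invalid: rest of input is lost
        i += 1
        break if i == s.size
        c2 = hex2num_sketch(s[i])
        break if c2 == -1
        out << (c1 << 4 | c2)
      end
    else
      out << s[i]
    end
    i += 1
  end
  out.pack("C*")
end

qp_decode_stop_at_invalid("pre=31=32=33after")  # => "pre123after"
qp_decode_stop_at_invalid("pre=hoge")           # => "pre"
```

In this model, "hoge" and anything following it is silently discarded, which is the data loss the report is about.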
RFC 2045 contains the following passage:
(2) An "=" followed by a character that is neither a
hexadecimal digit (including "abcdef") nor the CR
character of a CRLF pair is illegal. This case can be
the result of US-ASCII text having been included in a
quoted-printable part of a message without itself
having been subjected to quoted-printable encoding. A
reasonable approach by a robust implementation might be
to include the "=" character and the following
character in the decoded data without any
transformation and, if possible, indicate to the user
that proper decoding was not possible at this point in
the data.
Index: pack.c
--- pack.c (revision 33758)
+++ pack.c (working copy)
@@ -2008,20 +2008,23 @@
         while (s < send) {
             if (*s == '=') {
-                if (++s == send) break;
-                if (s+1 < send && *s == '\r' && *(s+1) == '\n')
-                  s++;
-                if (*s != '\n') {
-                    if ((c1 = hex2num(*s)) == -1) break;
-                    if (++s == send) break;
-                    if ((c2 = hex2num(*s)) == -1) break;
-                    *ptr++ = c1 << 4 | c2;
+                if (s+1 < send && *(s+1) == '\n') {
+                    s += 2;
+                    continue;
+                }
+                if (s+2 < send) {
+                    if (*(s+1) == '\r' && *(s+2) == '\n') {
+                        s += 3;
+                        continue;
+                    }
+                    if ((c1 = hex2num(*(s+1))) > -1 && (c2 = hex2num(*(s+2))) > -1) {
+                        *ptr++ = c1 << 4 | c2;
+                        s += 3;
+                        continue;
+                    }
                 }
             }
-            else {
-                *ptr++ = *s;
-            }
-            s++;
+            *ptr++ = *s++;
         }
         rb_str_set_len(buf, ptr - RSTRING_PTR(buf));
         ENCODING_CODERANGE_SET(buf, rb_ascii8bit_encindex(), ENC_CODERANGE_VALID);
Index: test/ruby/test_pack.rb
--- test/ruby/test_pack.rb (revision 33758)
+++ test/ruby/test_pack.rb (working copy)
@@ -612,6 +612,17 @@
     assert_equal([0x100000000], "\220\200\200\200\000".unpack("w"), [0x100000000])
   end

+  def test_pack_unpack_M
+    assert_equal(["pre123after"], "pre=31=32=33after".unpack("M"))
+    assert_equal(["preafter"], "pre=\nafter".unpack("M"))
+    assert_equal(["preafter"], "pre=\r\nafter".unpack("M"))
+    assert_equal(["pre="], "pre=".unpack("M"))
+    assert_equal(["pre=\r"], "pre=\r".unpack("M"))
+    assert_equal(["pre=hoge"], "pre=hoge".unpack("M"))
+    assert_equal(["pre=1after"], "pre==31after".unpack("M"))
+    assert_equal(["pre==1after"], "pre===31after".unpack("M"))
+  end
+
   def test_modify_under_safe4
     s = "foo"
     assert_raise(SecurityError) do
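The control flow of the patched loop can be easier to follow outside of C. The sketch below is a pure-Ruby model of what the proposed patch appears to do: an invalid "=" is copied through as a literal byte and decoding resumes at the next byte. The method names (`hex2num_sketch`, `qp_decode_skip_invalid`) are hypothetical; this is an approximation for illustration, not code from the Ruby repository.

```ruby
# Hypothetical helper: value of a hex digit byte, or -1 if it is not one.
def hex2num_sketch(byte)
  return -1 if byte.nil?
  case byte
  when 0x30..0x39 then byte - 0x30        # '0'..'9'
  when 0x41..0x46 then byte - 0x41 + 10   # 'A'..'F'
  when 0x61..0x66 then byte - 0x61 + 10   # 'a'..'f'
  else -1
  end
end

# Hypothetical model of the proposed loop: look ahead from "=" and, if no
# valid sequence follows, emit the "=" itself and keep decoding.
def qp_decode_skip_invalid(str)
  s = str.bytes
  out = []
  i = 0
  while i < s.size
    if s[i] == 0x3D                                   # '='
      if i + 1 < s.size && s[i + 1] == 0x0A           # soft break "=\n"
        i += 2
        next
      end
      if i + 2 < s.size
        if s[i + 1] == 0x0D && s[i + 2] == 0x0A       # soft break "=\r\n"
          i += 3
          next
        end
        c1 = hex2num_sketch(s[i + 1])
        c2 = hex2num_sketch(s[i + 2])
        if c1 > -1 && c2 > -1                         # "=XX" escape
          out << (c1 << 4 | c2)
          i += 3
          next
        end
      end
    end
    out << s[i]                                       # literal byte, including a stray "="
    i += 1
  end
  out.pack("C*")
end

qp_decode_skip_invalid("pre=hoge")       # => "pre=hoge"
qp_decode_skip_invalid("pre==31after")   # => "pre=1after"
```

The second call shows the consequence of continuing after an invalid "=": the following "=31" is still decoded, matching the expectations in the proposed test.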
Updated by naruse (Yui NARUSE) about 13 years ago
Hmm, if the RFC says to include it as-is without any transformation, I think it would be better to do exactly that.
Updated by tommy (Masahiro Tomita) about 13 years ago
Huh? My intention was to leave the invalid data as-is.
At the very least, I think it is better than the current behavior of dropping everything after the invalid data...
Updated by naruse (Yui NARUSE) about 13 years ago
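I read it as saying that processing should not continue. What do you think?
That said, for this kind of communication-related code it may be more appropriate to follow common practice than the RFC, so examples of what other implementations do would help.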
Updated by tommy (Masahiro Tomita) about 13 years ago
Ah, I see: once invalid data is encountered, nothing after it should be converted at all.
That does seem like the better approach.
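To make the difference between the two readings concrete, here is a pure-Ruby sketch of the interpretation reached in this exchange: decode normally until the first invalid "=" sequence, then copy the remainder of the input through untouched. The method names (`hex2num_sketch`, `qp_decode_keep_rest`) are hypothetical, and this is only an illustration of the idea, not necessarily how it was implemented in pack.c.

```ruby
# Hypothetical helper: value of a hex digit byte, or -1 if it is not one.
def hex2num_sketch(byte)
  return -1 if byte.nil?
  case byte
  when 0x30..0x39 then byte - 0x30        # '0'..'9'
  when 0x41..0x46 then byte - 0x41 + 10   # 'A'..'F'
  when 0x61..0x66 then byte - 0x61 + 10   # 'a'..'f'
  else -1
  end
end

# Hypothetical model: stop decoding at the first invalid "=" sequence and
# append everything from that "=" onward without transformation.
def qp_decode_keep_rest(str)
  s = str.bytes
  out = []
  i = 0
  while i < s.size
    if s[i] != 0x3D                                   # ordinary byte
      out << s[i]
      i += 1
    elsif s[i + 1] == 0x0A                            # soft break "=\n"
      i += 2
    elsif s[i + 1] == 0x0D && s[i + 2] == 0x0A        # soft break "=\r\n"
      i += 3
    elsif (c1 = hex2num_sketch(s[i + 1])) > -1 &&
          (c2 = hex2num_sketch(s[i + 2])) > -1        # "=XX" escape
      out << (c1 << 4 | c2)
      i += 3
    else
      out.concat(s[i..-1])                            # invalid: keep the rest verbatim
      break
    end
  end
  out.pack("C*")
end

qp_decode_keep_rest("pre=31=hoge=32")   # => "pre1=hoge=32"
qp_decode_keep_rest("pre==31after")     # => "pre==31after"
```

Under this reading "pre==31after" comes back unchanged, whereas continuing after the invalid "=" would yield "pre=1after".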
Updated by ko1 (Koichi Sasada) almost 13 years ago
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) almost 13 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r34972.
Masahiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- pack.c (pack_unpack): when unpack('M') occurs an illegal byte
sequence, output the "=" character and the following character in
the decoded data without any transformation.
[ruby-dev:44875] [Bug #5635]
Updated by naruse (Yui NARUSE) over 12 years ago
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport193