Bug #5637

warnings of shellescape

Added by Kazuhiro NISHIYAMA over 3 years ago. Updated over 2 years ago.

[ruby-dev:44878]
Status:Closed
Priority:Normal
Assignee:Akinori MUSHA
ruby -v: -
Backport:

Description

\あ

Associated revisions

Revision 34166
Added by Akinori MUSHA about 3 years ago

  • lib/shellwords.rb (Shellwords#shellescape): Drop the //n flag that only causes warnings with no real effect. [Bug #5637]

History

#1 Updated by Kazuhiro NISHIYAMA over 3 years ago

  • ruby -v changed from ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux] to -

This is Kazuhiro NISHIYAMA.

Text I post through Redmine seems to disappear, so I'm rewriting this by email.

Shellwords.shellescape emits warnings.

% ./ruby -v -r shellwords -e 'p Shellwords.shellescape("\u3042")'
ruby 2.0.0dev (2011-11-15 trunk 33753) [x86_64-linux]
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
/home/chkbuild/tmp/build/ruby-trunk/20111114T222552Z/lib/ruby/1.9.1/shellwords.rb:86: warning: regexp match /.../n against to UTF-8 string
"\\あ"

I also think the escaped result is odd.
If the escaped result should match 1.8.7's, how about
the following patch?

diff --git a/lib/shellwords.rb b/lib/shellwords.rb
index 5d6ba75..78331a7 100644
--- a/lib/shellwords.rb
+++ b/lib/shellwords.rb
@@ -79,11 +79,11 @@ module Shellwords
     # An empty argument will be skipped, so return empty quotes.
     return "''" if str.empty?
 
-    str = str.dup
+    str = str.dup.force_encoding("ASCII-8BIT")
 
     # Process as a single byte sequence because not all shell
     # implementations are multibyte aware.
-    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/n, "\\\\\\1")
+    str.gsub!(/([^A-Za-z0-9_\-.,:\/@\n])/, "\\\\\\1")
 
     # A LF cannot be escaped with a backslash because a backslash + LF
     # combo is regarded as line continuation and simply ignored.
diff --git a/test/test_shellwords.rb b/test/test_shellwords.rb
index d48a888..cbc5043 100644
--- a/test/test_shellwords.rb
+++ b/test/test_shellwords.rb
@@ -36,4 +36,8 @@ class TestShellwords < Test::Unit::TestCase
       shellwords(bad_cmd)
     end
   end
+
+  def test_shellescape_utf8_string
+    assert_equal "\\\343\\\201\\\202", shellescape("\u3042")
+  end
 end

--
|ZnZ(ゼット エヌ ゼット)
|西山和広(Kazuhiro NISHIYAMA)
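For comparison, the byte-by-byte result that the proposed test expects can also be obtained from the caller's side, without patching the library, by escaping a binary (ASCII-8BIT) copy of the string. This is a minimal sketch, assuming Ruby 2.0+ (for String#b) and a Shellwords without the //n flag; it is not part of the patch above.

```ruby
require "shellwords"

# Escape a binary copy: each of the three UTF-8 bytes of "あ" is then a
# separate "character" and gets its own backslash (1.8.7-style result).
escaped = Shellwords.shellescape("\u3042".b)

p escaped.bytes                              # => [92, 227, 92, 129, 92, 130]
p escaped == "\\\343\\\201\\\202".b          # => true (the test's expected value)
p escaped.encoding == Encoding::BINARY       # => true
```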

#2 Updated by Akinori MUSHA over 3 years ago

  • Assignee set to Akinori MUSHA

#3 Updated by Akinori MUSHA over 3 years ago

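After considering various options, I've decided to simply drop the //n flag.

  • 1.8: Treating every string uniformly as binary was an unavoidable design, because strings carried no encoding information and $KCODE could not be relied upon. (This does not apply to 1.9+.)
  • 1.9: This has been the behavior all along, up to 1.9.3. The warning is a bug (a leftover //n that should have been fixed) and will be removed, but the behavior will not change, since changing it would break compatibility.
  • 2.0: Only the caller knows how the string will be used (e.g. the locale of the shell it is passed to), but in 1.9+ the caller can encode it appropriately, including as ASCII-8BIT. So the current behavior, in which shellescape respects the string's encoding, is (by coincidence) actually the desirable one and need not change.

There is also no sensible way to warn about this (a warning such as "this is SJIS, so ..." would be harmful when the user knows what they are doing, e.g. has set the shell's locale to SJIS), so I won't add anything extra and will only add a note to the documentation.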
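The behavior kept by r34166 can be checked from the caller's side: shellescape works character by character in the string's own encoding and returns a string in that same encoding. A minimal sketch, assuming a current Ruby:

```ruby
require "shellwords"

utf8 = "\u3042"                          # "あ", encoded as UTF-8
escaped = Shellwords.shellescape(utf8)

# One backslash in front of the whole character, not one per byte.
p escaped == "\\\u3042"                  # => true
p escaped.length                         # => 2
# The result keeps the caller's encoding; shellescape does not re-encode.
p escaped.encoding == Encoding::UTF_8    # => true
```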

#4 Updated by Akinori MUSHA about 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r34166.
Kazuhiro, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • lib/shellwords.rb (Shellwords#shellescape): Drop the //n flag that only causes warnings with no real effect. [Bug #5637]

#5 Updated by Darío Cravero over 2 years ago

Hi,

Thanks for this patch!.. :)

One question though: from comment #3 it's not clear if it's safe to use it in 1.9.3. This is what Google Translate gave me:

"1.9: this behavior was all the way to 1.9.3 now. Turn off warning but does not change as a bug (missing fix of / / n), because the behavior leads to incompatibility."

However, I've applied it and, as expected, I don't see the warning anymore. Still, can you just confirm there are no side effects to this on 1.9.3?

Thanks a million!..

#6 Updated by Akinori MUSHA over 2 years ago

As I documented, it's all up to how you use the resulting string.

If you are going to pass it to a shell that lacks support for the string's encoding, you should probably convert the original string to ASCII-8BIT before calling shellescape(), so that it is escaped byte by byte and the shell cannot find a metacharacter inside a multibyte character.
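That advice can be sketched as a small wrapper. `escape_for_shell` is a hypothetical helper, and it assumes the caller already knows the encoding the target shell understands (passed in as `shell_encoding`); Ruby itself does not provide this.

```ruby
require "shellwords"

# Hypothetical helper: escape +str+ for a shell that understands
# +shell_encoding+. If the shell cannot be assumed to understand the
# string's encoding, fall back to byte-by-byte escaping via ASCII-8BIT.
def escape_for_shell(str, shell_encoding)
  if str.ascii_only? || str.encoding == shell_encoding
    Shellwords.shellescape(str)     # per-character escape in str's encoding
  else
    Shellwords.shellescape(str.b)   # per-byte escape; every byte is covered
  end
end

name = "\u3042.txt"                                   # "あ.txt" in UTF-8
p escape_for_shell(name, Encoding::UTF_8).bytesize    # => 8  (one backslash)
p escape_for_shell(name, Encoding::US_ASCII).bytesize # => 10 (three backslashes)
```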

By design, UTF-8 multibyte characters never contain an ASCII byte, so most people in the everything-is-UTF-8 world don't even have to care about this.
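The UTF-8 property relied on here is easy to check: every byte of a multibyte UTF-8 character is at least 0x80, so none of them can be read as an ASCII shell metacharacter. A minimal sketch; the sample characters are arbitrary.

```ruby
# Mix of ASCII ("a", "$") and multibyte characters (あ, 表, é, €).
text = "a$\u3042\u8868\u00e9\u20ac"

# Collect the bytes of the non-ASCII characters only.
multibyte_bytes = text.each_char.reject(&:ascii_only?).flat_map(&:bytes)

p multibyte_bytes.size                    # => 11 (3 + 3 + 2 + 3)
p multibyte_bytes.all? { |b| b >= 0x80 }  # => true
```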

But, for example, when you have to run a program passing a Shift_JIS string via a shell under a non-Shift_JIS locale, you'd probably have to compose the command line in the ASCII-8BIT encoding so that all shell metacharacters that may appear in Shift_JIS multibyte characters are properly escaped.
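The Shift_JIS case can be made concrete with "表" (U+8868), a well-known example whose second Shift_JIS byte is 0x5C, the backslash. This is a sketch assuming Ruby's built-in Shift_JIS transcoder; the character is chosen purely for illustration.

```ruby
require "shellwords"

# "表" is 0x95 0x5C in Shift_JIS; the second byte is "\" (0x5C).
sjis = "\u8868".encode(Encoding::Shift_JIS)
p sjis.bytes                          # => [149, 92]

# Escaping in the string's own encoding treats both bytes as one character
# and adds a single backslash in front, leaving a bare trailing 0x5C that a
# non-Shift_JIS shell would apply to whatever follows the argument.
per_char = Shellwords.shellescape(sjis)
p per_char.bytes                      # => [92, 149, 92]

# Escaping the ASCII-8BIT bytes instead also escapes that 0x5C.
per_byte = Shellwords.shellescape(sjis.b)
p per_byte.bytes                      # => [92, 149, 92, 92]
```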
