Feature #14417
Closed
Proposal: allow String#sub / String#gsub to accept a Hash whose keys are Symbols
Description
Summary
When String#sub / String#gsub is given a Hash whose keys are Symbols, perform the replacement the same way as when the keys are Strings.
Current behavior
hash = {'b'=>'B', 'c'=>'C'}
p "abcabc".gsub(/[bc]/, hash) #=> "aBCaBC"
# A Hash with Symbol keys does not produce replacements
hash = { b: 'B', c: 'C' }
p "abcabc".gsub(/[bc]/, hash) #=> "aa"
# A Hash with Symbol keys has to be converted to String keys first
p "abcabc".gsub(/[bc]/, hash.transform_keys(&:to_s)) #=> "aBCaBC"
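The empty result for Symbol keys follows from how the Hash form performs its lookup: each matched substring is used as a String key, and a missing key yields nil, which becomes an empty string. A minimal sketch that spells this out with the equivalent block form (the helper name `lookup_replace` and the constants are hypothetical, introduced only for illustration):

```ruby
# Hypothetical helper: mimics gsub(pattern, hash) with an explicit block.
# The matched text is always a String, so Symbol keys are never found.
def lookup_replace(string, pattern, hash)
  string.gsub(pattern) { |matched| hash[matched].to_s }
end

DEMO_STRING_KEYS = { 'b' => 'B', 'c' => 'C' }
DEMO_SYMBOL_KEYS = { b: 'B', c: 'C' }

p lookup_replace("abcabc", /[bc]/, DEMO_STRING_KEYS)  #=> "aBCaBC"
p lookup_replace("abcabc", /[bc]/, DEMO_SYMBOL_KEYS)  #=> "aa"
p DEMO_SYMBOL_KEYS["b"]                               #=> nil
```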
Proposed behavior
# String keys keep the current behavior
hash = {'b'=>'B', 'c'=>'C'}
p "abcabc".gsub(/[bc]/, hash) #=> "aBCaBC"
hash = { b: 'B', c: 'C' }
# $& should stay dynamic, so it remains a String
p "abcabc".gsub(/[bc]/){hash[$&]} #=> "aa"
# The block argument should stay dynamic, so it remains a String
p "abcabc".gsub(/[bc]/){ |s| hash[s] } #=> "aa"
# Symbol keys are accepted only when the Hash is passed directly
p "abcabc".gsub(/[bc]/, hash) #=> "aBCaBC"
Advantages
- Some coding conventions recommend writing Hash keys as Symbols
- Defining keys as Symbols makes writing a Hash more pleasant
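Open issues
- What should happen when a Hash has both String and Symbol keys?
  "abcabc".sub(/[bc]/, { "b" => "A", b: "C" }) # => ???
  - At present, String is given priority
  - More fundamentally, isn't a Hash that mixes String and Symbol keys questionable in the first place?
    - Perhaps emit a warning?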
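One way to picture the String-first rule and the warning idea listed above is the following sketch. It is not the proposed patch: the helper names are hypothetical, and the lookup is emulated with a block on current Ruby. Note that it calls to_sym on matched text, which creates Symbols from input data.

```ruby
# Hypothetical lookup rule: prefer the String key, fall back to the
# Symbol key, and warn on stderr when both are present.
def lookup_with_string_priority(hash, matched)
  symbol_key = matched.to_sym
  if hash.key?(matched)
    if hash.key?(symbol_key)
      warn "both #{matched.inspect} and #{symbol_key.inspect} are keys; using the String key"
    end
    hash[matched]
  else
    hash[symbol_key]
  end
end

# Hypothetical stand-in for String#sub with the rule above.
def sub_with_mixed_keys(string, pattern, hash)
  string.sub(pattern) { |matched| lookup_with_string_priority(hash, matched).to_s }
end

p sub_with_mixed_keys("abcabc", /[bc]/, { "b" => "A", b: "C" })  #=> "aAcabc"
p sub_with_mixed_keys("abcabc", /[bc]/, { b: "C" })              #=> "aCcabc"
```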
Use cases for String#gsub, etc.
# http://batsov.com/articles/2013/10/03/using-rubys-gsub-with-a-hash/
def geekify(string)
  string.gsub(/[leto]/, l: '1', e: '3', t: '7', o: '0')
end
p geekify('leet') # => '1337'
p geekify('noob') # => 'n00b'
def doctorize(string)
  string.gsub(/M(iste)?r/, Mister: 'Doctor', Mr: 'Dr')
end
p doctorize('Mister Freeze') # => 'Doctor Freeze'
p doctorize('Mr Smith') # => 'Dr Smith'
# https://coderwall.com/p/t4y7cw/ruby-gsub-with-a-hash-or-block
amino_acid_hash = { A: 'Ala', R: 'Arg', N: 'Asn' }
p "R232A".gsub(/[A-Z]/, amino_acid_hash)
# => "Arg232Ala"
# https://qiita.com/scivola/items/416155c307ec29a37b8f
hash = {
  '&': "&amp;",
  '<': "&lt;",
  '>': "&gt;",
}
p "<Q&A>".gsub(/[&<>]/, hash)
# => "&lt;Q&amp;A&gt;"
# https://qiita.com/pocari/items/34855a9b07ea5006fe80
hash = {
  '#to#': "taro",
  '#from#': "jiro",
}
template = <<EOS
hello, #to#.
message from #from#.
EOS
puts template.gsub(/#.*#/, hash)
# => hello, taro.
# message from jiro.
If anyone can think of other concrete use cases, comments would be much appreciated.
Updated by Hanmac (Hans Mackowiak) almost 7 years ago
Even though Ruby Symbols are garbage-collected now, I still have a problem with creating that many Symbols from possibly tainted string data.
Would it be better if sub/gsub called hash.transform_keys(&:to_s) internally when a Hash is given?
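The suggestion can be emulated in plain Ruby on current versions. This is only a sketch of the idea, not a change to sub/gsub itself, and the wrapper name `gsub_with_stringified_keys` is hypothetical:

```ruby
# Hypothetical wrapper emulating the suggestion: normalize every key
# with to_s before handing the Hash to gsub. No Symbols are created
# from the matched text; only the Hash keys are converted.
def gsub_with_stringified_keys(string, pattern, hash)
  string.gsub(pattern, hash.transform_keys(&:to_s))
end

p gsub_with_stringified_keys("abcabc", /[bc]/, { b: 'B', c: 'C' })  #=> "aBCaBC"
```

Each call builds a new Hash, which is the cost the later performance comments are concerned with.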
if yes then this would work too:
"12345".gsub(/\d/,{"1" => "A", "2" => "B", "3" => "C", "4" => "D", "5" => "E"}) #=> "ABCDE"
"12345".gsub(/\d/,{1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}) #=> "ABCDE"
Updated by shyouhei (Shyouhei Urabe) almost 7 years ago
- Status changed from Open to Feedback
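The advantages given in the proposal are too weak for me to support it (they are a matter of taste).
That said, I'm not against the feature itself, so I think more concrete use cases would make it easier to support.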
Updated by osyo (manga osyo) almost 7 years ago
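Thanks for the reply!
Hanmac (Hans Mackowiak) wrote:
would it probably be better if sub/gsub would call hash.transform_keys(&:to_s) internally in their code with the hash if a hash is given?
Hmm... Thanks for the idea :)
shyouhei (Shyouhei Urabe) wrote:
That said, I'm not against the feature itself, so I think more concrete use cases would make it easier to support.
That's true... I'll try to come up with some more concrete use cases.
Thank you for the comment.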
Updated by Hanmac (Hans Mackowiak) almost 7 years ago
I looked at the gsub code in string.c;
https://github.com/ruby/ruby/blob/trunk/string.c#L5094
seems to be the line where we could add a transform_keys call,
but I don't currently know the best way to call hash.transform_keys(&:to_s) there.
Probably something with rb_funcall_with_block?
Updated by duerst (Martin Dürst) almost 7 years ago
gsub with Hash is used in some contexts where high performance is of interest. An example is lib/unicode_normalize/normalize.rb. This proposal would make these cases less efficient, for the benefit of people who can't keep Symbols and Strings apart.
As discussed in another issue, b: 'B', c: 'C' is not a shortcut for 'b'=>'B', 'c'=>'C'. We already have methods to change Hash keys (or values), and we probably need more of them, but I think we don't need more methods that accept strings and symbols indiscriminately.
Updated by Hanmac (Hans Mackowiak) almost 7 years ago
@duerst (Martin Dürst): what about my example, where the keys of the given Hash are transformed internally?
Or is that a no-no too?
It might be possible to do it only if the given Hash has non-String keys?
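The "only if the Hash has non-String keys" idea could look roughly like this in plain Ruby. Both helper names are hypothetical, and the sketch says nothing about how it would be done inside string.c:

```ruby
# Hypothetical helper: return the Hash unchanged when every key is already
# a String, otherwise build a copy whose keys are converted with to_s.
def stringify_keys_if_needed(hash)
  return hash if hash.each_key.all? { |key| key.is_a?(String) }
  hash.transform_keys(&:to_s)
end

# Hypothetical wrapper standing in for a gsub that does this internally.
def gsub_converting_when_needed(string, pattern, hash)
  string.gsub(pattern, stringify_keys_if_needed(hash))
end

string_keys = { "1" => "A", "2" => "B" }
p stringify_keys_if_needed(string_keys).equal?(string_keys)  #=> true
p gsub_converting_when_needed("12345", /\d/, { 1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E" })  #=> "ABCDE"
```

Even when no conversion happens, the key check still walks the whole Hash on every call, so String-only callers would pay some cost.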
Updated by naruse (Yui NARUSE) almost 7 years ago
Hanmac (Hans Mackowiak) wrote:
@duerst (Martin Dürst): what about my example, where the keys of the given Hash are transformed internally?
Or is that a no-no too?
It might be possible to do it only if the given Hash has non-String keys?
If the hash is looked up many times from gsub, those integers should be converted to Strings before calling gsub,
because doing such a conversion each time allocates many objects and causes frequent GC.
I think
h = {1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}
"12345".gsub(/\d/){ h[$&.to_i] }
is faster than such code.
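A rough way to compare the alternatives on a given machine is the standard benchmark library. The sketch below is illustrative only: the method names, constants and iteration count are assumptions, and no timings are claimed here because they depend on the Ruby version and hardware.

```ruby
require 'benchmark'

BENCH_INTEGER_KEYS = { 1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E" }
# Keys converted once, before any gsub call.
BENCH_PRECONVERTED = BENCH_INTEGER_KEYS.transform_keys(&:to_s)
BENCH_ITERATIONS   = 20_000  # arbitrary count chosen for illustration

# Emulates converting the keys inside every call.
def with_conversion_per_call(string)
  string.gsub(/\d/, BENCH_INTEGER_KEYS.transform_keys(&:to_s))
end

# Uses the Hash whose keys were converted ahead of time.
def with_preconverted_hash(string)
  string.gsub(/\d/, BENCH_PRECONVERTED)
end

# Keeps Integer keys and converts the match inside a block instead.
def with_block_lookup(string)
  string.gsub(/\d/) { BENCH_INTEGER_KEYS[$&.to_i] }
end

Benchmark.bm(24) do |bm|
  bm.report("convert on each call") { BENCH_ITERATIONS.times { with_conversion_per_call("12345") } }
  bm.report("preconverted hash")    { BENCH_ITERATIONS.times { with_preconverted_hash("12345") } }
  bm.report("block with to_i")      { BENCH_ITERATIONS.times { with_block_lookup("12345") } }
end
```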