Actions
Bug #13136
closedlarge_array.sample(11)が遅い
Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin16]
Description
Array#sampleのパフォーマンスを改善したい
require 'benchmark'
arr = 100000.times.to_a;
Benchmark.measure{100000.times{arr.sample 10}}.real #=> 0.07052100007422268 速い!
Benchmark.measure{100000.times{arr.sample 11}}.real #=> 37.8880900000222 遅い!!!
現状のArray#sample(n)
n<=3 → 個別場合分け
3<n<=10 → 2重ループ O(n^2)
# n < 適当な閾値 → O(n)のアルゴリズムを使いたい
n>10 → dupしてshuffle O(size) 巨大配列から少数sampleする時に遅い
アルゴリズム改善案について
# n>10の時の現状のアルゴリズム O(size)
def sample arr, n
arr = arr.dup
n.times do |i|
j = rand(arr.size - i) + i
arr[i], arr[j] = arr[j], arr[i]
end
arr.take n
end
# dupをやめ、配列への代入を減らす(ただし元配列を破壊してしまう) O(n)
def sample arr, n
n.times.map do |i|
j = rand(arr.size - i) + i
val = arr[j]
arr[j] = arr[i]
val
end
end
# 配列を書き換える代わりに、どこを入れ替えたかhashに保存する(元配列を破壊しない) O(n)
def sample arr, n
replace_table = {}
n.times.map do |i|
j = rand(arr.size - i) + i
j2 = replace_table[j] || j
replace_table[j] = replace_table[i] || i
arr[j2]
end
end
patch作りました
ベンチマーク
arr = 100000.times.to_a;
def measure;t=Time.now;yield;Time.now-t;end
p measure{10000.times{arr.sample 10}}
p measure{10000.times{arr.sample 11}}
p measure{10000.times{arr.sample 100}}
p measure{10000.times{arr.sample 1000}}
ruby 2.4.0p0
0.008185
3.938569
4.043285
4.142051
this patch
0.010408
0.016441
0.08861
0.684844
Files
Actions
Like0
Like0Like0Like0Like0