Bug #17478

closed

Ruby3.0 is slower than Ruby2.7.2 when parsing a large CSV file

Added by okkez (okkez _) almost 4 years ago. Updated about 1 year ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-linux]
[ruby-core:101732]

Description

Ruby3.0 is around 10%-20% slower than Ruby2.7.2 when parsing and aggregating a large CSV file.

The script is here:

require "csv"

name_to_cost = Hash.new(0)

CSV.foreach(ARGV[0], headers: true) do |row|
  name_to_cost[row["name"]] += row["cost"].to_f
end

name_to_cost.sort_by {|k, _| k }.each do |name, cost|
  printf "%s\t%.3f\n", name, cost
end
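
To reproduce the timings on each Ruby version, the same aggregation can be wrapped in a small timing harness. This is only a sketch: `aggregate_costs` and `timed_aggregate` are hypothetical helper names, not part of the original script, and the way the reported times were actually measured is not stated here (it may well have been the shell's `time`).

```ruby
require "benchmark"
require "csv"

# Same aggregation as the script above: sum "cost" per "name".
def aggregate_costs(path)
  name_to_cost = Hash.new(0)
  CSV.foreach(path, headers: true) do |row|
    name_to_cost[row["name"]] += row["cost"].to_f
  end
  name_to_cost
end

# Hypothetical helper: returns [result hash, elapsed wall-clock seconds].
def timed_aggregate(path)
  result = nil
  elapsed = Benchmark.realtime { result = aggregate_costs(path) }
  [result, elapsed]
end

# When run as a script with a CSV path, report the time on stderr.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  result, elapsed = timed_aggregate(ARGV[0])
  warn format("ruby %s: %.2f seconds (%d names)", RUBY_VERSION, elapsed, result.size)
end
```

Running the same file under both interpreters against the same CSV gives directly comparable numbers; the sorting and printing step is left out so that only the parse and aggregate loop is timed.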

The sample data looks like the following (3 million lines, about 235 MiB):

id,name,description,cost
2365599605,ysgHDPA,Voluptatem sit perferendis accusantium consequatur aut.,25.115
2365599606,xFLXOtJ,Sit accusantium aut perferendis voluptatem consequatur.,60.228
2365599607,RlkxNQB,Accusantium sit aut consequatur perferendis voluptatem.,79.663
2365599608,YVMbuva,Sit perferendis voluptatem accusantium aut consequatur.,49.863
2365599609,rtxVcDW,Accusantium voluptatem sit perferendis aut consequatur.,50.765
2365599610,rtxVcDW,Aut sit accusantium consequatur perferendis voluptatem.,94.310
2365599611,muDwuke,Consequatur sit accusantium aut perferendis voluptatem.,16.991
2365599612,tkqFWyM,Perferendis sit voluptatem consequatur aut accusantium.,98.753

Execution times with this sample data:

  • Ruby2.7.2: 25.37 seconds
  • Ruby3.0.0: 27.53 seconds

I use this program to generate the test CSV file: https://gist.github.com/okkez/05ffa0df08cf49014f460eb2e8543698
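
If the gist is not at hand, data of the same shape can be produced with a few lines of Ruby. The following is a hypothetical stand-in, not the program from the gist: the function name `generate_sample_csv`, the seed, the size of the name pool, and the word list (copied from the sample rows above) are all assumptions made for illustration.

```ruby
require "csv"

# Words taken from the sample rows; the real generator may use others.
WORDS = %w[voluptatem sit perferendis accusantium consequatur aut].freeze
LETTERS = (("a".."z").to_a + ("A".."Z").to_a).freeze

# Hypothetical generator: returns CSV text with the columns
# id,name,description,cost. Names are drawn from a fixed pool so that
# they repeat, which makes the per-name aggregation meaningful.
def generate_sample_csv(rows, seed: 42, first_id: 2_365_599_605, pool_size: 1000)
  rng = Random.new(seed)
  pool = Array.new(pool_size) do
    Array.new(7) { LETTERS.sample(random: rng) }.join
  end
  CSV.generate do |csv|
    csv << %w[id name description cost]
    rows.times do |i|
      description = WORDS.shuffle(random: rng).join(" ").capitalize + "."
      cost = format("%.3f", rng.rand(100.0))
      csv << [first_id + i, pool.sample(random: rng), description, cost]
    end
  end
end
```

`CSV.generate` builds the whole output in memory, so for a file of about 3 million rows it would be more sensible to write rows to a file with `CSV.open(path, "w")` instead.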

With a different, private data set:

  • Ruby2.7.2: 31.54 seconds
  • Ruby3.0.0: 37.15 seconds
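
For reference, the relative slowdown implied by the two pairs of timings in this report can be computed as follows. `slowdown_percent` is a hypothetical helper; the figures are the ones reported above, and they work out to roughly 8.5% for the generated sample data and 17.8% for the private data.

```ruby
# Relative slowdown of candidate versus baseline, in percent.
def slowdown_percent(baseline_seconds, candidate_seconds)
  (candidate_seconds / baseline_seconds - 1.0) * 100.0
end

# [Ruby 2.7.2 seconds, Ruby 3.0.0 seconds] as reported in this issue.
timings = {
  "generated sample data" => [25.37, 27.53],
  "private data" => [31.54, 37.15],
}

timings.each do |label, (ruby27, ruby30)|
  printf "%s: %.1f%% slower\n", label, slowdown_percent(ruby27, ruby30)
end
```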

The private data has the following characteristics:

  • There are 18 columns
  • There are 1144305 lines
  • It is 334MiB