Bug #21838
closedRails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024)
Description
Hello,
I noticed a performance degradation with Ruby 3.4 when seeding my Rails application, which does image derivation via the Shrine Gem and libvips image library.
I skipped Ruby 3.4 hoping that 4.0 would magically solve my performance issue. Alas not.
My degradation when doing rails db:reset (aka database seeding) was about 20% as compared with Ruby 3.3.
The last couple of days I looked at this issue more closely and with the help of git bisect I believe this commit: https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079ef92b5e59b616d670efe81a33e500f2bf6451 is the cause of my regression. That commit implements global allocatable slots and empty pages.
The revision before is fast when seeding my Rails application.
This revision however slows down my Rails seeding by about 20%, from 30 seconds to 40 seconds consistently.
I created a minimal Rails application as the reproducible test case: https://github.com/bluz71/rails_app_ruby_slowdown
Steps to reproduce:
git clone https://github.com/bluz71/rails_app_ruby_slowdowncd rails_app_ruby_slowdownbundle config set force_ruby_platform truebundle installbundle exec rails db:createRUBY_YJIT_ENABLE=0 time bundle exec rails db:reset
Note, libvips needs to be installed at the system level, for example sudo apt install libvips, libvips is the image processing library used by Shrine which in turn is used in this Rails application when seeding.
Use Ruby 3.3 and 3.4 to compare, or use Revision de7ac11a (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/de7ac11a09249defd93d1ee34c817fa9d31e7936) and Revision 079ef92b (https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/079ef92b5e59b616d670efe81a33e500f2bf6451) to compare.
I hope this is reproducible, if so, I also hope this performance loss can be minimised in future.
Best regards.
Updated by byroot (Jean Boussier) 6 days ago
- Subject changed from Rails seeding degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024) to Rails seeing degradation (20% slowdown) related to Revision 079ef92b "Implement global allocatable slots and empty pages" (from Sep 5 2024)
Updated by byroot (Jean Boussier) 6 days ago
So I can somewhat repro. On my machine (macOS/M3) about 4.0 is half-way between 3.3 and 3.4.
I added:
pp GC.stat
pp GC.stat_heap
At the end of your script, given based on the description GC is likely to play a role in what you are seeing.
And indeed:
- 3.3 spends 1.6s in GC with ~2k minors but only 13 majors.
- 3.4 spends 8.1s in GC with ~2k minors and ~400 majors.
- 4.0 spends 3.8s in GC with ~1.9k minors and 137 majors.
The time spent in GC seems to account for most of the difference between versions so I suspect the problem isn't so much the change itself, more that in consequence the default GC parameters aren't working as well with your app.
But cc @peterzhu2118 (Peter Zhu) just in case.
3.3.4:
=== Image Processing Only ===
User CPU: 20.143620000000002s
System CPU: 11.150236000000001s
Real time: 28.597061999840662s
==============================
=== Benchmark Results ===
User CPU: 20.18374s
System CPU: 11.297849s
Real time: 28.78712600003928s
========================
{:count=>2042,
:time=>1677,
:marking_time=>582,
:sweeping_time=>1095,
:heap_allocated_pages=>426,
:heap_sorted_length=>618,
:heap_allocatable_pages=>192,
:heap_available_slots=>370125,
:heap_live_slots=>369139,
:heap_free_slots=>986,
:heap_final_slots=>0,
:heap_marked_slots=>345806,
:heap_eden_pages=>426,
:heap_tomb_pages=>0,
:total_allocated_pages=>426,
:total_freed_pages=>0,
:total_allocated_objects=>13435015,
:total_freed_objects=>13065876,
:malloc_increase_bytes=>2660112,
:malloc_increase_bytes_limit=>16777216,
:minor_gc_count=>2029,
:major_gc_count=>13,
:compact_count=>0,
:read_barrier_faults=>0,
:total_moved_objects=>0,
:remembered_wb_unprotected_objects=>0,
:remembered_wb_unprotected_objects_limit=>2963,
:old_objects=>344695,
:old_objects_limit=>592776,
:oldmalloc_increase_bytes=>2660112,
:oldmalloc_increase_bytes_limit=>18971452}
{0=>
{:slot_size=>40,
:heap_allocatable_pages=>62,
:heap_eden_pages=>132,
:heap_eden_slots=>216136,
:heap_tomb_pages=>0,
:heap_tomb_slots=>0,
:total_allocated_pages=>132,
:total_freed_pages=>0,
:force_major_gc_count=>3,
:force_incremental_marking_finish_count=>0,
:total_allocated_objects=>4953144,
:total_freed_objects=>4737427},
1=>
{:slot_size=>80,
:heap_allocatable_pages=>22,
:heap_eden_pages=>123,
:heap_eden_slots=>100671,
:heap_tomb_pages=>0,
:heap_tomb_slots=>0,
:total_allocated_pages=>123,
:total_freed_pages=>0,
:force_major_gc_count=>2,
:force_incremental_marking_finish_count=>0,
:total_allocated_objects=>3636473,
:total_freed_objects=>3536459},
2=>
{:slot_size=>160,
:heap_allocatable_pages=>11,
:heap_eden_pages=>118,
:heap_eden_slots=>48238,
:heap_tomb_pages=>0,
:heap_tomb_slots=>0,
:total_allocated_pages=>118,
:total_freed_pages=>0,
:force_major_gc_count=>5,
:force_incremental_marking_finish_count=>1,
:total_allocated_objects=>4271349,
:total_freed_objects=>4223425},
3=>
{:slot_size=>320,
:heap_allocatable_pages=>31,
:heap_eden_pages=>19,
:heap_eden_slots=>3870,
:heap_tomb_pages=>0,
:heap_tomb_slots=>0,
:total_allocated_pages=>19,
:total_freed_pages=>0,
:force_major_gc_count=>0,
:force_incremental_marking_finish_count=>0,
:total_allocated_objects=>567972,
:total_freed_objects=>564149},
4=>
{:slot_size=>640,
:heap_allocatable_pages=>64,
:heap_eden_pages=>36,
:heap_eden_slots=>3667,
:heap_tomb_pages=>0,
:heap_tomb_slots=>0,
:total_allocated_pages=>36,
:total_freed_pages=>0,
:force_major_gc_count=>0,
:force_incremental_marking_finish_count=>0,
:total_allocated_objects=>8050,
:total_freed_objects=>4416}}
real 0m29.475s
user 0m22.232s
sys 0m13.533s
3.4.6:
=== Image Processing Only ===
User CPU: 28.113429999999997s
System CPU: 13.080968s
Real time: 38.77424399997108s
==============================
=== Benchmark Results ===
User CPU: 28.166331000000003s
System CPU: 13.227464s
Real time: 38.97528999997303s
========================
{count: 2428,
time: 8179,
marking_time: 6345,
sweeping_time: 1834,
heap_allocated_pages: 449,
heap_empty_pages: 0,
heap_allocatable_slots: 0,
heap_available_slots: 390466,
heap_live_slots: 329216,
heap_free_slots: 61250,
heap_final_slots: 0,
heap_marked_slots: 240286,
heap_eden_pages: 449,
total_allocated_pages: 449,
total_freed_pages: 0,
total_allocated_objects: 15283498,
total_freed_objects: 14954282,
malloc_increase_bytes: 960,
malloc_increase_bytes_limit: 16777216,
minor_gc_count: 2031,
major_gc_count: 397,
compact_count: 0,
read_barrier_faults: 0,
total_moved_objects: 0,
remembered_wb_unprotected_objects: 0,
remembered_wb_unprotected_objects_limit: 2368,
old_objects: 236843,
old_objects_limit: 473686,
oldmalloc_increase_bytes: 960,
oldmalloc_increase_bytes_limit: 16777216}
{0 =>
{slot_size: 40,
heap_eden_pages: 146,
heap_eden_slots: 239053,
total_allocated_pages: 146,
force_major_gc_count: 2,
force_incremental_marking_finish_count: 1,
total_allocated_objects: 5176949,
total_freed_objects: 4973920},
1 =>
{slot_size: 80,
heap_eden_pages: 115,
heap_eden_slots: 94118,
total_allocated_pages: 115,
force_major_gc_count: 312,
force_incremental_marking_finish_count: 33,
total_allocated_objects: 3689256,
total_freed_objects: 3604380},
2 =>
{slot_size: 160,
heap_eden_pages: 114,
heap_eden_slots: 46603,
total_allocated_pages: 114,
force_major_gc_count: 49,
force_incremental_marking_finish_count: 2,
total_allocated_objects: 4973174,
total_freed_objects: 4936620},
3 =>
{slot_size: 320,
heap_eden_pages: 31,
heap_eden_slots: 6314,
total_allocated_pages: 31,
force_major_gc_count: 0,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 1440404,
total_freed_objects: 1439188},
4 =>
{slot_size: 640,
heap_eden_pages: 43,
heap_eden_slots: 4378,
total_allocated_pages: 43,
force_major_gc_count: 0,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 7756,
total_freed_objects: 7574}}
real 0m39.772s
user 0m30.376s
sys 0m15.641s
4.0.0:
=== Image Processing Only ===
User CPU: 23.316253s
System CPU: 13.033615000000001s
Real time: 33.97317499993369s
==============================
=== Benchmark Results ===
User CPU: 23.362938999999997s
System CPU: 13.172686s
Real time: 34.16143300011754s
========================
{count: 2021,
time: 3851,
marking_time: 2476,
sweeping_time: 1374,
heap_allocated_pages: 420,
heap_empty_pages: 0,
heap_allocatable_slots: 92694,
heap_available_slots: 363270,
heap_live_slots: 358735,
heap_free_slots: 4535,
heap_final_slots: 0,
heap_marked_slots: 332764,
heap_eden_pages: 420,
total_allocated_pages: 420,
total_freed_pages: 0,
total_allocated_objects: 13840724,
total_freed_objects: 13481989,
malloc_increase_bytes: 2851188,
malloc_increase_bytes_limit: 16777216,
minor_gc_count: 1884,
major_gc_count: 137,
compact_count: 0,
read_barrier_faults: 0,
total_moved_objects: 0,
remembered_wb_unprotected_objects: 0,
remembered_wb_unprotected_objects_limit: 3324,
old_objects: 332470,
old_objects_limit: 664940,
oldmalloc_increase_bytes: 2851188,
oldmalloc_increase_bytes_limit: 16777216}
{0 =>
{slot_size: 40,
heap_live_slots: 206502,
heap_free_slots: 1454,
heap_final_slots: 0,
heap_eden_pages: 127,
heap_eden_slots: 207956,
total_allocated_pages: 149,
force_major_gc_count: 0,
force_incremental_marking_finish_count: 2,
total_allocated_objects: 4977191,
total_freed_objects: 4770689},
1 =>
{slot_size: 80,
heap_live_slots: 100320,
heap_free_slots: 347,
heap_final_slots: 0,
heap_eden_pages: 123,
heap_eden_slots: 100667,
total_allocated_pages: 1031,
force_major_gc_count: 111,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 2417433,
total_freed_objects: 2317113},
2 =>
{slot_size: 160,
heap_live_slots: 47162,
heap_free_slots: 254,
heap_final_slots: 0,
heap_eden_pages: 116,
heap_eden_slots: 47416,
total_allocated_pages: 225,
force_major_gc_count: 94,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 5001103,
total_freed_objects: 4953941},
3 =>
{slot_size: 320,
heap_live_slots: 4862,
heap_free_slots: 29,
heap_final_slots: 0,
heap_eden_pages: 24,
heap_eden_slots: 4891,
total_allocated_pages: 3399,
force_major_gc_count: 127,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 1440733,
total_freed_objects: 1435871},
4 =>
{slot_size: 640,
heap_live_slots: 3149,
heap_free_slots: 10,
heap_final_slots: 0,
heap_eden_pages: 31,
heap_eden_slots: 3159,
total_allocated_pages: 151,
force_major_gc_count: 6,
force_incremental_marking_finish_count: 0,
total_allocated_objects: 7524,
total_freed_objects: 4375}}
real 0m35.071s
user 0m25.508s
sys 0m15.621s
Updated by bluz71 (Dennis B) 6 days ago
byroot (Jean Boussier) wrote in #note-2:
So I can somewhat repro. On my machine (macOS/M3) about 4.0 is half-way between 3.3 and 3.4.
Thank you byroot. I am most pleased that it is reproducible, especially 3.3 vs 3.4.
Note, if I create a Rails inititializer config/initializer/gc.rb that does GC.disable and run my test case before and after Revision 079ef92b the times do equalize to around 36-38 seconds.
So indeed this regression is related purely to the GC. Thank you for spotting that.
When I tested Ruby 4.0 earlier this week, it was still slower than 3.3 for my Rails seeding operation.
Best regards.
Updated by peterzhu2118 (Peter Zhu) 6 days ago
Thank you for reporting this issue.
I debugged it and have a fix here: https://github.com/ruby/ruby/pull/15882
I distilled the reproduction down to this script:
Script
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "ruby-vips"
gem "benchmark"
gem "image_processing"
gem "shrine"
gem "sqlite3"
gem "rails"
end
require "rails"
require "active_record"
require "benchmark"
require "sqlite3"
require "shrine/storage/file_system"
ActiveRecord::Base.establish_connection(
adapter: "sqlite3",
database: ":memory:"
)
ActiveRecord::Schema.define do
create_table "albums", force: :cascade do |t|
t.integer "artist_id", null: false
t.text "cover_data"
t.datetime "created_at", null: false
t.integer "duration", default: 0, null: false
t.string "title"
t.integer "tracks_count", default: 0, null: false
t.datetime "updated_at", null: false
t.index ["artist_id"], name: "index_albums_on_artist_id"
t.index ["duration"], name: "index_albums_on_duration"
t.index ["title", "artist_id"], name: "index_albums_on_title_and_artist_id", unique: true
end
create_table "artists", force: :cascade do |t|
t.integer "albums_count", default: 0, null: false
t.datetime "created_at", null: false
t.text "description"
t.string "name"
t.datetime "updated_at", null: false
t.string "website"
t.string "wikipedia"
t.index ["description"], name: "index_artists_on_description"
t.index ["name"], name: "index_artists_on_name", unique: true
end
add_foreign_key "albums", "artists"
end
Shrine.storages = {
cache: Shrine::Storage::FileSystem.new("/tmp", prefix: "uploads/cache"),
store: Shrine::Storage::FileSystem.new("/tmp", prefix: "uploads/store"),
}
Shrine.plugin :activerecord
Shrine.plugin :cached_attachment_data
Shrine.plugin :restore_cached_data
Shrine.plugin :derivatives
Shrine.plugin :validation
Shrine.plugin :validation_helpers
Shrine.plugin :determine_mime_type
Shrine.plugin :remote_url, max_size: 1*1024*1024
class CoverUploader < Shrine
Attacher.derivatives do |original|
vips = ImageProcessing::Vips.source(original)
{
small: vips.resize_to_limit!(300, 300),
medium: vips.resize_to_limit!(500, 500)
}
end
end
class ApplicationRecord < ActiveRecord::Base
primary_abstract_class
end
class Album < ApplicationRecord
include CoverUploader::Attachment(:cover)
# ASSOCIATIONS
belongs_to :artist, counter_cache: true, touch: true
# VALIDATIONS
validates :title, presence: true, uniqueness: {scope: :artist_id}
end
class Artist < ApplicationRecord
# ASSOCIATIONS
has_many :albums, dependent: :destroy, strict_loading: true
# VALIDATIONS
validates :name, presence: true, uniqueness: {case_sensitive: false}
# Only allow http:// and https:// URL schemes.
validates :website, format: {
with: URI::DEFAULT_PARSER.make_regexp(%w[http https]),
allow_blank: true
}
end
result = Benchmark.measure do
artist_records = 500.times.map do |i|
{
name: "Artist #{i}",
description: "Description for artist #{i}",
wikipedia: "https://en.wikipedia.org/wiki/Artist_#{i}",
website: "https://artist#{i}.com"
}
end
Artist.insert_all(artist_records)
album_records = 1000.times.map do |i|
{
title: "Album #{i}",
artist_id: (i % 500) + 1,
}
end
Album.insert_all(album_records)
Shrine.storages[:cache].clear!
Shrine.storages[:store].clear!
image_benchmark = Benchmark.measure do
Album.find_each do |album|
cover_path = "/home/peter/Downloads/Cover.jpg"
# Local cover upload.
album.cover = File.open(cover_path)
album.cover_derivatives! # Generate derivatives via libvips.
album.save
end
end
puts "\n=== Image Processing Only ==="
puts "User CPU: #{image_benchmark.utime}s"
puts "System CPU: #{image_benchmark.stime}s"
puts "Real time: #{image_benchmark.real}s"
puts "==============================\n"
Shrine.storages[:cache].clear!
end
puts "\n=== Benchmark Results ==="
puts "User CPU: #{result.utime}s"
puts "System CPU: #{result.stime}s"
puts "Real time: #{result.real}s"
puts "========================\n"
puts GC.stat
pp GC.stat_heap
Before:
{count: 2017, time: 918, marking_time: 559, sweeping_time: 359, heap_allocated_pages: 333, heap_empty_pages: 0, heap_allocatable_slots: 5546, heap_available_slots: 287962, heap_live_slots: 287105, heap_free_slots: 857, heap_final_slots: 0, heap_marked_slots: 269412, heap_eden_pages: 333, total_allocated_pages: 333, total_freed_pages: 0, total_allocated_objects: 3459213, total_freed_objects: 3172108, malloc_increase_bytes: 2259851, malloc_increase_bytes_limit: 16777216, minor_gc_count: 1987, major_gc_count: 30, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 2684, old_objects: 269126, old_objects_limit: 536956, oldmalloc_increase_bytes: 2259851, oldmalloc_increase_bytes_limit: 16777216}
{0 => {slot_size: 40, heap_live_slots: 167011, heap_free_slots: 1639, heap_final_slots: 0, heap_eden_pages: 103, heap_eden_slots: 168650, total_allocated_pages: 119, force_major_gc_count: 3, force_incremental_marking_finish_count: 2, total_allocated_objects: 2412984, total_freed_objects: 2245973},
1 => {slot_size: 80, heap_live_slots: 79897, heap_free_slots: 299, heap_final_slots: 0, heap_eden_pages: 98, heap_eden_slots: 80196, total_allocated_pages: 322, force_major_gc_count: 22, force_incremental_marking_finish_count: 0, total_allocated_objects: 589564, total_freed_objects: 509667},
2 => {slot_size: 160, heap_live_slots: 33658, heap_free_slots: 273, heap_final_slots: 0, heap_eden_pages: 83, heap_eden_slots: 33931, total_allocated_pages: 189, force_major_gc_count: 3, force_incremental_marking_finish_count: 0, total_allocated_objects: 436213, total_freed_objects: 402555},
3 => {slot_size: 320, heap_live_slots: 3306, heap_free_slots: 158, heap_final_slots: 0, heap_eden_pages: 17, heap_eden_slots: 3464, total_allocated_pages: 52, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 15741, total_freed_objects: 12435},
4 => {slot_size: 640, heap_live_slots: 3271, heap_free_slots: 88, heap_final_slots: 0, heap_eden_pages: 33, heap_eden_slots: 3359, total_allocated_pages: 94, force_major_gc_count: 1, force_incremental_marking_finish_count: 0, total_allocated_objects: 4749, total_freed_objects: 1478}}
After:
{count: 2017, time: 541, marking_time: 205, sweeping_time: 336, heap_allocated_pages: 336, heap_empty_pages: 0, heap_allocatable_slots: 18931, heap_available_slots: 290435, heap_live_slots: 287797, heap_free_slots: 2638, heap_final_slots: 0, heap_marked_slots: 270104, heap_eden_pages: 336, total_allocated_pages: 336, total_freed_pages: 0, total_allocated_objects: 3460006, total_freed_objects: 3172209, malloc_increase_bytes: 2259899, malloc_increase_bytes_limit: 16777216, minor_gc_count: 2009, major_gc_count: 8, compact_count: 0, read_barrier_faults: 0, total_moved_objects: 0, remembered_wb_unprotected_objects: 0, remembered_wb_unprotected_objects_limit: 2663, old_objects: 269818, old_objects_limit: 532620, oldmalloc_increase_bytes: 2259899, oldmalloc_increase_bytes_limit: 18971452}
{0 => {slot_size: 40, heap_live_slots: 167111, heap_free_slots: 1545, heap_final_slots: 0, heap_eden_pages: 103, heap_eden_slots: 168656, total_allocated_pages: 126, force_major_gc_count: 3, force_incremental_marking_finish_count: 2, total_allocated_objects: 2413543, total_freed_objects: 2246432},
1 => {slot_size: 80, heap_live_slots: 80426, heap_free_slots: 595, heap_final_slots: 0, heap_eden_pages: 99, heap_eden_slots: 81021, total_allocated_pages: 360, force_major_gc_count: 2, force_incremental_marking_finish_count: 0, total_allocated_objects: 590103, total_freed_objects: 509677},
2 => {slot_size: 160, heap_live_slots: 33750, heap_free_slots: 183, heap_final_slots: 0, heap_eden_pages: 83, heap_eden_slots: 33933, total_allocated_pages: 200, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 436028, total_freed_objects: 402278},
3 => {slot_size: 320, heap_live_slots: 3254, heap_free_slots: 5, heap_final_slots: 0, heap_eden_pages: 16, heap_eden_slots: 3259, total_allocated_pages: 33, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 15623, total_freed_objects: 12369},
4 => {slot_size: 640, heap_live_slots: 3294, heap_free_slots: 272, heap_final_slots: 0, heap_eden_pages: 35, heap_eden_slots: 3566, total_allocated_pages: 47, force_major_gc_count: 0, force_incremental_marking_finish_count: 0, total_allocated_objects: 4747, total_freed_objects: 1453}}
Note that major GC count has reduced significantly, from 30 to 8.
Updated by bluz71 (Dennis B) 5 days ago
peterzhu2118 (Peter Zhu) wrote in #note-4:
Thank you for reporting this issue.
I am glad to help, even in this small way.
I debugged it and have a fix here: https://github.com/ruby/ruby/pull/15882
That is great news.
I look forward to seeing this merged and eventually released, hopefully in a 4.0.X release.
Thank you very much Peter & byroot.
Updated by byroot (Jean Boussier) 5 days ago
- Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: REQUIRED, 4.0: REQUIRED
Updated by peterzhu2118 (Peter Zhu) 4 days ago
- Status changed from Open to Closed