Bug #21856

Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

Added by ahorek (Pavel Rosický) 18 days ago. Updated 18 days ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux]
[ruby-core:124655]

Description

Loofah sanitization is noticeably slower

Ruby:     3.4.8
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     26.091872   0.000000  26.091872 ( 25.110925)
Loofah.scrub_fragment(:prune)        25.913185   0.010392  25.923577 ( 24.948464)
Nokogiri HTML parse only              3.852690   0.000000   3.852690 (  3.705930)

Ruby:     4.0.0 & 4.0.1
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     38.094207   0.041753  38.135960 ( 36.669463)
Loofah.scrub_fragment(:prune)        40.168795   0.000045  40.168840 ( 38.561806)
Nokogiri HTML parse only              4.012936   0.052024   4.064960 (  3.913272)


Ruby:     4.1.0 (ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux])
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     39.004228   0.000000  39.004228 ( 37.694873)
Loofah.scrub_fragment(:prune)        39.043199   0.031284  39.074483 ( 37.182785)
Nokogiri HTML parse only              3.889100   0.010427   3.899527 (  3.741622)
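To compare the three runs directly, the `real` column can be turned into a percentage change relative to 3.4.8. This is only arithmetic on the numbers reported above; the hash layout, the `slowdowns` helper, and the version labels are made up for illustration.

```ruby
# Real (wall-clock) seconds copied from the three benchmark tables above.
REAL_SECONDS = {
  "Loofah.fragment + scrub!(:prune)" => { "3.4.8" => 25.110925, "4.0.0/4.0.1" => 36.669463, "4.1.0dev" => 37.694873 },
  "Loofah.scrub_fragment(:prune)"    => { "3.4.8" => 24.948464, "4.0.0/4.0.1" => 38.561806, "4.1.0dev" => 37.182785 },
  "Nokogiri HTML parse only"         => { "3.4.8" => 3.705930,  "4.0.0/4.0.1" => 3.913272,  "4.1.0dev" => 3.741622 },
}.freeze

# Percentage change of each newer Ruby relative to the baseline version,
# rounded to one decimal place.
def slowdowns(table, baseline: "3.4.8")
  table.transform_values do |by_version|
    base = by_version.fetch(baseline)
    by_version.reject { |version, _| version == baseline }
              .transform_values { |secs| ((secs / base - 1) * 100).round(1) }
  end
end

slowdowns(REAL_SECONDS).each do |name, changes|
  summary = changes.map { |version, pct| format("%s %+.1f%%", version, pct) }.join(", ")
  puts "#{name}: #{summary}"
end
```

Both Loofah rows come out roughly 46 to 55% slower on 4.0 and 4.1.0dev, while the parse-only row moves by only about 1 to 6%.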

Originally reported at https://www.redmine.org/issues/43737


Files

benchmark.rb (1002 Bytes), added by ahorek (Pavel Rosický), 02/01/2026 08:10 PM

Updated by byroot (Jean Boussier) 18 days ago #1 [ruby-core:124656]

I'm able to repro on my machine, even though the difference isn't quite as bad (more like 30% slower).

Profile of ruby 3.4.7: https://share.firefox.dev/4rw3mv0
Profile of ruby 4.0.0: https://share.firefox.dev/4rtvrmt

The striking difference in the profiles seems to be that 4.0 spends 28% of its time in `remove_class_from_subclasses` -> `rb_classext_free_subclasses` -> `rb_iclass_classext_free` -> `rb_classext_foreach` -> `rb_obj_free`.

A few notes:

  • This codepath was changed a lot with the introduction of Ruby::Box, so it may have become significantly slower.
  • It's surprising that we're sweeping lots of Class objects; perhaps Loofah or Nokogiri are inadvertently allocating singleton classes in a hot spot?
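One way to check the second hypothesis is to count how many `T_CLASS` objects a piece of code allocates. The helper below is a hypothetical sketch (the name `t_class_delta` and the `Tagged` module are invented for illustration); it relies only on `GC.disable` and `ObjectSpace.count_objects`, disabling GC while measuring so sweeping cannot lower the count. Calls such as `extend`, `define_singleton_method`, `class << obj`, or `singleton_class` all allocate a singleton class the first time they touch an object.

```ruby
# Hypothetical helper: number of T_CLASS objects (singleton classes included)
# allocated while the block runs. GC is disabled during the measurement so
# that sweeping cannot lower the count; the previous GC state is restored.
def t_class_delta
  GC.start
  was_disabled = GC.disable # returns true if GC was already disabled
  before = ObjectSpace.count_objects.fetch(:T_CLASS, 0)
  yield
  ObjectSpace.count_objects.fetch(:T_CLASS, 0) - before
ensure
  GC.enable unless was_disabled
end

module Tagged; end

plain     = t_class_delta { 1_000.times { Object.new } }
extended  = t_class_delta { 1_000.times { Object.new.extend(Tagged) } }
singleton = t_class_delta { 1_000.times { Object.new.define_singleton_method(:tag) { :x } } }

puts "plain: #{plain}, extend: #{extended}, define_singleton_method: #{singleton}"
```

Wrapping a suspected hot path in such a helper shows whether it creates one class per call, which would explain the sweeping cost seen in the profile.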

Updated by byroot (Jean Boussier) 18 days ago #2 [ruby-core:124657]

I reduced the benchmark to:

# frozen_string_literal: true

require "bundler/inline"

gemfile do
  source 'https://rubygems.org'
  gem "benchmark-ips"
end

Benchmark.ips do |x|
  x.report("singleton") do
    Object.new.singleton_class
  end
end

3.4.7:

ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton   742.338k i/100ms
Calculating -------------------------------------
           singleton      7.381M (± 2.2%) i/s  (135.48 ns/i) -     37.117M in   5.031106s

4.0.0:

ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton    13.919k i/100ms
Calculating -------------------------------------
           singleton    146.202k (±28.4%) i/s    (6.84 μs/i) -    668.112k in   5.059563s

So that's a pretty massive regression in class sweeping. I'll see what I can do.
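The per-iteration times above (135.48 ns on 3.4.7 vs 6.84 μs on 4.0.0) work out to roughly a 50x slowdown. For a quick check without `bundler/inline`, the same loop can be timed against a plain `Object.new` baseline, so the gap approximates the cost of allocating and eventually sweeping one singleton class. This is a rough sketch using only `Process.clock_gettime`; it is noisier than benchmark-ips, and the helper name `ns_per_iteration` is hypothetical.

```ruby
# Rough wall-clock timing with the monotonic clock (no gems needed).
# Returns the average number of nanoseconds per iteration of the block.
def ns_per_iteration(iterations)
  GC.start # begin each measurement from a comparable heap state
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) - t0
  elapsed.fdiv(iterations)
end

N = 100_000
baseline  = ns_per_iteration(N) { Object.new }
singleton = ns_per_iteration(N) { Object.new.singleton_class }

puts format("Object.new:                 %8.1f ns/i", baseline)
puts format("Object.new.singleton_class: %8.1f ns/i", singleton)
puts format("extra cost per singleton:   %8.1f ns", singleton - baseline)
```

Absolute numbers vary by machine; only the gap between the two lines is of interest.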

Updated by byroot (Jean Boussier) 18 days ago #3 [ruby-core:124659]

So the regression is indeed a consequence of the Box introduction.

When sweeping a Class, we need to remove the backreferences from `rb_classext_struct.box_super_subclasses` and `rb_classext_struct.box_module_subclasses`, and each of these involves multiple st_table lookups and updates, which is far more work than we used to have to do.

There might be a way to optimize this, but my understanding of how boxes are supposed to work is limited, so I don't know if I can fix it without breaking boxes.

Here again the solution might be to have a fast path for the overwhelming majority of classes that aren't impacted by boxes, but it would make the code way more complex.

Updated by byroot (Jean Boussier) 18 days ago #4

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED, 4.0: REQUIRED

Updated by byroot (Jean Boussier) 18 days ago #5 [ruby-core:124662]

  • Subject changed from Nokogiri performance degradation since Ruby 4.0 to Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

I spent some time trying to fix this. I think it's possible, but it's a pretty major refactoring.

In 3.4:

Classes have a doubly linked list of their subclasses, which is necessary to iterate over subclasses efficiently.
To be able to purge these lists efficiently, each class also keeps a direct reference to the node that contains it in its parent's linked list (`subclass_entry`).

They also have another linked list of all the modules it has been included in.

All this makes it possible to efficiently remove all the references to a given class.
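The 3.4 arrangement can be modeled in a few lines of Ruby. This is a simplified illustration, not CRuby's C code: `Entry` and `FakeClass` are hypothetical stand-ins for the list nodes and the class, and the module-inclusion list is left out. Because each class keeps a pointer to its own node (`subclass_entry`), unlinking it from the parent's list needs no search and no hash lookup.

```ruby
# Hypothetical model of the 3.4 layout (not CRuby's actual structs).
# Nodes are doubly linked so a node can be unlinked in O(1).
class Entry
  attr_accessor :klass, :prev, :next

  def initialize(klass)
    @klass = klass
  end
end

class FakeClass
  attr_reader :name
  attr_accessor :subclass_entry # back-reference to our node in the parent's list

  def initialize(name, superclass = nil)
    @name = name
    @subclasses_head = Entry.new(nil) # sentinel head node
    superclass&.add_subclass(self)
  end

  # Push a new node at the front of the list and record it on the child.
  def add_subclass(klass)
    entry = Entry.new(klass)
    entry.next = @subclasses_head.next
    entry.prev = @subclasses_head
    @subclasses_head.next.prev = entry if @subclasses_head.next
    @subclasses_head.next = entry
    klass.subclass_entry = entry
  end

  def subclass_names
    names = []
    node = @subclasses_head.next
    while node
      names << node.klass.name
      node = node.next
    end
    names
  end

  # What sweeping needs: drop the reference held by the parent's list,
  # using only pointer updates on our own node.
  def remove_from_parent
    entry = @subclass_entry or return
    entry.prev.next = entry.next
    entry.next.prev = entry.prev if entry.next
    @subclass_entry = nil
  end
end

base = FakeClass.new("Base")
a = FakeClass.new("A", base)
b = FakeClass.new("B", base)
c = FakeClass.new("C", base)
b.remove_from_parent
p base.subclass_names # => ["C", "A"]
```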

In 4.0:

It's roughly the same, except that the three references above are all behind an extra st_table indirection. So before you can access any of these lists, you need to do an extra hash lookup.

To be honest, I don't understand why this is necessary: these lists are inside `rb_classext_t`, and as I understand it classes have one `rb_classext_t` per box, so that indirection seems redundant to me.
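For contrast, here is a hypothetical variant of the same kind of model with the extra indirection described above: the subclass list and the back-reference are stored in per-box hashes (standing in for the st_table), so every access starts with a lookup by box, and removal has to visit every box. It illustrates the access pattern only; it does not reproduce how CRuby actually keys or populates these tables, and boxes are plain symbols here.

```ruby
# Hypothetical model of the 4.0 access pattern (not CRuby's actual structs):
# the same doubly linked lists, but reached through a Hash keyed by box.
Node = Struct.new(:klass, :prev, :next)

class BoxedClass
  attr_reader :name

  def initialize(name)
    @name = name
    @box_subclasses = {}     # box => sentinel head of that box's subclass list
    @box_subclass_entry = {} # box => our node in the parent's list for that box
  end

  def add_subclass(klass, box)
    head = (@box_subclasses[box] ||= Node.new) # hash lookup before touching the list
    node = Node.new(klass, head, head.next)
    head.next.prev = node if head.next
    head.next = node
    klass.set_entry(box, node)
  end

  def set_entry(box, node)
    @box_subclass_entry[box] = node
  end

  def subclass_names(box)
    names = []
    node = @box_subclasses[box]&.next
    while node
      names << node.klass.name
      node = node.next
    end
    names
  end

  # Sweeping now pays a hash lookup and update per box before doing the
  # same O(1) unlink as before. Returns the number of boxes visited.
  def remove_from_parents
    visited = 0
    @box_subclass_entry.keys.each do |box|
      node = @box_subclass_entry.delete(box)
      node.prev.next = node.next
      node.next.prev = node.prev if node.next
      visited += 1
    end
    visited
  end
end

parent = BoxedClass.new("Parent")
child  = BoxedClass.new("Child")
parent.add_subclass(child, :root)
parent.add_subclass(child, :user_box)
p parent.subclass_names(:root)     # => ["Child"]
p child.remove_from_parents        # => 2
p parent.subclass_names(:user_box) # => []
```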

But then again, I don't understand the box design well, so I may be overlooking something, and I don't know if that's something I can reasonably fix.

cc @tagomoris (Satoshi Tagomori) @ko1 (Koichi Sasada)
