Bug #21856: Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

Added by ahorek (Pavel Rosický) about 1 month ago. Updated 1 day ago.

Status:
Open
Assignee:
-
Target version:
-
ruby -v:
ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux]
[ruby-core:124655]

Description

Loofah sanitization is noticeably slower

Ruby:     3.4.8
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     26.091872   0.000000  26.091872 ( 25.110925)
Loofah.scrub_fragment(:prune)        25.913185   0.010392  25.923577 ( 24.948464)
Nokogiri HTML parse only              3.852690   0.000000   3.852690 (  3.705930)

Ruby:     4.0.0 & 4.0.1
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     38.094207   0.041753  38.135960 ( 36.669463)
Loofah.scrub_fragment(:prune)        40.168795   0.000045  40.168840 ( 38.561806)
Nokogiri HTML parse only              4.012936   0.052024   4.064960 (  3.913272)


Ruby:     4.1.0 (ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux])
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     39.004228   0.000000  39.004228 ( 37.694873)
Loofah.scrub_fragment(:prune)        39.043199   0.031284  39.074483 ( 37.182785)
Nokogiri HTML parse only              3.889100   0.010427   3.899527 (  3.741622)

Originally reported at https://www.redmine.org/issues/43737


Files

benchmark.rb (1002 Bytes) ahorek (Pavel Rosický), 02/01/2026 08:10 PM

Updated by byroot (Jean Boussier) about 1 month ago Actions #1 [ruby-core:124656]

I'm able to repro on my machine, even though the difference isn't quite as bad (more like 30% slower).

Profile of ruby 3.4.7: https://share.firefox.dev/4rw3mv0
Profile of ruby 4.0.0: https://share.firefox.dev/4rtvrmt

The striking difference in the profiles seems to be that 4.0 spends 28% of its time in remove_class_from_subclasses -> rb_classext_free_subclasses -> rb_iclass_classext_free -> rb_classext_foreach -> rb_obj_free.

A few notes:

  • This codepath was changed a lot with the Ruby::Box introduction, so it may have become significantly slower.
  • It's surprising that we're sweeping lots of Class objects. Perhaps Loofah or Nokogiri are inadvertently allocating singleton classes in a hot spot?
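
One way to check the second note is to count how many T_CLASS slots a piece of code leaves behind. The following is a minimal sketch, not something from the original report: `t_class_delta` is a hypothetical helper name, and the block passed to it is a stand-in for the real Loofah call. It relies only on `ObjectSpace.count_objects`, `GC.start`, `GC.disable` and `GC.enable`.

```ruby
# Hypothetical helper (not part of Loofah, Nokogiri, or the original report):
# reports how many T_CLASS slots the given block adds to the heap.
def t_class_delta
  GC.start     # settle any collection that is already in progress
  GC.disable   # keep classes from being swept while we are counting
  before = ObjectSpace.count_objects[:T_CLASS]
  yield
  ObjectSpace.count_objects[:T_CLASS] - before
ensure
  GC.enable
end

# Stand-in workload: each call to singleton_class on a fresh object
# allocates a new class object.
delta = t_class_delta { 100.times { Object.new.singleton_class } }
puts "T_CLASS objects allocated: #{delta}"
```

Wrapping a single sanitization call in the block instead would show whether it allocates classes on every iteration.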

Updated by byroot (Jean Boussier) about 1 month ago Actions #2 [ruby-core:124657]

I reduced the benchmark to:

# frozen_string_literal: true

require "bundler/inline"

gemfile do
  source 'https://rubygems.org'
  gem "benchmark-ips"
end

Benchmark.ips do |x|
  x.report("singleton") do
    Object.new.singleton_class
  end
end

3.4.7:

ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton   742.338k i/100ms
Calculating -------------------------------------
           singleton      7.381M (± 2.2%) i/s  (135.48 ns/i) -     37.117M in   5.031106s

4.0.0

ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton    13.919k i/100ms
Calculating -------------------------------------
           singleton    146.202k (±28.4%) i/s    (6.84 μs/i) -    668.112k in   5.059563s

So that's a pretty massive regression in class sweeping. I'll see what I can do.
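
To look at the sweep cost separately from allocation, something like the sketch below could be used. It is an assumption-laden illustration rather than the measurement behind the numbers above: `singleton_sweep_seconds` is a hypothetical helper, and the timed `GC.start` also includes marking, so the result is only a rough upper bound on the time spent freeing the classes.

```ruby
# Hypothetical helper: allocate `count` singleton classes with GC paused,
# then time the full collection that has to free them.
def singleton_sweep_seconds(count)
  GC.start
  GC.disable                                  # nothing gets swept during allocation
  count.times { Object.new.singleton_class }
  GC.enable
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start                                    # full mark and sweep
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

seconds = singleton_sweep_seconds(50_000)
puts format("GC after 50k singleton classes: %.1f ms", seconds * 1000)
```

Running the same script under 3.4 and 4.0 should make the difference visible without benchmark-ips.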

Updated by byroot (Jean Boussier) about 1 month ago Actions #3 [ruby-core:124659]

So the regression is indeed a consequence of the Box introduction.

When sweeping a Class, we need to remove the backreference from rb_classext_struct.box_super_subclasses and rb_classext_struct.box_module_subclasses, and each of these involves multiple st_table lookups and updates, which is way more work than we used to have to do.

There might be a way to optimize this, but my understanding of how boxes are supposed to work is limited, so I don't know if I can fix it without breaking boxes.

Here again the solution might be to have a fast path for the overwhelming majority of classes that aren't impacted by boxes, but it would make the code way more complex.

Updated by byroot (Jean Boussier) about 1 month ago Actions #4

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED, 4.0: REQUIRED

Updated by byroot (Jean Boussier) about 1 month ago Actions #5 [ruby-core:124662]

  • Subject changed from Nokogiri performance degradation since Ruby 4.0 to Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

I spent some time trying to fix this. I think it's possible, but it's a pretty major refactoring.

In 3.4:

Classes have a subclasses doubly-linked list, which is necessary to be able to iterate subclasses efficiently.
To be able to purge these lists efficiently, each class also keeps a direct reference to the node that contains it in the parent's linked list (subclass_entry).

They also have another linked list with all the modules it's been included on.

All this makes it possible to efficiently remove all the references to a given class.
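
As a rough illustration of why that layout makes removal cheap, here is a toy model in Ruby. It is only a sketch: the real structures are C structs inside the VM, and `ToyClass`, `SubclassEntry` and their fields are hypothetical names chosen for this example. The point is that a class holding a reference to its own list node can unlink itself without searching the parent's list.

```ruby
# Toy model only; none of these names exist in the Ruby VM.
SubclassEntry = Struct.new(:klass, :prev_entry, :next_entry)

class ToyClass
  attr_reader :name

  def initialize(name, parent = nil)
    @name = name
    @head = SubclassEntry.new(nil)                  # sentinel heading our subclasses list
    @subclass_entry = parent&.add_subclass(self)    # our own node in the parent's list
  end

  # Push a child at the front of the list and return its node.
  def add_subclass(child)
    entry = SubclassEntry.new(child, @head, @head.next_entry)
    @head.next_entry.prev_entry = entry if @head.next_entry
    @head.next_entry = entry
  end

  def subclasses
    names = []
    node = @head.next_entry
    while node
      names << node.klass.name
      node = node.next_entry
    end
    names
  end

  # O(1): unlink our own node, no traversal of the parent's list needed.
  def free
    entry = @subclass_entry
    return if entry.nil?
    entry.prev_entry.next_entry = entry.next_entry
    entry.next_entry.prev_entry = entry.prev_entry if entry.next_entry
    @subclass_entry = nil
  end
end

base = ToyClass.new("Base")
a = ToyClass.new("A", base)
b = ToyClass.new("B", base)
c = ToyClass.new("C", base)
b.free
p base.subclasses   # => ["C", "A"]
```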

In 4.0:

It's roughly the same, except the 3 references above are all behind an extra st_table indirection. So before you can access any of these lists, you need to do an extra hash lookup.

To be very honest, I don't understand why it is necessary, given these lists are inside rb_classext_t and, from my understanding, classes have one rb_classext_t per box, so that indirection seems redundant to me.

But then again, I don't understand the box design well, so I may be overlooking something, and I don't know if that's something I can reasonably fix.

cc @tagomoris (Satoshi Tagomori) @ko1 (Koichi Sasada)

Updated by jhawthorn (John Hawthorn) 1 day ago · Edited Actions #6 [ruby-core:124975]

I think I've found a solution to this: we can return to the Ruby 3.4 O(1) removal, remove box/namespacing from it, and actually make it even simpler by skipping the CLASS -> ICLASS (and ICLASS -> ICLASS) relationship and directly associating T_CLASS with its true T_CLASS "superclass".


Currently we maintain the subclasses list for two separate purposes (we essentially have two different relationships we're putting into the same list):

  1. On a T_MODULE, we track the T_ICLASSes created to include it into other classes. This is used for method invalidation and for propagating includes on the module that happen after it's been used.
  2. On a T_CLASS/T_ICLASS, we track the T_CLASS/T_ICLASS which are the immediate children of the class. We use this for method invalidation, some cvar things, and to iterate through subclasses.

Purpose 1 does not have any issues with box: the T_ICLASS always belongs to one specific module, and that's immutable. This list can be box-global (always use the prime classext or hoist it out) and only needs to be pruned during free. If we care about behaviour under a particular box (i.e. the propagating includes), we should look up the current box being modified on the ICLASS itself.

Purpose 2 is more complicated. It currently tracks the immediate children: the T_CLASS or T_ICLASS whose super points back. Because super is per-box and is mutable (include/prepend insert ICLASSes into the chain), we need to update the list on include/prepend, entries must be per-box, and we can have multiple entries per box. I propose we simplify this by no longer tracking the immediate subclass, but instead tracking the T_CLASS -> ... -> T_CLASS relationship, i.e. the inverse of rb_class_superclass. That relationship is the same across all boxes and immutable after Class creation.
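
The stability of that relationship can be observed from Ruby itself, since `Class#superclass` and `Class#subclasses` (Ruby 3.1+) already skip include classes. The sketch below uses made-up constant names and only demonstrates the include case at the Ruby level; it says nothing about boxes or about how the proposed C implementation stores the list.

```ruby
# Made-up names for illustration only.
module ToyGreeting; end
class ToyAnimal; end
class ToyDog < ToyAnimal; end

before = [ToyDog.superclass, ToyAnimal.subclasses]

# Internally this inserts an include class for ToyGreeting between
# ToyDog and ToyAnimal in the method lookup chain.
ToyDog.include(ToyGreeting)

after = [ToyDog.superclass, ToyAnimal.subclasses]

p ToyDog.ancestors.take(3)   # => [ToyDog, ToyGreeting, ToyAnimal]
p before == after            # => true
```

The lookup chain changes, but the class-to-class relationship reported before and after the include is identical.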

As a special case, the ICLASSes for refinements are also added to the purpose 2 list (on T_CLASS), as those ICLASSes do not chain to an eventual leaf T_CLASS.

When we need to find the classes which have included a module, we can use the module subclasses list to find the ICLASS and then use RCLASS_INCLUDER. If we needed to iterate all T_ICLASS, we could then walk up the CLASS_SUPER chain, but I didn't find anywhere we needed to do that.


https://github.com/ruby/ruby/pull/16363
