Project

General

Profile

Actions

Bug #21856

closed

Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

Bug #21856: Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

Added by ahorek (Pavel Rosický) 2 months ago. Updated 15 days ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux]
[ruby-core:124655]

Description

Loofah sanitization is noticeably slower

Ruby:     3.4.8
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     26.091872   0.000000  26.091872 ( 25.110925)
Loofah.scrub_fragment(:prune)        25.913185   0.010392  25.923577 ( 24.948464)
Nokogiri HTML parse only              3.852690   0.000000   3.852690 (  3.705930)

Ruby:     4.0.0 & 4.0.1
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     38.094207   0.041753  38.135960 ( 36.669463)
Loofah.scrub_fragment(:prune)        40.168795   0.000045  40.168840 ( 38.561806)
Nokogiri HTML parse only              4.012936   0.052024   4.064960 (  3.913272)


Ruby:     4.1.0 (ruby 4.1.0dev (2026-01-31T09:41:30Z master 7ef8c470d2) +PRISM [x86_64-linux])
Loofah:   2.25.0
Nokogiri: 1.19.0
Iterations: 100000

                                          user     system      total        real
Loofah.fragment + scrub!(:prune)     39.004228   0.000000  39.004228 ( 37.694873)
Loofah.scrub_fragment(:prune)        39.043199   0.031284  39.074483 ( 37.182785)
Nokogiri HTML parse only              3.889100   0.010427   3.899527 (  3.741622)

Originally reported https://www.redmine.org/issues/43737


Files

benchmark.rb (1002 Bytes) benchmark.rb ahorek (Pavel Rosický), 02/01/2026 08:10 PM

Updated by byroot (Jean Boussier) about 2 months ago Actions #1 [ruby-core:124656]

I'm able to repro on my machine, even though the different isn't quite as bad (more like 30% slower).

Profile of ruby 3.4.7: https://share.firefox.dev/4rw3mv0
Profile of ruby 4.0.0: https://share.firefox.dev/4rtvrmt

The striking difference on the profile seem to be that 4.0 spends 28% of its time in remove_class_from_subclasses -> rb_classext_free_subclasses -> rb_iclass_classext_free -> rb_classext_foreach -> rb_obj_free.

A few notes:

  • This codepath was changed a lot with the Ruby::Box introduction, it may have become significantly slower.
  • It's surprising that we're sweeping lots of Class object, perhaps Loofah or Nokogiri are inadvertently allocating singleton classes in a hot spot?

Updated by byroot (Jean Boussier) about 2 months ago Actions #2 [ruby-core:124657]

I reduced the benchmark to:

# frozen_string_literal: true

require "bundler/inline"

gemfile do
  source 'https://rubygems.org'
  gem "benchmark-ips"
end

Benchmark.ips do |x|
  x.report("singleton") do
    Object.new.singleton_class
  end
end

3.4.7:

ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton   742.338k i/100ms
Calculating -------------------------------------
           singleton      7.381M (± 2.2%) i/s  (135.48 ns/i) -     37.117M in   5.031106s

4.0.0

ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
Warming up --------------------------------------
           singleton    13.919k i/100ms
Calculating -------------------------------------
           singleton    146.202k (±28.4%) i/s    (6.84 μs/i) -    668.112k in   5.059563s

So that's a pretty massive regression in class sweeping. I'll see what I can do.

Updated by byroot (Jean Boussier) about 2 months ago Actions #3 [ruby-core:124659]

So the regression is indeed a consequence of the Box introduction.

When sweeping a Class, we need to remove the backreference from rb_classext_struct.box_super_subclasses and rb_classext_struct.box_module_subclasses, and for each one in involve multiple st_table lookups and updates, which is way more work that we used to have to do.

There might be a way to optimize this, but my understanding of how boxes are supposed to work is limited, so I don't know if I can fix it without breaking boxes.

Here again the solution might be to have a fast path for the overwhelming majority of classes that aren't impacted by boxes, but it would make the code way more complex.

Updated by byroot (Jean Boussier) about 2 months ago Actions #4

  • Backport changed from 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN, 4.0: UNKNOWN to 3.2: DONTNEED, 3.3: DONTNEED, 3.4: DONTNEED, 4.0: REQUIRED

Updated by byroot (Jean Boussier) about 2 months ago Actions #5 [ruby-core:124662]

  • Subject changed from Nokogiri performance degradation since Ruby 4.0 to Massive performance degradation of `rb_obj_free` for `T_CLASS` since Ruby 4.0

I spent some time trying to fix this, I think it's possible but is a pretty major refactoring.

In 3.4:

Classes have a subclasses doubly-linked list, which is necessary to be able to iterate subclasses efficiently.
As to be able to purge these list effectively, each class also keep a direct reference to the node than contain themselves in the parent linked list (subclass_entry).

They also have another linked list with all the module its been included on.

All this allows to efficiently remove all the references to a given class.

In 4.0:

It's roughly the same, except the 3 references above are all behind an extra st_table indirection. So before you can access any of these lists, you need to do an extra hash lookup.

To be very honest I don't understand why it is necessary, given these lists are inside rb_classext_t and from my understanding classes have one rb_classext_t per box, so that indirection seem redundant to me.

But then again, I don't understand the box design well, so I may be overlooking something, and I don't know if that's something I can reasonably fix.

cc @tagomoris (Satoshi Tagomori) @ko1 (Koichi Sasada)

Updated by jhawthorn (John Hawthorn) 23 days ago · Edited Actions #6 [ruby-core:124975]

I think I've found a solution to this: we can return to the Ruby 3.4 O(1) removal, remove box/namespacing from it, and actually make it even simpler by skipping the CLASS -> ICLASS (and ICLASS -> ICLASS) relationship and directly associating T_CLASS with it's true T_CLASS "superclass".


Currently we maintain the subclasses list for two separate purposes (we essentially have two different relationships we're putting into the same list):

  1. On a T_MODULE, we track the T_ICLASSes created to include it into other classes. Used for method invalidation and propagating includes on the module that happen after it's been used
  2. On a T_CLASS/T_ICLASS, we track the T_CLASS/T_ICLASS which are the immediate children of the class. We use this for method invalidation, some cvar things, and to iterate through subclasses.

Purpose 1 does not have any issues with box, the T_ICLASS always belongs to one specific module and that's immutable. This list can be box-global (always use the prime classext or hoist it out) and only needs to be pruned during free. If we care about behaviour under a particular box (ie. the propagating includes), we should look up the current box being modified on the ICLASS itself.

Purpose 2 is more complicated. It currently tracks the immediate children, the T_CLASS or T_ICLASS whose super points back. Because super is per-box and is mutable (include/prepend insert ICLASSes into the chain) we need to update the list on include/prepend, entries must be per-box, and we can have multiple entries per-box. I propose we simplify this by no longer tracking the immediate subclass, but instead tracking the T_CLASS -> ... -> T_CLASS relationship, ie. the inverse of rb_class_superclass. That relationship is the same across all boxes and immutable after Class creation.

As a special case the ICLASS for refinements are also added to the purpose 2 list (on T_CLASS). As those ICLASS do not chain to an eventual leaf T_CLASS.

When we need to find the classes which have included a module, we can use the module subclasses list to find the ICLASS and then use RCLASS_INCLUDER. If we needed to iterate all T_ICLASS, we could then walk up the CLASS_SUPER chain, but I didn't find anywhere we needed to do that.


https://github.com/ruby/ruby/pull/16363

Updated by mar21 (Martin Halfar) 16 days ago Actions #7 [ruby-core:125055]

byroot (Jean Boussier) wrote in #note-1:

I'm able to repro on my machine, even though the different isn't quite as bad (more like 30% slower).

Profile of ruby 3.4.7: https://share.firefox.dev/4rw3mv0
Profile of ruby 4.0.0: https://share.firefox.dev/4rtvrmt

The striking difference on the profile seem to be that 4.0 spends 28% of its time in remove_class_from_subclasses -> rb_classext_free_subclasses -> rb_iclass_classext_free -> rb_classext_foreach -> rb_obj_free.

A few notes:

  • This codepath was changed a lot with the Ruby::Box introduction, it may have become significantly slower.
  • It's surprising that we're sweeping lots of Class object, perhaps Loofah or Nokogiri are inadvertently allocating singleton classes in a hot spot?

We have just encountered this issue ourselves and while it is indeed not quite bad on smaller samples, it quickly scales up with larger (to huge) samples (in our case 2.5MB html document).

  • On Ruby 3.4.9 and below it took <1s to sanitize.
  • On Ruby 4.0.X immediately after startup and/or GC cleanup ~2s. Second run varies 40s-100s. Third run couple of minutes.

Updated by mdalessio (Mike Dalessio) 16 days ago 1Actions #8 [ruby-core:125057]

👋 I'm the maintainer of Nokogiri and Loofah.

Nokogiri has a "decorator" feature that allows modules to be mixed into node objects dynamically. When a document has decorators registered, Nokogiri calls node.extend(mod) on each Node object when it is created. This creates a singleton class per Node (though not necessarily for every DOM node -- only the ones for which we create a Ruby object).

Loofah uses this feature. It defines a DocumentDecorator module included in all of its document classes (XML, HTML4, HTML5), which registers ScrubBehavior::Node as a decorator for every Nokogiri::XML::Node. Loofah traverses the DOM tree and instantiates a Ruby Node object for every DOM node. This means every node in every Loofah-parsed document gets a singleton class allocated — one per node.

Given the fact that Rails uses Loofah to sanitize content, this performance problem is likely to compound badly for Rails applications when sanitizing large HTML pages.

Updated by byroot (Jean Boussier) 16 days ago Actions #9 [ruby-core:125058]

Nokogiri has a "decorator" feature

I figured as much when I investigated the problem. Now that John found a fix, it will probably be back to OK performance in 4.0.3, but I think it would be valuable to find a better API in Nokogiri. In theory it should be possible to provide a subclass of Nokogiri::Node for nokogiri to allocate nodes with, instead of a module to extend the instances with.

That would offer an almost as flexible feature but with much better performance.

Updated by jhawthorn (John Hawthorn) 16 days ago 1Actions #10 [ruby-core:125060]

  • Status changed from Open to Closed

I merged my proposed design, which should restore Ruby 3.4's performance https://github.com/ruby/ruby/pull/16363 (forgot to tag this issue in the commit message, sorry)

I'm going to look into backporting this to Ruby 4.0. We usually don't backport performance improvements, but I think we should make an exception.

Updated by byroot (Jean Boussier) 16 days ago Actions #11 [ruby-core:125066]

We usually don't backport performance improvements, but I think we should make an exception.

IMO the performance degradation is severe enough that I consider it a bug, and that is why I marked it for backport. So I don't really see this as an exception.

Updated by nonster (Nony Dutton) 15 days ago Actions #12 [ruby-core:125073]

byroot (Jean Boussier) wrote in #note-11:

We usually don't backport performance improvements, but I think we should make an exception.

IMO the performance degradation is severe enough that I consider it a bug, and that is why I marked it for backport. So I don't really see this as an exception.

This bit us so we downgraded back to Ruby 3.4. Thanks everyone for the fix!

Actions

Also available in: PDF Atom