Feature #16557

Deduplicate Regexp literals

Added by byroot (Jean Boussier) 25 days ago. Updated 25 days ago.

Real-world applications contain many duplicated Regexp literals.

From a rails/console in Redmine:

>> ObjectSpace.each_object(Regexp).count
=> 6828
>> ObjectSpace.each_object(Regexp).uniq.count
=> 4162
>> ObjectSpace.each_object(Regexp).map { |r| ObjectSpace.memsize_of(r) }.sum
=> 4611957 # 4.4 MB total
>> ObjectSpace.each_object(Regexp).map { |r| ObjectSpace.memsize_of(r) }.sum - ObjectSpace.each_object(Regexp).uniq.map { |r| ObjectSpace.memsize_of(r) }.sum
=> 1490601 # 1.42 MB could be saved

Here are the top 10 most duplicated regexps in Redmine:

147: /"/
107: /\s+/
103: //
89: /\n/
83: /'/
76: /\s+/m
37: /\d+/
35: /\[/
33: /./
33: /\\./
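
For reference, a list like the one above could be produced with something along these lines. This is only a sketch: `top_duplicated_regexps` is a hypothetical helper name, and it relies on `Enumerable#tally` (Ruby 2.7+) grouping regexps through `Regexp#hash` and `Regexp#eql?`.

```ruby
# Hypothetical helper: count how many times each equal Regexp appears
# and return the most duplicated ones as [regexp, count] pairs.
def top_duplicated_regexps(regexps, limit = 10)
  regexps
    .tally                            # groups equal regexps (same source and options)
    .select { |_, count| count > 1 }  # keep only regexps that are actually duplicated
    .sort_by { |_, count| -count }    # most duplicated first
    .first(limit)
end

# Small self-contained sample; each literal below is a distinct object.
sample = [/a/, /a/, /b/, /a/m, /b/, /a/, /c/]
top_duplicated_regexps(sample, 2).each do |regexp, count|
  puts "#{count}: #{regexp.inspect}"
end
```

In a console, passing `ObjectSpace.each_object(Regexp).to_a` instead of `sample` should give the live counts for the running process.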

Even an empty Rails application has a similar number of regexps.

The feature

Since literal regexps are now frozen, it is possible to deduplicate them without changing any semantics and save a decent amount of resident memory.
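
To illustrate the idea in plain Ruby: the sketch below is not the patch itself (which works in C inside the VM), only a rough model of an interning table. `RegexpTable` and `intern` are hypothetical names, and keying on source, options and encoding is my assumption about what makes two literals interchangeable.

```ruby
# Hypothetical model of a deduplication table for frozen regexps.
class RegexpTable
  def initialize
    @table = {}
  end

  # Returns the canonical frozen instance for any regexp equal to `regexp`.
  def intern(regexp)
    # Assumption: source, options and encoding fully identify a literal.
    key = [regexp.source, regexp.options, regexp.encoding]
    @table[key] ||= regexp.frozen? ? regexp : regexp.dup.freeze
  end
end

table = RegexpTable.new
first  = table.intern(/\s+/)
second = table.intern(/\s+/)  # a different literal, hence a different object
puts first.equal?(second)     # both calls return the same instance
```

Because the shared instance is frozen, callers cannot observe whether they got their own object or the deduplicated one, except through object identity.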

The patch

I tried implementing this feature in a way very similar to the frozen_strings table. It is functional, but I'm having trouble with a segfault on Linux:


Updated by Eregon (Benoit Daloze) 25 days ago

This is quite interesting. It would also avoid compiling these duplicated Regexps again, which likely saves quite a bit of startup time.

Updated by byroot (Jean Boussier) 25 days ago

Eregon (Benoit Daloze) wrote:

would also avoid compiling these duplicated Regexp again.

In theory yes, but my current patch doesn't go that far, for simplicity's sake. That would indeed be a nice follow-up or improvement.
