Project

General

Profile

Actions

Feature #8948

open

Frozen regex

Added by sawa (Tsuyoshi Sawada) over 10 years ago. Updated about 1 year ago.

Status:
Assigned
Target version:
-
[ruby-core:57353]

Description

=begin
I see that frozen string was accepted for Ruby 2.1, and frozen array and hash are proposed in https://bugs.ruby-lang.org/issues/8909. I feel there is even more use case for a frozen regex, i.e., a regex literal that generates a regex only once. It is frequent to have a regex within a frequently repeated portion of code, and generating the same regex each time is a waste of resource. At the moment, we can have a code like:

class Foo
  RE1 = /pattern1/
  RE2 = /pattern1/
  RE3 = /pattern1/
  def classify
    case self
    when RE1 then 1
    when RE2 then 2
    when RE3 then 3
    else 4
    end
  end
end

but suppose we have a frozen Regexp literal //f. Then we can write like:

class Foo
  def classify
    case self
    when /pattern1/f then 1
    when /pattern1/f then 2
    when /pattern1/f then 3
    else 4
    end
  end
end

=end


Related issues 3 (0 open3 closed)

Related to Ruby master - Misc #17412: Regexp vs Range: freezing differencesClosedActions
Related to Ruby master - Feature #17256: Freeze all Regexp objectsClosedmatz (Yukihiro Matsumoto)Actions
Has duplicate Ruby master - Feature #16377: Regexp literals should be frozenClosedmame (Yusuke Endoh)Actions

Updated by sawa (Tsuyoshi Sawada) over 10 years ago

Sorry, there was a mistake in the above. The three regexes with the same content /pattern1/ (or /pattern1/f) in the respective examples are supposed to represent different patterns.

Updated by Eregon (Benoit Daloze) over 10 years ago

We already have immutable (created only once) regexps: it is always the case for literal regexps and for dynamic regexps you need the 'o' flag: /a#{2}b/o.

So there are in practice immutable, but currently not #frozen?. Do you want to request it? I think it makes sense.

You can check with #object_id to know if 2 references reference the same object.

def r; /ab/; end
r.object_id

=> 2160323760

r.object_id
=> 2160323760

def s; /a#{2}b/; end
s.object_id
=> 2153197860
s.object_id
=> 2160163740

def t; /a#{2}b/o; end
t.object_id
=> 2160181200
t.object_id
=> 2160181200

Updated by sawa (Tsuyoshi Sawada) over 10 years ago

Eregon, thank you for the information.

Updated by Eregon (Benoit Daloze) over 10 years ago

  • Status changed from Open to Feedback

sawa: do you want to request Regexp to always be #frozen? or should the issue be closed?

Updated by jwille (Jens Wille) over 10 years ago

besides regexps being frozen, there might still be a use case for regexp literals that would only be allocated once:

def r1; /ab/; end; r1.object_id  #=> 70043421664620
def r2; /ab/; end; r2.object_id  #=> 70043421398060

def r3; /ab/f; end; r3.object_id  #=> 70043421033140
def r4; /ab/f; end; r4.object_id  #=> 70043421033140

i think it's in the same vein as #8579 and #8909.

Updated by sawa (Tsuyoshi Sawada) over 10 years ago

jwille, I agree with the use case, but it would be difficult to tell which regexes are intended to be the same, so I would not request that feature.

Probably, it makes sense to have all static regexes frozen, and have the f flag freeze dynamic regexes as well. I can't think of a use case for a regex that is immutable but not frozen. I am actually not clear about the difference.

Updated by jwille (Jens Wille) over 10 years ago

but it would be difficult to tell which regexes are intended to be the same

i'm not sure i understand. how is

def r1; /ab/f; end
def r2; /ab/f; end

different from

def s1; 'ab'f; end
def s2; 'ab'f; end

?

Updated by sawa (Tsuyoshi Sawada) over 10 years ago

jwille,

My understanding with the case of string in your example is that the two strings would count as different strings, but for respective method calls would not create new strings. It would mean one of the string can be "ab" and the other a different string such as "cd".

If that is what you intended for your regex examples, then there is no difference.

Updated by ko1 (Koichi Sasada) over 10 years ago

  • Category set to syntax
  • Assignee set to matz (Yukihiro Matsumoto)
  • Target version set to 2.6

I like to freeze normal regexp literal that Eregon said.

2.2 matter?

Anyone set instance variable for each regexp? :)

Updated by Eregon (Benoit Daloze) over 10 years ago

ko1 (Koichi Sasada) wrote:

2.2 matter?

2.1 would make sense to me, so it goes along with other frozen literals.

Anyone set instance variable for each regexp? :)

I highly doubt it.

Actions #11

Updated by ko1 (Koichi Sasada) over 8 years ago

  • Status changed from Feedback to Assigned

There are two options:

  1. Freeze only literal regexps
  2. Freeze all of regexps

I like (2) because I have no idea to change regexp objects.

History of "Frozen":

Actions #12

Updated by ko1 (Koichi Sasada) over 8 years ago

A patch is small.

Index: re.c
===================================================================
--- re.c	(revision 51650)
+++ re.c	(working copy)
@@ -2548,6 +2548,8 @@
     if (!re->ptr) return -1;
     RB_OBJ_WRITE(obj, &re->src, rb_fstring(rb_enc_str_new(s, len, enc)));
     RB_GC_GUARD(unescaped);
+
+    rb_obj_freeze(obj);
     return 0;
 }
 

But I got many failures on rubyspec.
https://gist.github.com/ko1/da52575de115c928ce4a

Updated by ko1 (Koichi Sasada) over 6 years ago

Now we can't run make test-all because test/lib/test/unit.rb defines singleton method for a regexp object.

      def non_options(files, options)
        filter = options[:filter]
        if filter
          pos_pat = /\A\/(.*)\/\z/
          neg_pat = /\A!\/(.*)\/\z/
          negative, positive = filter.partition {|s| neg_pat =~ s}
          if positive.empty?
            filter = nil
          elsif negative.empty? and positive.size == 1 and pos_pat !~ positive[0]
            filter = positive[0]
          else
            filter = Regexp.union(*positive.map! {|s| Regexp.new(s[pos_pat, 1] || "\\A#{Regexp.quote(s)}\\z")})
          end
          unless negative.empty?
            negative = Regexp.union(*negative.map! {|s| Regexp.new(s[neg_pat, 1])})
            filter = /\A(?=.*#{filter})(?!.*#{negative})/
          end
          if Regexp === filter
            # bypass conversion in minitest
            def filter.=~(other)    # :nodoc:   <-- singleton method!!
              super unless Regexp === other
            end
          end
          options[:filter] = filter
        end
        true
      end

Is it acceptable (should we allow such code)?

Updated by matz (Yukihiro Matsumoto) over 6 years ago

I agree with frozen regexp literals. If you want to freeze all regexp objects, you need more info to persuade me.

Matz.

Updated by naruse (Yui NARUSE) over 6 years ago

This change will break compatibility.
If the change has clear benefit, such incompatibility is acceptable.
But this benefit seems not so much.

If so this should be postponed on introducing parallelization.

Updated by Eregon (Benoit Daloze) over 6 years ago

matz (Yukihiro Matsumoto) wrote:

I agree with frozen regexp literals. If you want to freeze all regexp objects, you need more info to persuade me.

Matz.

I think the inconsistency between Regexp being freezed by default or not (based on being literals) would be confusing.
I believe it would be better all Regexp frozen or none frozen, but not something in between.

Updated by shyouhei (Shyouhei Urabe) over 6 years ago

We had a discussion around this issue at yesterday's developer meeting.

Seems everyone there agreed that in long term future, regex shall be frozen. However the status quo is that there are wild applications that do override regex methods, particularly for testing purposes. So there needs to be some transition paths.

Freezing regex literals sounded a good starting point. We already optimize the return value of them:

3.times {
  p(/foo/.object_id)
}

The merit of freezing regex literals are:

  • Seems very safe to do. If it hurts someone, that means that person has already been bothered by the literal being shared among evaluations.

The disadvantage of freezing regex literals are:

  • It benefits no one while it does confuse people. Regexp class has no built-in destructive methods so freezing them does almost nothing; just prohibits method redefinitions. Is it worth doing by deafult?

Updated by Eregon (Benoit Daloze) over 6 years ago

Regexp class has no built-in destructive methods so freezing them does almost nothing; just prohibits method redefinitions.

I think it is useful in communicating this is a immutable object.
So singleton methods definitions are prevented (it has to behave like every other Regexp), but methods can be added to the Regexp class.
Adding instances variables to the Regexp are also prevented, which seems a good idea as attaching state to immutable object makes no sense.

I agree with the rationale that it seems safer to first only freeze Regexp literals.
Does /#{23}/ counts as literal though? That expression produces different instances, unlike the /#{23}/o variant.
I think it should be considered as literal and be frozen.
Then the distinction is between literals and Regexp.new, which is much clearer than "literals except those with interpolation but without /o".

Actions #19

Updated by shyouhei (Shyouhei Urabe) over 4 years ago

  • Has duplicate Feature #16377: Regexp literals should be frozen added
Actions #20

Updated by byroot (Jean Boussier) about 4 years ago

  • Status changed from Assigned to Closed

Applied in changeset git|98ef38ada43338c073f50a0093196f0356284625.


Freeze Regexp literals

[Feature #8948] [Feature #16377]

Since Regexp literals always reference the same instance,
allowing to mutate them can lead to state leak.

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Status changed from Closed to Assigned

If so this should be postponed on introducing parallelization.

Reconsider about it? (freeze all regexp)

Updated by ko1 (Koichi Sasada) over 3 years ago

I think at least interpolated regexp (/a#{i}b/) should be frozen though, like Range objects.

Actions #23

Updated by Eregon (Benoit Daloze) about 3 years ago

  • Related to Misc #17412: Regexp vs Range: freezing differences added

Updated by Eregon (Benoit Daloze) about 3 years ago

@matz (Yukihiro Matsumoto) Is it OK to experiment freezing all Regexp objects for Ruby 3.1?

There is some discussion in #17412 too.

Actions #25

Updated by Eregon (Benoit Daloze) about 1 year ago

Updated by Eregon (Benoit Daloze) about 1 year ago

TruffleRuby has all Regexp objects frozen since a while, it would be great to do the same in CRuby for consistency.

Updated by Eregon (Benoit Daloze) about 1 year ago

  • Assignee changed from matz (Yukihiro Matsumoto) to Eregon (Benoit Daloze)

I'll try it when I get some time.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0