Feature #8948
openFrozen regex
Added by sawa (Tsuyoshi Sawada) about 11 years ago. Updated over 1 year ago.
Description
=begin
I see that frozen string was accepted for Ruby 2.1, and frozen array and hash are proposed in https://bugs.ruby-lang.org/issues/8909. I feel there is even more use case for a frozen regex, i.e., a regex literal that generates a regex only once. It is frequent to have a regex within a frequently repeated portion of code, and generating the same regex each time is a waste of resource. At the moment, we can have a code like:
class Foo
RE1 = /pattern1/
RE2 = /pattern1/
RE3 = /pattern1/
def classify
case self
when RE1 then 1
when RE2 then 2
when RE3 then 3
else 4
end
end
end
but suppose we have a frozen Regexp
literal //f
. Then we can write like:
class Foo
def classify
case self
when /pattern1/f then 1
when /pattern1/f then 2
when /pattern1/f then 3
else 4
end
end
end
=end
Updated by sawa (Tsuyoshi Sawada) about 11 years ago
Sorry, there was a mistake in the above. The three regexes with the same content /pattern1/
(or /pattern1/f
) in the respective examples are supposed to represent different patterns.
Updated by Eregon (Benoit Daloze) about 11 years ago
We already have immutable (created only once) regexps: it is always the case for literal regexps and for dynamic regexps you need the 'o' flag: /a#{2}b/o
.
So there are in practice immutable, but currently not #frozen?
. Do you want to request it? I think it makes sense.
You can check with #object_id
to know if 2 references reference the same object.
def r; /ab/; end r.object_id
=> 2160323760
r.object_id
=> 2160323760
def s; /a#{2}b/; end
s.object_id
=> 2153197860
s.object_id
=> 2160163740
def t; /a#{2}b/o; end
t.object_id
=> 2160181200
t.object_id
=> 2160181200
Updated by sawa (Tsuyoshi Sawada) about 11 years ago
Eregon, thank you for the information.
Updated by Eregon (Benoit Daloze) about 11 years ago
- Status changed from Open to Feedback
sawa: do you want to request Regexp
to always be #frozen?
or should the issue be closed?
Updated by jwille (Jens Wille) about 11 years ago
besides regexps being frozen, there might still be a use case for regexp literals that would only be allocated once:
def r1; /ab/; end; r1.object_id #=> 70043421664620
def r2; /ab/; end; r2.object_id #=> 70043421398060
def r3; /ab/f; end; r3.object_id #=> 70043421033140
def r4; /ab/f; end; r4.object_id #=> 70043421033140
Updated by sawa (Tsuyoshi Sawada) about 11 years ago
jwille, I agree with the use case, but it would be difficult to tell which regexes are intended to be the same, so I would not request that feature.
Probably, it makes sense to have all static regexes frozen, and have the f
flag freeze dynamic regexes as well. I can't think of a use case for a regex that is immutable but not frozen. I am actually not clear about the difference.
Updated by jwille (Jens Wille) about 11 years ago
but it would be difficult to tell which regexes are intended to be the same
i'm not sure i understand. how is
def r1; /ab/f; end
def r2; /ab/f; end
different from
def s1; 'ab'f; end
def s2; 'ab'f; end
?
Updated by sawa (Tsuyoshi Sawada) about 11 years ago
jwille,
My understanding with the case of string in your example is that the two strings would count as different strings, but for respective method calls would not create new strings. It would mean one of the string can be "ab"
and the other a different string such as "cd"
.
If that is what you intended for your regex examples, then there is no difference.
Updated by ko1 (Koichi Sasada) about 11 years ago
- Category set to syntax
- Assignee set to matz (Yukihiro Matsumoto)
- Target version set to 2.6
I like to freeze normal regexp literal that Eregon said.
2.2 matter?
Anyone set instance variable for each regexp? :)
Updated by Eregon (Benoit Daloze) about 11 years ago
ko1 (Koichi Sasada) wrote:
2.2 matter?
2.1 would make sense to me, so it goes along with other frozen literals.
Anyone set instance variable for each regexp? :)
I highly doubt it.
Updated by ko1 (Koichi Sasada) over 9 years ago
- Status changed from Feedback to Assigned
There are two options:
- Freeze only literal regexps
- Freeze all of regexps
I like (2) because I have no idea to change regexp objects.
History of "Frozen":
- Ruby 2.0
- Integer (Bignum, Fixnum) https://bugs.ruby-lang.org/issues/3222
- Float https://bugs.ruby-lang.org/issues/6936
- Ruby 2.1
- Ruby 2.2
- nil/true/false https://bugs.ruby-lang.org/issues/8923
Updated by ko1 (Koichi Sasada) over 9 years ago
A patch is small.
Index: re.c
===================================================================
--- re.c (revision 51650)
+++ re.c (working copy)
@@ -2548,6 +2548,8 @@
if (!re->ptr) return -1;
RB_OBJ_WRITE(obj, &re->src, rb_fstring(rb_enc_str_new(s, len, enc)));
RB_GC_GUARD(unescaped);
+
+ rb_obj_freeze(obj);
return 0;
}
But I got many failures on rubyspec.
https://gist.github.com/ko1/da52575de115c928ce4a
Updated by ko1 (Koichi Sasada) about 7 years ago
Now we can't run make test-all
because test/lib/test/unit.rb
defines singleton method for a regexp object.
def non_options(files, options)
filter = options[:filter]
if filter
pos_pat = /\A\/(.*)\/\z/
neg_pat = /\A!\/(.*)\/\z/
negative, positive = filter.partition {|s| neg_pat =~ s}
if positive.empty?
filter = nil
elsif negative.empty? and positive.size == 1 and pos_pat !~ positive[0]
filter = positive[0]
else
filter = Regexp.union(*positive.map! {|s| Regexp.new(s[pos_pat, 1] || "\\A#{Regexp.quote(s)}\\z")})
end
unless negative.empty?
negative = Regexp.union(*negative.map! {|s| Regexp.new(s[neg_pat, 1])})
filter = /\A(?=.*#{filter})(?!.*#{negative})/
end
if Regexp === filter
# bypass conversion in minitest
def filter.=~(other) # :nodoc: <-- singleton method!!
super unless Regexp === other
end
end
options[:filter] = filter
end
true
end
Is it acceptable (should we allow such code)?
Updated by matz (Yukihiro Matsumoto) about 7 years ago
I agree with frozen regexp literals. If you want to freeze all regexp objects, you need more info to persuade me.
Matz.
Updated by naruse (Yui NARUSE) about 7 years ago
This change will break compatibility.
If the change has clear benefit, such incompatibility is acceptable.
But this benefit seems not so much.
If so this should be postponed on introducing parallelization.
Updated by Eregon (Benoit Daloze) about 7 years ago
matz (Yukihiro Matsumoto) wrote:
I agree with frozen regexp literals. If you want to freeze all regexp objects, you need more info to persuade me.
Matz.
I think the inconsistency between Regexp being freezed by default or not (based on being literals) would be confusing.
I believe it would be better all Regexp frozen or none frozen, but not something in between.
Updated by shyouhei (Shyouhei Urabe) about 7 years ago
We had a discussion around this issue at yesterday's developer meeting.
Seems everyone there agreed that in long term future, regex shall be frozen. However the status quo is that there are wild applications that do override regex methods, particularly for testing purposes. So there needs to be some transition paths.
Freezing regex literals sounded a good starting point. We already optimize the return value of them:
3.times {
p(/foo/.object_id)
}
The merit of freezing regex literals are:
- Seems very safe to do. If it hurts someone, that means that person has already been bothered by the literal being shared among evaluations.
The disadvantage of freezing regex literals are:
- It benefits no one while it does confuse people. Regexp class has no built-in destructive methods so freezing them does almost nothing; just prohibits method redefinitions. Is it worth doing by deafult?
Updated by Eregon (Benoit Daloze) about 7 years ago
Regexp class has no built-in destructive methods so freezing them does almost nothing; just prohibits method redefinitions.
I think it is useful in communicating this is a immutable object.
So singleton methods definitions are prevented (it has to behave like every other Regexp), but methods can be added to the Regexp class.
Adding instances variables to the Regexp are also prevented, which seems a good idea as attaching state to immutable object makes no sense.
I agree with the rationale that it seems safer to first only freeze Regexp literals.
Does /#{23}/ counts as literal though? That expression produces different instances, unlike the /#{23}/o variant.
I think it should be considered as literal and be frozen.
Then the distinction is between literals and Regexp.new, which is much clearer than "literals except those with interpolation but without /o".
Updated by shyouhei (Shyouhei Urabe) almost 5 years ago
- Has duplicate Feature #16377: Regexp literals should be frozen added
Updated by byroot (Jean Boussier) almost 5 years ago
- Status changed from Assigned to Closed
Applied in changeset git|98ef38ada43338c073f50a0093196f0356284625.
Freeze Regexp literals
[Feature #8948] [Feature #16377]
Since Regexp literals always reference the same instance,
allowing to mutate them can lead to state leak.
Updated by ko1 (Koichi Sasada) about 4 years ago
- Status changed from Closed to Assigned
If so this should be postponed on introducing parallelization.
Reconsider about it? (freeze all regexp)
Updated by ko1 (Koichi Sasada) about 4 years ago
I think at least interpolated regexp (/a#{i}b/
) should be frozen though, like Range objects.
Updated by Eregon (Benoit Daloze) almost 4 years ago
- Related to Misc #17412: Regexp vs Range: freezing differences added
Updated by Eregon (Benoit Daloze) almost 4 years ago
@matz (Yukihiro Matsumoto) Is it OK to experiment freezing all Regexp objects for Ruby 3.1?
There is some discussion in #17412 too.
Updated by Eregon (Benoit Daloze) over 1 year ago
- Related to Feature #17256: Freeze all Regexp objects added
Updated by Eregon (Benoit Daloze) over 1 year ago
TruffleRuby has all Regexp objects frozen since a while, it would be great to do the same in CRuby for consistency.
Updated by Eregon (Benoit Daloze) over 1 year ago
- Assignee changed from matz (Yukihiro Matsumoto) to Eregon (Benoit Daloze)
I'll try it when I get some time.