Project

General

Profile

Feature #16377

Regexp literals should be frozen

Added by byroot (Jean Boussier) 7 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Target version:
[ruby-core:95977]

Description

The following script:

def mutate
  re = /foo/
  state = re.instance_variable_get(:@state)
  re.instance_variable_set(:@state, state.to_i + 1)
  state
end

3.times do
  p mutate
end

Output this:

nil
1
2

IMHO, you shouldn't be able to mutate an unduplicated literal.

GitHub pull request: https://github.com/ruby/ruby/pull/2705


Related issues

Is duplicate of Ruby master - Feature #8948: Frozen regexClosedmatz (Yukihiro Matsumoto)Actions
#2

Updated by shyouhei (Shyouhei Urabe) 7 months ago

Updated by Dan0042 (Daniel DeLorme) 7 months ago

I really hope this does not go through.

Regexp literals have been "unduplicated" like this since at least 1.8 and we've never had problems. And now we should freeze them and introduce an incompatibility just for the sake of Communicating the Holy Gospel of Immutability? I don't find that a valid reason.

Additionaly, Regexp literals are not deduplicated in the same sense as frozen string literals; one /abc/ is independant from another /abc/ so we're not actually leaking "global" state. In the example above the state is local to the method but shared between invocations. Not sure how that should be called, but certainly not "global".

And what if the mutate behavior shown above is actually wanted? Sure it's a hack, but it's a bit like function-static variables in php.

Or, more realistically, what about something like this?

class Regexp
  def analyze
    @analyze ||= RegexpAnalyzer.analyze_performance_issues(self)
  end
end

Freezing objects closes off possibilities when it's done by default, and should only be done when absolutely necessary.

Updated by byroot (Jean Boussier) 7 months ago

just for the sake of Communicating the Holy Gospel of Immutability?

Please don't put ideological thoughts on me, that's not a good basis for debate.

Regexp literals are not deduplicated in the same sense as frozen string literals; one /abc/ is independant from another /abc/

And if they were frozen, they could be.

so we're not actually leaking "global" state

On a particular callsite it does.

Or, more realistically, what about something like this?

class Regexp
  @@analyses = {}.compare_by_identity

  def analyze
    @@analyses[self] ||= RegexpAnalyzer.analyze_performance_issues(self)
  end
end

Updated by Dan0042 (Daniel DeLorme) 7 months ago

Please don't put ideological thoughts on me, that's not a good basis for debate.

Oh no, I'm not. Sorry for the misunderstanding. I was mostly replying to comments in #8948, but I ended up posting here because this is the ticket that is linked in #16393 DevelopersMeeting20191220Japan.

Yes, of course when one way is blocked there's always an alternate way of doing things. But the point is that if someone is currently relying on Regexp literals not being frozen, this change will break their code. Given that the benefit is close to zero, I don't think it's responsible to force anyone to bear that cost. I'm very conservative about backward compatibility.

#6

Updated by mame (Yusuke Endoh) 6 months ago

  • Target version set to 2.8

Updated by mame (Yusuke Endoh) 6 months ago

  • Assignee set to mame (Yusuke Endoh)

At the previous dev meeting, matz said that let's give it a try :-)

For the record: Regexp.new should continue to return unfrozen Regexp instance.

I'll review the pull request.

Updated by ko1 (Koichi Sasada) 6 months ago

Could you add a NEWS entry?

#9

Updated by byroot (Jean Boussier) 6 months ago

  • Status changed from Open to Closed

Applied in changeset git|98ef38ada43338c073f50a0093196f0356284625.


Freeze Regexp literals

[Feature #8948] [Feature #16377]

Since Regexp literals always reference the same instance,
allowing to mutate them can lead to state leak.

Also available in: Atom PDF