Project

General

Profile

Actions

Bug #19055

closed

Regexp.new(regexp, timeout: nil) is intrupted by Regexp.timeout

Added by jaruga (Jun Aruga) over 1 year ago. Updated over 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [x86_64-linux]
[ruby-core:110287]

Description

It seems that that a code for Regexp timeout written in the release page[1] is actually interrupted by the Regexp.timeout (global configuration) in Ruby 3.2.0 preview2 built from the source. Do you know what's wrong?

$ cat reg_timeout2.rb 
Regexp.timeout = 1.0
# This regexp has no timeout
long_time_re = Regexp.new("^a*b?a*$", timeout: nil)
long_time_re =~ "a" * 50000 + "x" # never interrupted

The error Regexp::TimeoutError appeared soon after around 1 second after running the script reg_timeout2.rb.

$ which ruby
/usr/local/ruby-3.2.0-preview2/bin/ruby

$ ruby -v
ruby 3.2.0preview2 (2022-09-09 master 35cfc9a3bb) [x86_64-linux]

$ ruby reg_timeout2.rb 
reg_timeout2.rb:4:in `<main>': regexp match timeout (Regexp::TimeoutError)

$ time ruby reg_timeout2.rb 
reg_timeout2.rb:4:in `<main>': regexp match timeout (Regexp::TimeoutError)

real	0m1.072s
user	0m1.053s
sys	0m0.014s

$ echo $?
1
$ which irb
/usr/local/ruby-3.2.0-preview2/bin/irb

$ irb
irb(main):001:0> Regexp.timeout = 1.0
=> 1.0
irb(main):002:0> long_time_re = Regexp.new("^a*b?a*$", timeout: nil)
=> /^a*b?a*$/
irb(main):003:0> long_time_re =~ "a" * 50000 + "x"
(irb):3:in `<main>': regexp match timeout (Regexp::TimeoutError)
	from /usr/local/ruby-3.2.0-preview2/lib/ruby/gems/3.2.0+2/gems/irb-1.4.1/exe/irb:11:in `<top (required)>'
	from /usr/local/ruby-3.2.0-preview2/bin/irb:25:in `load'
	from /usr/local/ruby-3.2.0-preview2/bin/irb:25:in `<main>'
irb(main):004:0> 

[1] https://www.ruby-lang.org/en/news/2022/09/09/ruby-3-2-0-preview2-released/

Updated by mame (Yusuke Endoh) over 1 year ago

Good catch. Currently, there is no way to specify "no timeout" by timeout keyword.

I wonder what API is preferable here. I expect Regexp.new(str, timeout: nil) to be the same as Regexp.new(str). But it would be surprising if Regexp.new(str) ignores the global timeout configuration.

I came up with:

  • Regexp.new(str, timeout: 0) (a bit cryptic?)
  • Regexp.new(str, timeout: Float::INFINITY)
  • Regexp.new(str, timeout: :no_timeout)

I found another related problem. Currently, if a very long timeout is specified, it is just ignored.

Regexp.timeout = 1.0
long_time_re = Regexp.new("^a*b?a*$", timeout: 1e300)
long_time_re =~ "a" * 50000 + "x" # expect: never interrupt, actual: Regexp::TimeoutError in one second

I would like to make an exception for numbers that are too large to be expressed in time_t.

Updated by ioquatix (Samuel Williams) over 1 year ago

The model I've been following is this:

nil    # default timeout
0      # would timeout immediately
t > 0  # timeout after t seconds

The default timeout should have special meaning.

By default, timeout: nil should mean no timeout, but if there is global timeout, e.g. Regexp.timeout = t (if such a thing exists) and t itself is non-nil, then Regexp.new(..., timeout: nil) should use that default, i.e. with a default timeout set, it's impossible to bypass the timeout.

I wonder what API is preferable here. I expect Regexp.new(str, timeout: nil) to be the same as Regexp.new(str). But it would be surprising if Regexp.new(str) ignores the global timeout configuration.

I think I'm in agreement with this.

I think it would be a good idea to document the standard model for how timeouts are handled and used, and the way a instance specific timeout is handled in relation to a global timeout.

There are three layers, I'll use Regexp but it equally applies to IO and other "temporally relevant objects":

Regexp.timeout = nil | timeout # Global timeout, the default.
regexp = Regexp.new(..., timeout: nil | timeout) # Per instance timeout, if nil, uses the global default.
regexp.match(..., timeout: nil | timeout) # Per operation timeout, if nil, uses the instance timeout.

This is the most consistent and easy to understand model.

What it does mean though, is that if a per instance or global timeout is set, you cannot bypass it. I actually think that's a good thing. If you set a global timeout, you don't want some poorly structured code to bypass it.

That being said, if you do feel strongly about having a bypass feature, I propose using false.

regexp = Regexp.new(..., timeout: nil | false|  timeout) # Per instance timeout, if nil, uses the global default.
regexp.match(..., timeout: nil | false | timeout) # Per operation timeout, if nil, uses the instance timeout.

The timeout handling code should assume RTEST(regexp->timeout) implies there is no timeout (so false is handled correctly). So the default model becomes:

nil     # default timeout
0       # would timeout immediately
t > 0   # timeout after t seconds
false   # no timeout and don't use any per-instance/global default.

If we agree on this, I suggest we document it clearly for all relevant systems, as I'll do my best to follow the same model for IO timeouts so that we are consistent. The counter point is, if you choose something inconsistent for Regexp, I might not be able to adopt it for IO if it's really different. So please let's try to make it consistent :) thanks.

Updated by mame (Yusuke Endoh) over 1 year ago

That being said, if you do feel strongly about having a bypass feature, I propose using false.

I think "no timeout per instance" should be provided. It is reasonable to set the global timeout and unset timeout from only limited Regexp instances.

I think false would be a bit cryptic, but also nice. I have no preference between false, Float::INFINITY, and explicit symbol like :not_timeout (or something else). Let @matz (Yukihiro Matsumoto) decide at the next dev meeting.

Updated by nobu (Nobuyoshi Nakada) over 1 year ago

ioquatix (Samuel Williams) wrote in #note-2:

The model I've been following is this:

0      # would timeout immediately

Will it raise an exception before any matching?

Updated by Eregon (Benoit Daloze) over 1 year ago

I agree timeout=0 should mean immediately.
It's already been proposed for Queue#pop and SizedQueue#push: https://bugs.ruby-lang.org/issues/18982#note-8 / https://github.com/ruby/ruby/pull/6500.

I think timeout: false for no timeout is the best/clearest.
(also Float::INFINITY is quite messy to check for and deal with)

Will it raise an exception before any matching?

I think that's implementation details.
Because a timeout of 0 has no utility for Regexp I think it doesn't matter much which way it's handled.

@mame (Yusuke Endoh)

It is reasonable to set the global timeout and unset timeout from only limited Regexp instances.

Is it though? It's basically breaking the intent of whoever set the global Regexp.timeout.
If a Regexp needs a longer time maybe it should just set a higher but not infinite timeout? Although if that's set to a very big number it doesn't help anything in practice.
I suppose there might be some legitimate cases though.
For example if a gem uses that it could be pretty bad.

Updated by ioquatix (Samuel Williams) over 1 year ago

Will it raise an exception before any matching?

In code where I've implemented similar checks, I've done something like this:

while (doing work)... {
  do work
  check timeout
}

It means timeout=0 can do one iteration and then if it's not completed, it raises a timeout error. I think this is good because it implies that if you can do what you need in a single "constant time operation" it's okay. I don't know about regexp implementation, but I assume some very small regexp with a timeout of 0 could still work correctly.

Updated by mame (Yusuke Endoh) over 1 year ago

Discussed at the dev meeting.

Is it though? It's basically breaking the intent of whoever set the global Regexp.timeout.

@matz (Yukihiro Matsumoto) agreed with this. He decided not to allow users to specify "force no timeout per instance, with ignoring the global timeout setting". So, we do not introduce a new API for that, like Regexp.new(str, timeout: false) or something.

Also, the following is decided for other detailed behaviors:

# The following Regexps wait for 2^64 nanoseconds (= 500 years, essentially infinite)
Regexp.new(str, timeout: 1e300)
Regexp.new(str, timeout: Float::INFINITY)

# The following code should not create a Regexp instance but raise an exception immediately
Regexp.new(str, timeout: 0)      # Raise an exception
Regexp.new(str, timeout: 1e-300) # Same as 0 because it is truncated to the nanosecond
Regexp.new(str, timeout: -1)     # Same as 0

Another possible meaning of Regexp.new(str, timeout: 0) is to raise an exception immediately at the match. However, we do not see any use case for the behavior. Also, it would be less useful in terms of early failure. We may allow it in future if there is a reasonable use case.

Updated by jaruga (Jun Aruga) over 1 year ago

Thanks for the discussion, guys.

So, I think that we need to fix the Regexp timeout's code examples at least in the next Ruby (Ruby 3.2.0 preview3 or official 3.2.0) release note. The comment "# never interrupted" below in the example is not correct.

long_time_re =~ "a" * 50000 + "x" # never interrupted
Actions #9

Updated by mame (Yusuke Endoh) over 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|b51b22513f6a2a514a9e614e7a9f6e0df6c0c985.


Fix per-instance Regexp timeout (#6621)

Fix per-instance Regexp timeout

This makes it follow what was decided in [Bug #19055]:

  • Regexp.new(str, timeout: nil) should respect the global timeout
  • Regexp.new(str, timeout: huge_val) should use the maximum value that
    can be represented in the internal representation
  • Regexp.new(str, timeout: 0 or negative value) should raise an error
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0