Bug #18797

closed

Third argument to Regexp.new is a bit broken

Added by janosch-x (Janosch Müller) almost 2 years ago. Updated about 1 year ago.


Description

Situation

'n' or 'N' can be passed as a third argument to Regexp.new. However, the behavior is not the same as that of the literal n-flag or the Regexp::NOENCODING option, and it makes Regexp#encoding and the encoding of Regexp#source diverge:

/😅/n # => SyntaxError
Regexp.new('😅', Regexp::NOENCODING) # => RegexpError
re = Regexp.new('😅', nil, 'n') # => /😅/
re.options == Regexp::NOENCODING # => true
re.encoding # => ASCII-8BIT
re.source.encoding # => UTF-8
re =~ '😅' # => Encoding::CompatibilityError

Code

Here. There is also a test for the resulting encoding here, but it is a no-op because the whole file is set to that encoding via magic comment anyway.

The third argument was added when ASCII was still the default Ruby encoding, so I guess Regexp and source encoding still matched at that point.

Solution

It could be fixed, but my impression is that it is not useful anymore.

It was probably only added because Regexp::NOENCODING wasn't available at the time, so I think it could be deprecated like so:

Passing a third argument to Regexp.new is deprecated. Use Regexp::NOENCODING as second argument instead.
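
As a rough illustration of the replacement suggested in that message, the sketch below passes Regexp::NOENCODING as the second argument instead of 'n' as the third. The helper name binary_regexp is hypothetical and not part of Ruby; the pattern is kept ASCII-only because, as shown above, a non-ASCII source combined with NOENCODING raises.

```ruby
# Hypothetical helper: builds a regexp with the NOENCODING flag using only
# the two-argument form of Regexp.new.
def binary_regexp(source, extra_flags = 0)
  Regexp.new(source, Regexp::NOENCODING | extra_flags)
end

re = binary_regexp('PNG')

# The flag is visible through Regexp#options.
puts (re.options & Regexp::NOENCODING) != 0

# An ASCII-only pattern like this one can be matched against binary data.
puts re.match?("\x89PNG\r\n".b)
```

Other flags such as Regexp::IGNORECASE can be combined through the optional second parameter of the helper.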


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #20084: Breaking change with Regexp.new on 3.3.0 (Closed)

Updated by matz (Yukihiro Matsumoto) over 1 year ago

This has indeed been an obsolete feature for a long time. And the third argument is ignored (IIRC).
Some (old) gems still use it because they have not been updated for a long time. Warnings would not improve the situation (since most of those gems are unmaintained).
So we should update the documentation and leave the code as it is.

Matz.

Updated by jeremyevans0 (Jeremy Evans) over 1 year ago

matz (Yukihiro Matsumoto) wrote in #note-1:

This has indeed been an obsolete feature for a long time. And the third argument is ignored (IIRC).

Unfortunately, the third argument is not ignored:

p Regexp.new("\u1234", nil, "n").encoding
#<Encoding:ASCII-8BIT>

p Regexp.new("\u1234", nil).encoding
#<Encoding:UTF-8>

Would you like to change the behavior to ignore the third argument without deprecating it? Or would you prefer to keep the current behavior and document it?

Updated by matz (Yukihiro Matsumoto) over 1 year ago

I thought it was marked as obsolete a long time ago. I think it's time to make it obsolete, probably via the usual obsoleting process (i.e. adding a warning first).

Matz

Updated by jeremyevans (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|7e8fa06022a9e412e3f8e6c8b6f0ba1909f648d5.


Always issue deprecation warning when calling Regexp.new with 3rd positional argument

Previously, only certain values of the 3rd argument triggered a
deprecation warning.

First step for fix for bug #18797. Support for the 3rd argument
will be removed after the release of Ruby 3.2.

Fix minor fallout discovered by the tests.

Co-authored-by: Nobuyoshi Nakada

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Status changed from Closed to Open

Pull request submitted to remove support for the 3rd argument: https://github.com/ruby/ruby/pull/7039

Updated by Eregon (Benoit Daloze) about 1 year ago

  • Assignee set to jeremyevans0 (Jeremy Evans)
  • Target version set to 3.3

Updated by jeremyevans (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed

Applied in changeset git|04cfb26bd394b8e92f24f18799f5e9fc96b2ea69.


Remove support for the Regexp.new 3rd argument

This was deprecated in Ruby 3.2.

Fixes [Bug #18797]
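
Code that has to run on Rubies from both before and after this change may want to detect which behavior is in effect. The following is a minimal sketch, assuming the removal surfaces as an ArgumentError for the extra positional argument; the method name third_argument_supported? is made up for illustration.

```ruby
# Hypothetical probe: reports whether the running Ruby still accepts a
# third positional argument to Regexp.new.
# Assumption: after the removal, the call raises ArgumentError.
# On versions that only deprecate it, the call may print a deprecation
# warning when such warnings are enabled.
def third_argument_supported?
  Regexp.new('a', nil, 'n')
  true
rescue ArgumentError
  false
end

if third_argument_supported?
  puts 'Regexp.new still accepts a third argument on this Ruby'
else
  puts 'Regexp.new rejects a third argument; pass Regexp::NOENCODING as the second'
end
```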

Updated by hsbt (Hiroshi SHIBATA) 3 months ago

  • Related to Bug #20084: Breaking change with Regexp.new on 3.3.0 added