Feature #12700

regexp heredoc support

Added by gam3 (Allen Morris) almost 3 years ago. Updated over 1 year ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:77035]

Description

There is support for ', ", and ` heredocs, but there is no support for /.

Example code with new feature:

first = 'first'

orig = /#{first}
match\s # match
this    # match this
/x

right = <</REGEXP/x
#{first}
match\s # match
this    # match this
REGEXP

raise unless orig == right

There is no straightforward replacement for a regexp heredoc, because a double-quoted heredoc requires '\s' to be escaped as '\\s'.
As the code below shows, a string heredoc cannot directly stand in for a regexp heredoc because of this extra escaping.

first = 'first'

orig = /#{first}
match\s # match
this    # match this
/x

wrong = Regexp.new(<<REGEXP, Regexp::EXTENDED)
#{first}
match\s # match
this    # match this
REGEXP

right = Regexp.new(<<REGEXP, Regexp::EXTENDED)
#{first}
match\\s # match
this    # match this
REGEXP

raise unless orig != wrong
raise unless orig == right
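
The extra escaping comes from how a double-quoted heredoc processes backslashes before Regexp.new ever sees the text. The following is a minimal sketch in current Ruby (the variable names are illustrative and not taken from the patch). A single-quoted heredoc keeps '\s' intact but gives up #{} interpolation, which appears to be the trade-off the proposed syntax is meant to remove.

```ruby
first = 'first'

# In a double-quoted heredoc, "\s" is a string escape for a space,
# so the backslash never reaches Regexp.new.
double_quoted = <<REGEXP
match\s
REGEXP

# In a single-quoted heredoc the backslash survives,
# but #{first} is left as literal text instead of being interpolated.
single_quoted = <<'REGEXP'
#{first}
match\s
REGEXP

raise unless double_quoted == "match \n"
raise unless single_quoted == "\#{first}\nmatch\\s\n"
raise unless Regexp.new(single_quoted, Regexp::EXTENDED).source.include?('\s')
```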

Files

regex_heredoc_patch (2.99 KB) regex_heredoc_patch patch to enable regex heredoc gam3 (Allen Morris), 08/24/2016 08:39 AM

History

Updated by gam3 (Allen Morris) almost 3 years ago

Updated pull request.

Updated by matz (Yukihiro Matsumoto) almost 3 years ago

  • Status changed from Open to Rejected

Use %r.

Matz.

Updated by gam3 (Allen Morris) almost 3 years ago

I don't see how %r helps.

Here is a (rather forced) example of the advantage of a /HEREDOC/:

a = "one"
b = "two"

raise "error" unless a.match(<</REG/x)[1] == b.match(<</REG/x)[1]
(.) # what we want to match
n   # what we want to skip
e   # more to skip
REG
t   # what we want to skip
w   # more to skip
(.) # what we want to match
REG

raise "error" unless a.match(%r|
(.) # what we want to match
n   # what we want to skip
e   # more to skip
|x)[1] == b.match(%r|
t   # what we want to skip
w   # more to skip
(.) # what we want to match
|x)[1]

Updated by shyouhei (Shyouhei Urabe) almost 3 years ago

  • Status changed from Rejected to Open

Reopened.

Though personally I have never needed to write such long regexp literals inline.
Whenever I have wanted superb multiline regexps, they were named, most likely as constants.
For variable/constant assignments, %r works perfectly. I doubt there is an actual need for this syntax.

Updated by duerst (Martin Dürst) almost 3 years ago

Shyouhei Urabe wrote:

Reopened.

Though personally I have never needed to write such long regexp literals inline.
Whenever I have wanted superb multiline regexps, they were named, most likely as constants.
For variable/constant assignments, %r works perfectly. I doubt there is an actual need for this syntax.

I'm confused. You only give arguments for rejection, but then reopen the issue.

Updated by shyouhei (Shyouhei Urabe) almost 3 years ago

Martin Dürst wrote:

I'm confused. You only give arguments for rejection, but then reopen the issue.

My private opinion is I don't need this.
But I don't want to rule out my being wrong-headed.
The OP might have other use cases where this is useful.
Hence the reopen.
Anyone who needs this feature is encouraged to join this thread.

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

I had a chance to write a regexp constant consisting of 300+ lines.
I have to admit that I did wish I could write that using a heredoc.

So I changed my mind. Let me +1.

Updated by duerst (Martin Dürst) over 1 year ago

shyouhei (Shyouhei Urabe) wrote:

I had a chance to write a regexp constant consisting of 300+ lines.
I have to admit that I did wish I could write that using a heredoc.

So I changed my mind. Let me +1.

If that 300+ lines regexp is public (or can be made public), I'd like to see a pointer.

There may be exceptions, but I don't think it's a good idea to write a regexp constant with 300+ lines by hand.
(The regular expression pieces in https://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/lib/unicode_normalize/tables.rb?view=markup are way shorter than 300 lines, but I wouldn't have wanted to write them by hand anyway.)

Looking at the examples above, the advantages for the regexp heredoc over %r seem to be the fact that two or more of them can be started in the same line (including the options). The advantage over indirect construction via string heredoc seems to be that no double escape is necessary. None of these advantages seems directly related to the length of the regexp.

Just some points; I'm not too strongly against introducing this.
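
For comparison, here is a sketch of how close current Ruby gets to both advantages using plain string heredocs. It assumes single-quoted heredocs are acceptable, which avoids the double escaping but rules out interpolation. The names re_a and re_b are hypothetical.

```ruby
a = "one"
b = "two"

# Two single-quoted heredocs opened on one line; their bodies follow in order.
# Single quotes keep backslashes literal, so no double escaping is needed,
# at the cost of losing #{} interpolation.
re_a, re_b = Regexp.new(<<'A', Regexp::EXTENDED), Regexp.new(<<'B', Regexp::EXTENDED)
(.) # what we want to match
n   # what we want to skip
e   # more to skip
A
t   # what we want to skip
w   # more to skip
(.) # what we want to match
B

raise "error" unless a.match(re_a)[1] == b.match(re_b)[1]
```

The options still have to be spelled out as Regexp::EXTENDED on each call, which is noisier than a trailing /x.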

Updated by shyouhei (Shyouhei Urabe) over 1 year ago

duerst (Martin Dürst) wrote:

So I changed my mind. Let me +1.

If that 300+ lines regexp is public (or can be made public), I'd like to see a pointer.

Here you are: https://github.com/shyouhei/optdown/blob/master/lib/optdown/expr.rb

There may be exceptions, but I don't think it's a good idea to write a regexp constant with 300+ lines by hand.
(The regular expression pieces in https://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/lib/unicode_normalize/tables.rb?view=markup are way shorter than 300 lines, but I wouldn't have wanted to write them by hand anyway.)

I have to agree with this part from the bottom of my heart. It was a wrong decision for me to write the expression above. I should have used something different.

But I did it anyway because it seemed possible. And it was. The experience was terrible, as expected. I believe a regexp heredoc would absorb some of that pain.

Looking at the examples above, the advantages for the regexp heredoc over %r seem to be the fact that two or more of them can be started in the same line (including the options). The advantage over indirect construction via string heredoc seems to be that no double escape is necessary. None of these advantages seems directly related to the length of the regexp.

True.

What I found while writing a long regular expression is that it seemed to contain every punctuation character that could be used as a %r delimiter. A terminator of two or more characters seemed the right solution. At the same time, I needed to interpolate variables into the expression, and I could not use Regexp.new(<<END) because that way I had to double all the backslashes. These two are the main reasons I changed my mind to +1 this request.
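
One possible workaround in current Ruby, sketched here with hypothetical names (word, tail, full) and not taken from the linked code: keep the backslash-heavy part in a single-quoted heredoc, then do the interpolation in a short literal. It sidesteps both problems to a degree, but it splits one expression into pieces, which is part of the annoyance being described.

```ruby
# Hypothetical pieces, for illustration only.
word = 'first'

# The backslash-heavy part lives in a single-quoted heredoc: no doubling,
# and a multi-character terminator that cannot clash with the pattern.
tail = Regexp.new(<<'TAIL', Regexp::EXTENDED)
\s+        # whitespace
\d{2,}     # at least two digits
[|/!{}]    # punctuation that would clash with most %r delimiters
TAIL

# Interpolation happens in a short literal. Regexp.escape guards the string,
# and an interpolated Regexp carries its own options as (?x-mi:...).
full = /\A#{Regexp.escape(word)}#{tail}\z/

raise unless full.match?("first  42|")
raise if full.match?("first 4|")
```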

Just some points; I'm not too strongly against introducing this.

We can live without heredoc regexps. In fact I do in the example above. However, that is also true for all other sorts of heredocs; we don't need them in theory if we could properly escape everything. In practice that is too annoying. The same goes for regexp literals, I believe.
