Bug #20504

Interpolated string literal in regexp encoding handling

Added by kddnewton (Kevin Newton) 10 months ago. Updated 2 months ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:117990]

Description

There is some very odd behavior that I'm not sure is intentional, so I'm looking for guidance. In the following code:

# encoding: us-ascii

interp = "\x80"
regexp = /#{interp}/

the regexp variable is an ascii-8bit regular expression with the byte interpolated into the middle. However, if you inline that interpolation:

# encoding: us-ascii

regexp = /#{"\x80"}/

you get a syntax error, saying it's an invalid multi-byte character. I'm not sure what the rule is here, as it seems inconsistent. Is this the correct behavior?

I would prefer if it would create an ascii-8bit regular expression like the first example, which would be consistent.
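
For anyone who wants to compare the two forms without creating separate files, here is a minimal sketch. It assumes that eval parses a string using that string's own encoding, so re-encoding the source as US-ASCII stands in for the `# encoding: us-ascii` magic comment. The helper name `probe` is hypothetical and is not part of Ruby.

```ruby
# Evaluate Ruby source as if it came from a US-ASCII encoded file.
# `probe` is a hypothetical helper used only for this illustration.
def probe(source)
  regexp = eval(source.encode(Encoding::US_ASCII))
  [:ok, regexp.encoding]
rescue SyntaxError, RegexpError => error
  [:error, error.class]
end

# Single-quoted so that \x80 and #{...} reach eval unexpanded.
via_variable = probe('interp = "\x80"; /#{interp}/')
inlined      = probe('/#{"\x80"}/')

p via_variable # the variable form, reported above as an ascii-8bit regexp
p inlined      # the inlined form; the outcome depends on the Ruby version
```

On a Ruby that still has the inconsistency, the second result should be an error while the first is not.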

Updated by Eregon (Benoit Daloze) 10 months ago

Agreed, the current behavior breaks referential transparency and unexpectedly analyzes string literals inside interpolated parts.
This leads to extra confusion and, I would think, has no value in real-world uses of interpolated regexps (because it causes an error where there would otherwise be none).

So I think this is a bug: the implementation should not analyze those parts, and consequently the behavior should be the same as with the extra local variable.

Updated by Eregon (Benoit Daloze) 10 months ago

  • Tracker changed from Misc to Bug
  • Backport set to 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN

Updated by kddnewton (Kevin Newton) 10 months ago

I'm fine with it analyzing the string literals; I would just prefer that it take the same code path as the interpolated variable case, in which it would produce an ascii-8bit regular expression as opposed to raising an error.

Updated by mame (Yusuke Endoh) 9 months ago

This was discussed at the dev meeting, and @matz (Yukihiro Matsumoto) said that /#{"\x80"}/ should not raise a SyntaxError but should return a binary-encoded regexp object.

Updated by nobu (Nobuyoshi Nakada) 2 months ago

  • Status changed from Open to Closed

Applied in changeset git|6bbb470dc77a671c67411a5d3a2564bd0a665a9c.


[Bug #20504] Move dynamic regexp concatenation to iseq compiler
