Bug #11014

String#partition doesn't return correct result on zero-width match

Added by janko (Janko Marohnić) almost 5 years ago. Updated 5 days ago.

Status:
Closed
Priority:
Normal
Target version:
-
ruby -v:
ruby 2.2.1p85 (2015-02-26 revision 49769) [x86_64-darwin14]
[ruby-core:<unknown>]

Description

First, to see how String#match works on my example:

match = "foo".match(/^=*/)
match.pre_match  #=> ""
match[0]         #=> ""
match.post_match #=> "foo"

Now, if I use String#partition instead of match, I'd expect to get ["", "", "foo"] (pre_match, match, post_match). However:

"foo".partition(/^=*/) #=> ["foo", "", ""]

String#rpartition returns the correct result (with the same regex).
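
A minimal sketch of the expectation described above, building the three-element result directly from String#match. The helper name partition_via_match is hypothetical and not part of Ruby; it only illustrates the result the report expects from partition, not how partition is implemented.

```ruby
# Hypothetical helper (not part of Ruby): builds a partition-style triple
# from the first match of +pattern+ in +str+.
def partition_via_match(str, pattern)
  m = str.match(pattern)
  # Mirror partition's no-match result: the whole string, then two empty strings.
  return [str, "", ""] unless m

  [m.pre_match, m[0], m.post_match]
end

p partition_via_match("foo", /^=*/) # zero-width match from the report: ["", "", "foo"]
p partition_via_match("foo", /o/)   # ordinary non-empty match: ["f", "o", "o"]
p partition_via_match("foo", /x/)   # no match: ["foo", "", ""]
```

Comparing this helper's output with "foo".partition(/^=*/) on a given Ruby build shows whether that build still has the behavior reported here.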

Associated revisions

Revision fce54a54
Added by nobu (Nobuyoshi Nakada) 5 days ago

Fix String#partition

Split with the matched part when the separator matches the empty
part at the beginning. [Bug #11014]

History

#1

Updated by nobu (Nobuyoshi Nakada) almost 5 years ago

  • Description updated (diff)
  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.

Updated by sawa (Tsuyoshi Sawada) 14 days ago

The problem is not just for partition, but also involves split and scan.

I think your regex /^=*/ is unnecessarily complex. Your point can be made by /\A/, which is simpler.

I tried four regex patterns, /\A/, /\A.*/, /\z/, and /.*\z/, and compared the methods split, partition, and scan. In each group below, the result of the first example (match) equals the results of the second and third (split and partition), and the fourth (scan) equals the middle element. So far, so good.

"foo".match(/\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["foo", "", ""]
"foo".split(/(\z)/, -1) # => ["foo", "", ""]
"foo".partition(/\z/) # => ["foo", "", ""]
"foo".scan(/\z/) # => [""]

"foo".match(/\A.*/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(\A.*)/, -1) # => ["", "foo", ""]
"foo".partition(/\A.*/) # => ["", "foo", ""]
"foo".scan(/\A.*/) # => ["foo"]

In the following, we see inconsistency:

"foo".match(/\A/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "", "foo"]
"foo".split(/(\A)/, -1) # => ["foo"]
"foo".partition(/\A/) # => ["foo", "", ""]
"foo".scan(/\A/) # => [""]

"foo".match(/.*\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(.*\z)/, -1) # => ["", "foo", ""]
"foo".partition(/.*\z/) # => ["", "foo", ""]
"foo".scan(/.*\z/) # => ["foo", ""]

The problematic cases and their expected values (in terms of consistency) are:

"foo".split(/(\A)/, -1) # => ["foo"], expected ["", "", "foo"]
"foo".partition(/\A/) # => ["foo", "", ""], expected ["", "", "foo"]
"foo".scan(/.*\z/) # => ["foo", ""], expected ["foo"]

The case described in the issue is the second case above.

Updated by Dan0042 (Daniel DeLorme) 14 days ago

IIRC this has to do with zero-length matches being ignored in certain conditions, in particular having to do with repeating/multiple matches.

If "foo".split(/\A/) were ["","foo"],
then "foo".split(//) would have to be ["","f","o","o"],
and "foo".split(/\G/) could result in an infinite loop matching ["","","","","",..."foo"].

But I don't understand why partition doesn't behave like match.
Ah, probably because it behaves like split(rx,2)

Note that gsub has different behavior:
"foo".gsub(/\G/,'_') #=> "_f_o_o_"
"foo".gsub(//,'_') #=> "_f_o_o_"

explained better than I ever could:
https://www.regular-expressions.info/zerolength.html
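
A short sketch of the zero-length-match handling mentioned above, using only gsub and split with the empty pattern. The variable names are illustrative and not taken from the thread.

```ruby
s = "foo"

# gsub replaces every zero-length match, including the ones at the
# very start and the very end of the string.
underscored = s.gsub(//, "_")

# split with the same empty pattern yields only the characters and
# does not produce a leading empty string.
pieces = s.split(//)

p underscored # "_f_o_o_"
p pieces      # ["f", "o", "o"]
```

The same pattern therefore matches at position 0 for gsub but contributes no leading empty field for split, which is the kind of special-casing of zero-length matches described in this comment.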

Updated by mame (Yusuke Endoh) 8 days ago

We'd like to focus on String#partition in this ticket.

IMO, String#scan and #split are heavily used, so they should not change just for the sake of consistency. Please create another ticket if you really need to discuss them. A patch suggestion is welcome.

Updated by akr (Akira Tanaka) 5 days ago

nobu (Nobuyoshi Nakada) wrote:

These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.

I couldn't confirm it.

% python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "abc".partition("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator
>>> 

The empty separator causes an error in Python.

Updated by akr (Akira Tanaka) 5 days ago

I feel the current behavior is just a bug, and "abc".partition(//) should return ["", "", "abc"] instead of ["abc", "", ""].

#7

Updated by nobu (Nobuyoshi Nakada) 5 days ago

  • Status changed from Assigned to Closed

Applied in changeset git|fce54a5404139a77bd0b7d6f82901083fcb16f1e.


Fix String#partition

Split with the matched part when the separator matches the empty
part at the beginning. [Bug #11014]
