Bug #11014

closed

String#partition doesn't return correct result on zero-width match

Added by janko (Janko Marohnić) almost 10 years ago. Updated about 5 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.2.1p85 (2015-02-26 revision 49769) [x86_64-darwin14]
[ruby-core:<unknown>]

Description

First, here is how String#match behaves on my example:

match = "foo".match(/^=*/)
match.pre_match  #=> ""
match[0]         #=> ""
match.post_match #=> "foo"

If I use String#partition instead of String#match, I'd expect to get ["", "", "foo"] (pre_match, match, post_match). However, the actual result is different:

"foo".partition(/^=*/) #=> ["foo", "", ""]

String#rpartition returns the correct result (with the same regex).
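The expectation above can be written down as a small helper that builds the three-element result directly from MatchData. This is only a sketch of the behavior the report asks for; the method name partition_via_match is hypothetical and is not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): build the partition triple
# from MatchData, i.e. [pre_match, match, post_match].
def partition_via_match(str, pattern)
  m = str.match(pattern)
  # Mirror String#partition's documented no-match result.
  return [str, "", ""] unless m

  [m.pre_match, m[0], m.post_match]
end

p partition_via_match("foo", /^=*/)  # => ["", "", "foo"]
p partition_via_match("foo", /x/)    # => ["foo", "", ""]
```

With a zero-width match at the start of the string, the helper yields the result the report expects from String#partition.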

Updated by nobu (Nobuyoshi Nakada) almost 10 years ago

  • Description updated (diff)
  • Status changed from Open to Assigned
  • Assignee set to matz (Yukihiro Matsumoto)

These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.

Updated by sawa (Tsuyoshi Sawada) about 5 years ago

The problem is not limited to partition; it also affects split and scan.

I think your regex /^=*/ is unnecessarily complex. Your point can be made by /\A/, which is simpler.

I tried four regex patterns, /\A/, /\A.*/, /\z/, and /.*\z/, and compared the methods split, partition, and scan. In each group below, the result of the first example (match) agrees with the second (split) and the third (partition), and the result of the fourth (scan) agrees with the middle element. So far, so good.

"foo".match(/\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["foo", "", ""]
"foo".split(/(\z)/, -1) # => ["foo", "", ""]
"foo".partition(/\z/) # => ["foo", "", ""]
"foo".scan(/\z/) # => [""]

"foo".match(/\A.*/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(\A.*)/, -1) # => ["", "foo", ""]
"foo".partition(/\A.*/) # => ["", "foo", ""]
"foo".scan(/\A.*/) # => ["foo"]

In the following, we see inconsistency:

"foo".match(/\A/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "", "foo"]
"foo".split(/(\A)/, -1) # => ["foo"]
"foo".partition(/\A/) # => ["foo", "", ""]
"foo".scan(/\A/) # => [""]

"foo".match(/.*\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(.*\z)/, -1) # => ["", "foo", ""]
"foo".partition(/.*\z/) # => ["", "foo", ""]
"foo".scan(/.*\z/) # => ["foo", ""]

The problematic cases and their expected values (in terms of consistency) are:

"foo".split(/(\A)/, -1) # => ["foo"], expected ["", "", "foo"]
"foo".partition(/\A/) # => ["foo", "", ""], expected ["", "", "foo"]
"foo".scan(/.*\z/) # => ["foo", ""], expected ["foo"]

The case described in the issue is the second case above.
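The comparison above can be reproduced on any interpreter with a short script. This is a sketch only: the helper name zero_width_report is hypothetical, explicit variables are used instead of numbered block parameters so that it also runs on older Rubies, and no expected output is given for the zero-width patterns because the results depend on the Ruby version.

```ruby
# The four patterns compared in this comment.
PATTERNS = [/\A/, /\A.*/, /\z/, /.*\z/]

# Hypothetical helper: collect the results of match, split (with the
# pattern wrapped in a capture group), partition, and scan.
def zero_width_report(str, pattern)
  m = str.match(pattern)
  {
    match: m && [m.pre_match, m[0], m.post_match],
    split: str.split(Regexp.new("(#{pattern.source})"), -1),
    partition: str.partition(pattern),
    scan: str.scan(pattern),
  }
end

PATTERNS.each do |pattern|
  puts "#{pattern.inspect}: #{zero_width_report("foo", pattern).inspect}"
end
```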

Updated by Dan0042 (Daniel DeLorme) about 5 years ago

IIRC this has to do with zero-length matches being ignored in certain conditions, in particular having to do with repeating/multiple matches.

if "foo".split(/\A/) was ["","foo"]
then "foo".split(//) would have to be ["","f","o","o"]
and "foo".split(/\G/) could result in infinite loop matching ["","","","","",..."foo"]

But I don't understand why partition doesn't behave like match.
Ah, probably because it behaves like split(rx,2)

Note that gsub has different behavior:
"foo".gsub(/\G/,'_') #=> "_f_o_o_"
"foo".gsub(//,'_') #=> "_f_o_o_"

This is explained better than I ever could here:
https://www.regular-expressions.info/zerolength.html
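To illustrate the point about repeated zero-length matches, here is a simplified sketch of a matching loop that steps forward by one character after an empty match, so that it always terminates. The method each_match_position is hypothetical and is not how Ruby implements scan or gsub internally; it only mimics the visible behavior.

```ruby
# Hypothetical helper: return the start offsets of successive matches.
def each_match_position(str, pattern)
  positions = []
  pos = 0
  while pos <= str.length && (m = pattern.match(str, pos))
    positions << m.begin(0)
    # After an empty match, advance one character to avoid looping forever.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  positions
end

p each_match_position("foo", //)  # => [0, 1, 2, 3]
p "foo".gsub(//, "_")             # => "_f_o_o_"
p "foo".scan(//)                  # => ["", "", "", ""]
p "foo".split(//)                 # => ["f", "o", "o"]
```

The empty pattern matches at four positions in "foo" (before each character and at the end), which is why gsub inserts four underscores, while split drops the empty pieces.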

Updated by mame (Yusuke Endoh) about 5 years ago

We'd like to focus on String#partition in this ticket.

IMO, String#scan and #split are heavily used, so they should not change just for the sake of consistency. Please create another ticket if you really need to discuss them. A patch suggestion is welcome.

Updated by akr (Akira Tanaka) about 5 years ago

nobu (Nobuyoshi Nakada) wrote:

These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.

I couldn't confirm it.

% python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "abc".partition("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: empty separator
>>> 

The empty separator causes an error in Python.

Updated by akr (Akira Tanaka) about 5 years ago

I feel the current behavior is just a bug and "abc".partition(//) should return ["", "", "abc"] instead of ["abc", "", ""].

Updated by nobu (Nobuyoshi Nakada) about 5 years ago

  • Status changed from Assigned to Closed

Applied in changeset git|fce54a5404139a77bd0b7d6f82901083fcb16f1e.


Fix String#partition

Split with the matched part when the separator matches the empty
part at the beginning. [Bug #11014]
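One way to check whether a given interpreter includes this change is to compare String#partition with the triple built from MatchData. This is a sketch; the helper name partition_agrees_with_match? is hypothetical.

```ruby
# Hypothetical helper: true when String#partition returns the same
# triple as [pre_match, match, post_match] from String#match.
def partition_agrees_with_match?(str, pattern)
  m = str.match(pattern)
  expected = m ? [m.pre_match, m[0], m.post_match] : [str, "", ""]
  str.partition(pattern) == expected
end

# Non-empty match: the thread's examples already agree for this pattern.
puts partition_agrees_with_match?("foo", /\A.*/)
# Zero-width match at the start: the result depends on whether the
# interpreter includes the fix.
puts partition_agrees_with_match?("foo", /^=*/)
```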
