Backport #8210

Multibyte character interfering with end-line character within a regex

Added by Tsuyoshi Sawada about 1 year ago. Updated 12 months ago.

[ruby-core:53944]
Status: Closed
Priority: Normal
Assignee: Usaku NAKAMURA

Description

=begin
With this regex:

regex1 = /\z/

the following strings match as expected:

"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5

but with these regexes:

regex2 = /#$/?\z/
regex3 = /\n?\z/

the results differ:

"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). I expect them to behave the same, and believe this is a bug.
=end
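The report above can be condensed into a small reproduction script. This is only a sketch: the helper name end_anchor_positions is made up for illustration, and regex2 is built with Regexp.escape($/) rather than the literal #$/ form, which is assumed to be equivalent while $/ is "\n".

```ruby
# Hypothetical helper (not part of the original report): returns the match
# position of each regex from the description for the given string.
def end_anchor_positions(str)
  {
    regex1: str =~ /\z/,
    # Assumed equivalent to /#$/?\z/ from the report while $/ is "\n".
    regex2: str =~ /#{Regexp.escape($/)}?\z/,
    regex3: str =~ /\n?\z/,
  }
end

["hello", "こんにちは"].each do |s|
  puts "#{s.inspect}: #{end_anchor_positions(s).inspect}"
end
```

On an affected build the regex2 and regex3 entries for the multibyte string should come back as nil, as reported; on a build that includes the fix all three are expected to be 5 for both strings.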

fix-8210-1.diff (742 Bytes) Ken Takata, 04/10/2013 12:41 AM

fix-8210-2.diff (491 Bytes) Ken Takata, 04/10/2013 12:41 AM

fix-8210-1-update.diff (834 Bytes) Ken Takata, 04/13/2013 07:31 PM

Associated revisions

Revision 40713
Added by Usaku NAKAMURA 12 months ago

  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by ktanaka at .

History

#1 Updated by Tsuyoshi Sawada about 1 year ago

=begin
A different regex:

regex4 = /[[:space:]]?\z/

seems to work as expected:

"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5

=end

#2 Updated by Tsuyoshi Sawada about 1 year ago

=begin
Still a different regex:

regex5 = /\n?$/

seems to work as expected:

"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5

=end

#3 Updated by Tsuyoshi Sawada about 1 year ago

=begin
The problem seems to occur with the combination of certain tokens, ?, and \z.

"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5

=end
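One way to check a list of tokens like the one above without judging each result by eye is to compare the engine's answer against a brute-force reference. The sketch below is an assumption-laden illustration: reference_position and end_anchor_mismatches are hypothetical helper names, and each token is assumed to be a single regex atom so that appending ? applies to the whole token.

```ruby
# Brute-force reference: the smallest index i such that the suffix starting
# at i consists of at most one occurrence of the token and nothing else.
def reference_position(str, token_src)
  anchored = Regexp.new("\\A#{token_src}?\\z")
  (0..str.length).find { |i| anchored.match(str[i..-1]) }
end

# Returns [token, actual, expected] triples for every token where the
# direct search and the brute-force reference disagree.
def end_anchor_mismatches(str, token_sources)
  token_sources.each_with_object([]) do |tok, out|
    actual   = str =~ Regexp.new("#{tok}?\\z")
    expected = reference_position(str, tok)
    out << [tok, actual, expected] unless actual == expected
  end
end

tokens = ['a', ' ', '\t', '\n', '\s', '.', '\S', '\W', 'あ', '\w']
p end_anchor_mismatches("こんにちは", tokens)
```

This is a cross-check rather than a proof, since the reference uses the same regex engine on shorter strings. On a build that includes the fix the list of mismatches is expected to be empty.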

#4 Updated by Tsuyoshi Sawada about 1 year ago

Is this bug report wrong? If so, please say so.

#5 Updated by Yui NARUSE about 1 year ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to Yui NARUSE
  • Target version set to 2.1.0

sawa (Tsuyoshi Sawada) wrote:

Is this bug report wrong? If so, please say so.

This really looks like a bug in Oniguruma/Onigmo.

#6 Updated by Andrew Cheong about 1 year ago

Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.

#7 Updated by Franco Rondini about 1 year ago

I just edited the answer; test code is now available.

#8 Updated by Ken Takata about 1 year ago

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
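
For readers unfamiliar with this kind of optimization, here is a rough illustration of how narrowing the search start for an end-anchored pattern can interact with multibyte characters. It is not Onigmo's code: narrowed_start and right_adjust_char_head are hypothetical names, and the sketch assumes the start is moved to "end minus the maximum match length in bytes" and then advanced to the next UTF-8 character boundary.

```ruby
# Advance pos until it no longer points at a UTF-8 continuation byte
# (a byte of the form 10xxxxxx), i.e. until it is on a character head
# or at the end of the string.
def right_adjust_char_head(bytes, pos)
  pos += 1 while pos < bytes.length && (bytes[pos] & 0xC0) == 0x80
  pos
end

# Hypothetical model of the narrowing step: start searching at most
# max_len_bytes before the end, adjusted to a character boundary.
def narrowed_start(str, max_len_bytes)
  bytes = str.bytes.to_a
  start = [bytes.length - max_len_bytes, 0].max
  right_adjust_char_head(bytes, start)
end

puts narrowed_start("hello", 1)       # byte 4 is already a character head
puts narrowed_start("こんにちは", 1)   # byte 14 is inside the last character
puts narrowed_start("こんにちは", 3)   # byte 12 is the head of the last character
```

Under these assumptions, a token that is at most one byte long pushes the adjusted start all the way to the end of a string whose last character is multibyte, which would be the 'start==range' situation mentioned above. That is at least consistent with the one-byte tokens failing in comment #3 while あ? did not.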

#9 Updated by Tsuyoshi Sawada about 1 year ago

Is either of k_takata's bug fixes going to be incorporated?

#10 Updated by Yui NARUSE about 1 year ago

k_takata (Ken Takata) wrote:

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.

I think -1 is suitable because it seems to preserve the original intention better than -2.

#11 Updated by Ken Takata about 1 year ago

I think -1 is suitable because it seems to preserve the original intention better than -2.

Thanks for your comment.
I have updated onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee

I have also attached an updated patch that can be applied to Ruby 1.9.3.

#12 Updated by Yui NARUSE about 1 year ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r40276.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee. [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

#13 Updated by Yui NARUSE about 1 year ago

  • Tracker changed from Bug to Backport
  • Project changed from ruby-trunk to Backport200
  • Category deleted (M17N)
  • Status changed from Closed to Assigned
  • Assignee changed from Yui NARUSE to Tomoyuki Chikanaga
  • Target version deleted (2.1.0)

#14 Updated by Ken Takata about 1 year ago

I think it's better to backport this patch to Ruby 1.9.3 too.

#15 Updated by Tomoyuki Chikanaga about 1 year ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40384.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 40276: [Backport #8210]

* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
  [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

#16 Updated by Tomoyuki Chikanaga about 1 year ago

  • Project changed from Backport200 to Backport93
  • Status changed from Closed to Assigned
  • Assignee changed from Tomoyuki Chikanaga to Usaku NAKAMURA

Move to Backport93.
But Onigmo was only merged as of 2.0, so I haven't confirmed that this patch can be merged into ruby19_3...

#17 Updated by Usaku NAKAMURA 12 months ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40713.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by ktanaka at .

#18 Updated by Ken Takata 12 months ago

Hi usa,

  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by ktanaka at .

Thank you for merging my patch.
BTW, my name is not tanaka...
