Backport #8210

Multibyte character interfering with end-line character within a regex

Added by sawa (Tsuyoshi Sawada) over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
[ruby-core:53944]

Description

=begin
With this regex:

regex1 = /\z/

the following strings match as expected:

"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5

but with these regexes:

regex2 = /#$/?\z/
regex3 = /\n?\z/

the results differ:

"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). I expect them to behave the same, and believe this is a bug.
=end
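The report above can be turned into a small regression check. This is only a sketch: the helper name `end_match_positions` and the variable names are hypothetical, the multibyte string is written with `\u` escapes so the script does not depend on the source-file encoding, and `regex2` is left out because it reduces to `regex3` when `$/` is `"\n"`. On an interpreter that includes the fix, both strings would be expected to match at index 5 for every pattern.

```ruby
# Hypothetical regression check for the behaviour described in the report.
# All names here are illustrative and not part of Ruby or Onigmo.

ascii     = "hello"
multibyte = "\u3053\u3093\u306B\u3061\u306F" # こんにちは (5 characters, 15 bytes in UTF-8)

patterns = {
  "regex1" => /\z/,    # end of string only
  "regex3" => /\n?\z/  # optional newline, then end of string
}

# Returns { pattern name => character index of the match, or nil }.
def end_match_positions(str, patterns)
  patterns.each_with_object({}) do |(name, regex), result|
    result[name] = (str =~ regex)
  end
end

[ascii, multibyte].each do |str|
  end_match_positions(str, patterns).each do |name, pos|
    puts "#{str.inspect} =~ #{name} # => #{pos.inspect}"
  end
end
```

A `nil` in the multibyte rows would indicate that the interpreter still has the problem reported here.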

fix-8210-1.diff (742 Bytes), k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-2.diff (491 Bytes), k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-1-update.diff (834 Bytes), k_takata (Ken Takata), 04/13/2013 07:31 PM

Associated revisions

Revision 536a3274
Added by naruse (Yui NARUSE) over 5 years ago

  • Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee. [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@40276 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 40276
Added by naruse (Yui NARUSE) over 5 years ago

  • Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee. [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

Revision 8c43fc02
Added by nagachika (Tomoyuki Chikanaga) over 5 years ago

merge revision(s) 40276: [Backport #8210]

* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
  [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_0_0@40384 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision e8905e10
Added by usa (Usaku NAKAMURA) over 5 years ago

  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by k_tanaka at .

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_9_3@40713 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

Revision 40713
Added by usa (Usaku NAKAMURA) over 5 years ago

  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by k_tanaka at .

History

#1 [ruby-core:53945] Updated by sawa (Tsuyoshi Sawada) over 5 years ago

=begin
A different regex:

regex4 = /[[:space:]]?\z/

seems to work as expected:

"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5

=end

#2 [ruby-core:53946] Updated by sawa (Tsuyoshi Sawada) over 5 years ago

=begin
Yet another regex:

regex5 = /\n?$/

seems to work as expected:

"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5

=end

#3 [ruby-core:54046] Updated by sawa (Tsuyoshi Sawada) over 5 years ago

=begin
The problem seems to happen with the combination of certain tokens, ?, and \z.

"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5

=end
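The pattern sweep in this comment can be rerun on any interpreter with a short script. This is a sketch under assumptions: the helper `sweep` is hypothetical, and the patterns are built with `Regexp.new` from the same sources listed above. No expected values are written into the script; it prints whatever the running interpreter returns, so the output can be compared with the list in this comment.

```ruby
# Hypothetical helper that reruns the patterns from this comment.
# It reports results as-is and asserts nothing about them.

str = "\u3053\u3093\u306B\u3061\u306F" # こんにちは

sources = [
  'a?\z', ' ?\z', '\t?\z', '\n?\z', '\s?\z',
  '.?\z', '\S?\z', '\W?\z', "\u3042?\\z", '\w?\z' # "\u3042" is あ
]

# Returns { regex source => character index of the match, or nil }.
def sweep(str, sources)
  sources.each_with_object({}) do |src, result|
    result[src] = (str =~ Regexp.new(src))
  end
end

sweep(str, sources).each do |src, pos|
  puts "/#{src}/ # => #{pos.inspect}"
end
```

Following the reporter's expectation, patterns whose optional token cannot match any character of the string (`a`, space, `\t`, `\n`, `\s`) should behave like a bare `\z` and report 5 once the fix is in place.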

#4 [ruby-core:54058] Updated by sawa (Tsuyoshi Sawada) over 5 years ago

Is this bug report wrong? If so, please note so.

#5 [ruby-core:54060] Updated by naruse (Yui NARUSE) over 5 years ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)
  • Target version set to 2.1.0

sawa (Tsuyoshi Sawada) wrote:

Is this bug report wrong? If so, please note so.

This really looks like a bug in Oniguruma/Onigmo.

#6 [ruby-core:54118] Updated by acheong87 (Andrew Cheong) over 5 years ago

Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.

#8 [ruby-core:54145] Updated by k_takata (Ken Takata) over 5 years ago

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
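The thread does not spell out the mechanism, and the following is not taken from either patch. It is only an illustration, under the assumption that the optimization computes a search start from a byte distance back from the end of the string, of why such an offset needs adjusting for multibyte text: one byte before the end of a UTF-8 string can fall inside a character. The helper `char_head` is hypothetical.

```ruby
# Illustration only: byte offsets near the end of a UTF-8 string.
# char_head is a hypothetical helper, not Onigmo's actual code.

str = "\u3053\u3093\u306B\u3061\u306F" # こんにちは, 3 bytes per character

# Walk back from byte_offset to the first byte of the character containing it.
# UTF-8 continuation bytes have the bit pattern 10xxxxxx.
def char_head(str, byte_offset)
  byte_offset -= 1 while byte_offset > 0 && (str.getbyte(byte_offset) & 0xC0) == 0x80
  byte_offset
end

candidate = str.bytesize - 1        # one byte before the end
tail      = str.byteslice(candidate, 1)

puts "candidate byte offset: #{candidate}"
puts "lands on a character boundary: #{tail.valid_encoding?}"
puts "adjusted to character head: #{char_head(str, candidate)}"
puts "same computation for ASCII: #{char_head('hello', 'hello'.bytesize - 1)}"
```

For an ASCII string the candidate offset is already a character boundary, which is consistent with "hello" never triggering the problem in the reports above.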

#9 [ruby-core:54166] Updated by sawa (Tsuyoshi Sawada) over 5 years ago

Is either of k_takata's bug fixes going to be incorporated?

#10 [ruby-core:54179] Updated by naruse (Yui NARUSE) over 5 years ago

k_takata (Ken Takata) wrote:

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.

I think fix-8210-1.diff is the better choice because it seems to preserve the original intention more than fix-8210-2.diff.

#11 [ruby-core:54251] Updated by k_takata (Ken Takata) over 5 years ago

I think fix-8210-1.diff is the better choice because it seems to preserve the original intention more than fix-8210-2.diff.

Thanks for your comment.
I have updated onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee

I have also attached an updated patch that can be applied to Ruby 1.9.3.

#12 Updated by naruse (Yui NARUSE) over 5 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r40276.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee. [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

#13 Updated by naruse (Yui NARUSE) over 5 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby trunk to Backport200
  • Category deleted (M17N)
  • Status changed from Closed to Assigned
  • Assignee changed from naruse (Yui NARUSE) to nagachika (Tomoyuki Chikanaga)
  • Target version deleted (2.1.0)

#14 [ruby-core:54252] Updated by k_takata (Ken Takata) over 5 years ago

I think it's better to backport this patch to Ruby 1.9.3 too.

#15 Updated by nagachika (Tomoyuki Chikanaga) over 5 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40384.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 40276: [Backport #8210]

* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
  [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

#16 Updated by nagachika (Tomoyuki Chikanaga) over 5 years ago

  • Project changed from Backport200 to Backport193
  • Status changed from Closed to Assigned
  • Assignee changed from nagachika (Tomoyuki Chikanaga) to usa (Usaku NAKAMURA)

Moved to Backport193.
But Onigmo was only merged from 2.0 onward, and I have not confirmed that this patch can be merged into ruby_1_9_3...

#17 Updated by usa (Usaku NAKAMURA) over 5 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40713.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by k_tanaka at .

#18 [ruby-core:54979] Updated by k_takata (Ken Takata) over 5 years ago

Hi usa,

  • regexec.c (onig_search): fix problem with optimization of \z. [Backport #8210] patched by k_tanaka at .

Thank you for merging my patch.
BTW, my name is not tanaka...
