
Backport #8210

closed

Multibyte character interfering with end-line character within a regex

Added by sawa (Tsuyoshi Sawada) almost 12 years ago. Updated over 11 years ago.

Status:
Closed
[ruby-core:53944]

Description

=begin
With this regex:

regex1 = /\z/

the following strings match as expected:

"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5

but with these regexes:

regex2 = /#$/?\z/
regex3 = /\n?\z/

they behave differently:

"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). I expect both strings to behave the same way with these regexes, and I believe this is a bug.
=end
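A minimal reproduction script for the cases above, assuming a UTF-8 source file and a Ruby that already includes the fix (r40276 on trunk). On such a Ruby every combination should return 5, the character length of both strings. On an affected Ruby, the report shows nil for the multibyte string with regex2 and regex3.

```ruby
# encoding: utf-8
# Reproduction for Bug #8210: an optional token followed by \z.
# $/ is the input record separator, "\n" unless it has been changed.
REGEXES = {
  "regex1" => /\z/,
  "regex2" => /#$/?\z/,   # interpolates $/ into the pattern
  "regex3" => /\n?\z/,
}

STRINGS = ["hello", "こんにちは"]

# =~ returns the character index where the match starts, or nil.
results = {}
STRINGS.each do |str|
  REGEXES.each do |name, re|
    results[[str, name]] = (str =~ re)
    printf("%-12s %-7s %p\n", str.inspect, name, results[[str, name]])
  end
end
```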


Files

fix-8210-1.diff (742 Bytes) fix-8210-1.diff k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-2.diff (491 Bytes) fix-8210-2.diff k_takata (Ken Takata), 04/10/2013 12:41 AM
fix-8210-1-update.diff (834 Bytes) fix-8210-1-update.diff k_takata (Ken Takata), 04/13/2013 07:31 PM

Updated by sawa (Tsuyoshi Sawada) almost 12 years ago

=begin
A different regex:

regex4 = /[[:space:]]?\z/

seems to work as expected:

"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5

=end

Updated by sawa (Tsuyoshi Sawada) almost 12 years ago

=begin
Still a different regex:

regex5 = /\n?$/

seems to work as expected:

"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5

=end
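Note that regex5 is not an exact substitute for regex3: `$` matches at the end of any line (just before a "\n"), while `\z` matches only at the very end of the string. For the single-line strings in this report the two anchors agree, but they diverge once the string contains a newline. A small sketch, assuming a Ruby with the fix; the sample string "a\nb" is illustrative and not from the report.

```ruby
# encoding: utf-8
# $ anchors at the end of a line; \z anchors only at the end of the string.
dollar = /\n?$/    # regex5 from the report
zed    = /\n?\z/   # regex3 from the report

single = "こんにちは"
multi  = "a\nb"

p single =~ dollar  # => 5
p single =~ zed     # => 5
p multi =~ dollar   # => 1, the position just before "\n"
p multi =~ zed      # => 3, only the end of the string qualifies
```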

Updated by sawa (Tsuyoshi Sawada) almost 12 years ago

=begin
The problem seems to occur when certain tokens are combined with ? and \z.

"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5

=end
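The rows of the table can be checked mechanically. For a token X that matches exactly one character, the leftmost match of /X?\z/ must start at the last character when that character matches X, and at the end of the string otherwise. The sketch below compares `=~` with that expected index; the helper `expected_index` is hypothetical and only valid for single-character tokens like the ones in the table. On a Ruby with the fix every row should agree; the rows that returned nil above would show up as mismatches on an affected Ruby.

```ruby
# encoding: utf-8
# Differential check: what =~ returns for /X?\z/ versus the index
# implied by the pattern's meaning.
TOKENS = ["a", " ", "\\t", "\\n", "\\s", ".", "\\S", "\\W", "あ", "\\w"]

# Hypothetical helper, valid only when the token matches one character.
def expected_index(str, token)
  last_char_matches = !(str[-1] =~ Regexp.new("\\A(?:#{token})\\z")).nil?
  last_char_matches ? str.length - 1 : str.length
end

str = "こんにちは"
mismatches = TOKENS.reject do |token|
  actual = str =~ Regexp.new("#{token}?\\z")
  expected = expected_index(str, token)
  printf("%-4s actual=%p expected=%p\n", token, actual, expected)
  actual == expected
end

puts(mismatches.empty? ? "all rows agree" : "mismatch: #{mismatches.inspect}")
```

Because the expected value is computed by the same regex engine on a one-character string, the check does not depend on whether `\w` is ASCII-only; it only tests that the optimized search for `\z` agrees with the plain meaning of the pattern.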

Updated by sawa (Tsuyoshi Sawada) almost 12 years ago

Is this bug report mistaken? If so, please let me know.

Updated by naruse (Yui NARUSE) almost 12 years ago

  • Category set to M17N
  • Status changed from Open to Assigned
  • Assignee set to naruse (Yui NARUSE)
  • Target version set to 2.1.0

sawa (Tsuyoshi Sawada) wrote:

Is this bug report wrong? If so, please note so.

This really does look like a bug in Oniguruma/Onigmo.

Updated by acheong87 (Andrew Cheong) almost 12 years ago

Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.

Updated by k_takata (Ken Takata) almost 12 years ago

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
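The thread does not spell out how the \z optimization goes wrong, so the following is only an illustration under an assumption: if a search for a pattern ending in \z starts at the end of the string minus the pattern's maximum length counted in bytes (1 byte for `\n?` or `a?`), that starting offset falls inside the last character of a UTF-8 string such as "こんにちは", where each character occupies 3 bytes, so it has to be moved to a character boundary. The hypothetical method `start_offsets` only reports where such an offset lands; it is not a model of Onigmo's `onig_search`.

```ruby
# encoding: utf-8
# Illustration only (an assumption, not Onigmo's code): where does
# "end of string minus the pattern's maximum byte length" land?
def start_offsets(str, max_len_bytes)
  candidate = str.bytesize - max_len_bytes

  # Byte offsets at which a character starts, plus the end of the string.
  boundaries = [0]
  str.each_char { |c| boundaries << boundaries.last + c.bytesize }

  {
    candidate: candidate,
    on_boundary: boundaries.include?(candidate),
    prev_boundary: boundaries.select { |b| b <= candidate }.max,
    next_boundary: boundaries.select { |b| b >= candidate }.min,
  }
end

["hello", "こんにちは"].each do |s|
  p [s, start_offsets(s, 1)]
end
```

For "hello" the candidate offset (4) is already a character boundary. For "こんにちは" it is 14, inside "は" (bytes 12 to 14), and the nearest boundaries are 12 and 15. This is consistent with only the multibyte string being affected in the report, but confirming the exact code path requires reading the patches themselves.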

Updated by sawa (Tsuyoshi Sawada) almost 12 years ago

Is either of k_takata's bug fixes going to be incorporated?

Updated by naruse (Yui NARUSE) almost 12 years ago

k_takata (Ken Takata) wrote:

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.


I think -1 is the better choice, because it seems to preserve the original intention more closely than -2.

Updated by k_takata (Ken Takata) almost 12 years ago

I think -1 is the better choice, because it seems to preserve the original intention more closely than -2.

Thanks for your comment.
I have updated onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee

I have also attached an updated patch that can be applied to Ruby 1.9.3.


Updated by naruse (Yui NARUSE) almost 12 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r40276.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
    [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

Updated by naruse (Yui NARUSE) almost 12 years ago

  • Tracker changed from Bug to Backport
  • Project changed from Ruby master to Backport200
  • Category deleted (M17N)
  • Status changed from Closed to Assigned
  • Assignee changed from naruse (Yui NARUSE) to nagachika (Tomoyuki Chikanaga)
  • Target version deleted (2.1.0)

Updated by k_takata (Ken Takata) almost 12 years ago

I think it's better to backport this patch to Ruby 1.9.3 too.


Updated by nagachika (Tomoyuki Chikanaga) almost 12 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40384.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


merge revision(s) 40276: [Backport #8210]

* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
  [bug] fix problem with optimization of \z (Issue #16) [Bug #8210]

Updated by nagachika (Tomoyuki Chikanaga) almost 12 years ago

  • Project changed from Backport200 to Backport193
  • Status changed from Closed to Assigned
  • Assignee changed from nagachika (Tomoyuki Chikanaga) to usa (Usaku NAKAMURA)

Moving to Backport193.
However, Onigmo was only merged after 2.0, and I haven't confirmed that this patch can be merged into ruby_1_9_3...


Updated by usa (Usaku NAKAMURA) over 11 years ago

  • Status changed from Assigned to Closed

This issue was solved with changeset r40713.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


  • regexec.c (onig_search): fix problem with optimization of \z.
    [Backport #8210]
    patched by k_tanaka at [ruby-core:54251].

Updated by k_takata (Ken Takata) over 11 years ago

Hi usa,

  • regexec.c (onig_search): fix problem with optimization of \z.
    [Backport #8210]
    patched by k_tanaka at [ruby-core:54251].

Thank you for merging my patch.
BTW, my name is not tanaka...
