Backport #8210
closed
Multibyte character interfering with end-line character within a regex
Description
=begin
With this regex:
regex1 = /\z/
the following strings match as expected:
"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5
but with these regexes:
regex2 = /#$/?\z/
regex3 = /\n?\z/
the results differ between the two strings:
"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil
The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). I expect them to behave the same, and believe this is a bug.
=end
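The expectation in the description can be written as a small check. This is only a sketch: `matches_at_end?`, `SAMPLES`, and `END_PATTERNS` are hypothetical names that do not appear in the report, and the sketch assumes a Ruby build that already contains the fix, where every pattern is expected to match at the end of both strings.

```ruby
# Strings from the report: one ASCII-only, one made of 3-byte UTF-8 characters.
SAMPLES = ["hello", "こんにちは"].freeze

# Patterns from the report that are expected to match at the end of the string.
END_PATTERNS = [/\z/, /\n?\z/].freeze

# Hypothetical helper: true when the first match of `pattern` in `string`
# starts at the very end of the string (character index == length).
def matches_at_end?(string, pattern)
  (string =~ pattern) == string.length
end

SAMPLES.each do |s|
  END_PATTERNS.each do |re|
    puts "#{s.inspect} =~ #{re.inspect}: #{(s =~ re).inspect} " \
         "(at end: #{matches_at_end?(s, re)})"
  end
end
```

On an affected build, the multibyte string combined with /\n?\z/ would be the case reported as nil.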
Updated by sawa (Tsuyoshi Sawada) over 11 years ago
=begin
A different regex:
regex4 = /[[:space:]]?\z/
seems to work as expected:
"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5
=end
Updated by sawa (Tsuyoshi Sawada) over 11 years ago
=begin
Still a different regex:
regex5 = /\n?$/
seems to work as expected:
"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5
=end
Updated by sawa (Tsuyoshi Sawada) over 11 years ago
=begin
The problem seems to happen with the combination of certain tokens followed by ? and \z.
"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5
=end
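The failing rows above share one property: the optional token cannot match any character of the test string, so the only possible match is the empty string at the end. A minimal sketch of a sweep over those tokens follows. `optional_before_end`, `sweep`, `TARGET`, and `NON_MATCHING_TOKENS` are hypothetical names introduced here for illustration, and the expected index assumes a Ruby build that includes the fix.

```ruby
# Tokens from the failing rows; none of them can match a character of
# "こんにちは", so <token>?\z can only match the empty string at the end.
NON_MATCHING_TOKENS = ["a", " ", "\\t", "\\n", "\\s"].freeze

TARGET = "こんにちは"

# Hypothetical helper: builds /<token>?\z/ from a token source string.
def optional_before_end(token)
  Regexp.new("#{token}?\\z")
end

# Maps each token to the character index of the first match (nil if none).
def sweep(string, tokens)
  tokens.each_with_object({}) do |token, result|
    result[token] = string =~ optional_before_end(token)
  end
end

sweep(TARGET, NON_MATCHING_TOKENS).each do |token, index|
  puts "/#{token}?\\z/ => #{index.inspect}"
end
```

Any token that maps to nil rather than to the string length would indicate the behavior described in this report.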
Updated by sawa (Tsuyoshi Sawada) over 11 years ago
Is this bug report wrong? If so, please note so.
Updated by naruse (Yui NARUSE) over 11 years ago
- Category set to M17N
- Status changed from Open to Assigned
- Assignee set to naruse (Yui NARUSE)
- Target version set to 2.1.0
sawa (Tsuyoshi Sawada) wrote:
Is this bug report wrong? If so, please note so.
This really looks like a bug in Oniguruma/Onigmo.
Updated by acheong87 (Andrew Cheong) over 11 years ago
Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.
Updated by rondinif (Franco Rondini) over 11 years ago
I just edited the answer; test code is available.
Updated by k_takata (Ken Takata) over 11 years ago
- File fix-8210-1.diff fix-8210-1.diff added
- File fix-8210-2.diff fix-8210-2.diff added
This problem was caused by optimization of \z.
I wrote two patches to fix this problem.
Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
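The comment does not spell out how the optimization goes wrong. One plausible reading, offered here purely as an assumption and not taken from the patches, is that the search start is derived by stepping back a number of bytes from the end of the string and then adjusted to a character boundary. The sketch below only illustrates why such a step can land inside a character for a UTF-8 string but not for an ASCII one. `char_boundary?` is a hypothetical helper, not part of Ruby or Onigmo.

```ruby
# Hypothetical helper: true when byte offset `offset` in `string` is the
# first byte of a character, or the end of the string.
def char_boundary?(string, offset)
  return true if offset == string.bytesize
  byte = string.getbyte(offset)
  # In UTF-8, continuation bytes have the bit pattern 10xxxxxx.
  (byte & 0xC0) != 0x80
end

# Step back one byte from the end, as a one-byte token such as \n would allow.
["hello", "こんにちは"].each do |s|
  offset = s.bytesize - 1
  puts "#{s.inspect}: bytesize=#{s.bytesize}, " \
       "offset #{offset} on boundary? #{char_boundary?(s, offset)}"
end
```

For "hello" the offset one byte before the end is a character boundary. For "こんにちは" it falls inside the final 3-byte character, so any adjustment to the next boundary moves the position to the end of the string, which matches the 'start==range' situation mentioned above.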
Updated by sawa (Tsuyoshi Sawada) over 11 years ago
Is either of k_takata's bug fixes going to be incorporated?
Updated by naruse (Yui NARUSE) over 11 years ago
k_takata (Ken Takata) wrote:
This problem was caused by optimization of \z.
I wrote two patches to fix this problem.
Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
I think -1 is suitable because it looks to keep original intention more than -2.
Updated by k_takata (Ken Takata) over 11 years ago
- File fix-8210-1-update.diff fix-8210-1-update.diff added
I think -1 is suitable because it looks to keep original intention more than -2.
Thanks for your comment.
I have updated onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee
I have also attached an updated patch that can be applied to Ruby 1.9.3.
Updated by naruse (Yui NARUSE) over 11 years ago
- Status changed from Assigned to Closed
- % Done changed from 0 to 100
Updated by naruse (Yui NARUSE) over 11 years ago
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport200
- Category deleted (M17N)
- Status changed from Closed to Assigned
- Assignee changed from naruse (Yui NARUSE) to nagachika (Tomoyuki Chikanaga)
- Target version deleted (2.1.0)
Updated by k_takata (Ken Takata) over 11 years ago
I think it's better to backport this patch to Ruby 1.9.3 too.
Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago
- Status changed from Assigned to Closed
This issue was solved with changeset r40384.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
merge revision(s) 40276: [Backport #8210]
* Merge Onigmo 5.13.4 f22cf2e566712cace60d17f84d63119d7c5764ee.
[bug] fix problem with optimization of \z (Issue #16) [Bug #8210]
Updated by nagachika (Tomoyuki Chikanaga) over 11 years ago
- Project changed from Backport200 to Backport193
- Status changed from Closed to Assigned
- Assignee changed from nagachika (Tomoyuki Chikanaga) to usa (Usaku NAKAMURA)
Moving to Backport193.
However, Onigmo was merged only in 2.0 and later, and I have not confirmed that this patch can be merged into ruby_1_9_3...
Updated by usa (Usaku NAKAMURA) over 11 years ago
- Status changed from Assigned to Closed
This issue was solved with changeset r40713.
Tsuyoshi, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- regexec.c (onig_search): fix problem with optimization of \z.
[Backport #8210]
patched by k_tanaka at [ruby-core:54251].
Updated by k_takata (Ken Takata) over 11 years ago
Hi usa,
- regexec.c (onig_search): fix problem with optimization of \z.
[Backport #8210]
patched by k_tanaka at [ruby-core:54251].
Thank you for merging my patch.
BTW, my name is not tanaka...