Bug #8129

String#index has drastically different performance when a single unicode character is included

Added by zmoazeni (Zach Moazeni) about 7 years ago. Updated about 7 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
ruby -v: 2.0.0-p0
Backport:
[ruby-core:53559]

Description

=begin
I created a simple ruby script:

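#!/usr/bin/env ruby

# Usage: pass the path of the file to read as the first argument.
raise "need a file name" unless ARGV[0]
contents = File.read(ARGV[0])

# Look up one character by position on every iteration.
326_000.times do |i|
  contents[(i + 23) % contents.size]
end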

I uploaded two files below. One is all ASCII characters, and the other has a single Unicode character (an "em dash") in the first line.

Character indexing with String#[] has dramatically different performance for the two strings. Locally, on 1.9.3-p385, I'm seeing ~1.5 seconds with all_ascii.css and ~30 seconds with one_unicode.css. It gets worse with Ruby 2.0: all_ascii.css still takes ~1 second, but one_unicode.css takes ~2.5 minutes!

Any idea why the performance is so dramatically different between the two?
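
For anyone without the attachments, the comparison can be approximated with the sketch below. It is not the original script: the helper names (sample_strings, index_many), the generated CSS-like text, and the reduced iteration count are all assumptions made to keep it self-contained and quick to run. The only difference between the two strings is one em dash, which is enough to make ascii_only? return false. A plausible reading, not confirmed here, is that a UTF-8 string with non-ASCII content cannot map a character position straight to a byte offset, so each lookup has to scan the string.

```ruby
require "benchmark"

# Hypothetical stand-ins for the attached files: roughly 180 KB of ASCII
# text, plus a copy with a single em dash (U+2014) inserted in the first line.
def sample_strings
  all_ascii   = "a { color: red; }\n" * 10_000
  one_unicode = all_ascii.dup.insert(3, "\u2014")
  [all_ascii, one_unicode]
end

# Hypothetical helper mirroring the loop in the script above:
# look up one character by position on every iteration.
def index_many(contents, iterations)
  size = contents.size
  iterations.times do |i|
    contents[(i + 23) % size]
  end
end

all_ascii, one_unicode = sample_strings
iterations = 2_000 # assumed; far fewer than 326_000 to keep the run short

ascii_time   = Benchmark.realtime { index_many(all_ascii, iterations) }
unicode_time = Benchmark.realtime { index_many(one_unicode, iterations) }

puts "all ASCII    (ascii_only? #{all_ascii.ascii_only?}): #{ascii_time.round(4)}s"
puts "one em dash  (ascii_only? #{one_unicode.ascii_only?}): #{unicode_time.round(4)}s"
```

The absolute timings depend on the Ruby version and machine, so none are quoted here; the point of the sketch is only to compare the two printed numbers against each other.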
=end


Files

all_ascii.css (193 KB), zmoazeni (Zach Moazeni), 03/20/2013 08:23 AM
one_unicode.css (193 KB): the first line contains a Unicode "em dash", otherwise all ASCII. zmoazeni (Zach Moazeni), 03/20/2013 08:23 AM
