Feature #17115: Optimize String#casecmp? for ASCII strings - Ruby - Ruby Issue Tracking System


Added by byroot (Jean Boussier) over 5 years ago. Updated over 5 years ago.

Status: Open

Assignee:

Target version:

[ruby-core:99559]

Description

Patch: https://github.com/ruby/ruby/pull/3369

casecmp? is something of a performance trap: it is much slower than a case-insensitive regexp or a plain casecmp == 0.

require "benchmark/ips" # benchmark-ips gem, provides Benchmark.ips

str = "Connection"
cmp = "connection"
Benchmark.ips do |x|
  x.report('/\A\z/i.match?') { /\Afoo\Z/i.match?(str) }
  x.report('casecmp?') { cmp.casecmp?(str) }
  x.report('casecmp') { cmp.casecmp(str) == 0 }
  x.compare!
end
Calculating -------------------------------------
      /\A\z/i.match?     11.447M (± 1.3%) i/s -     57.814M in   5.051489s
            casecmp?      6.197M (± 0.9%) i/s -     31.138M in   5.025252s
             casecmp     12.753M (± 1.2%) i/s -     64.636M in   5.069195s

Comparison:
             casecmp: 12752791.6 i/s
      /\A\z/i.match?: 11446996.1 i/s - 1.11x  (± 0.00) slower
            casecmp?:  6196886.0 i/s - 2.06x  (± 0.00) slower

This is because, unlike the others, casecmp? applies Unicode case folding.
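To illustrate the difference in semantics, here is a small sketch comparing the two methods on a non-ASCII pair. The constant names are made up for this example; the strings are written with escapes so the snippet does not depend on the source file's encoding.

```ruby
# Hypothetical constants for illustration only.
LOWER_UMLAUTS = "\u00e4\u00f6\u00fc" # "äöü"
UPPER_UMLAUTS = "\u00c4\u00d6\u00dc" # "ÄÖÜ"

# casecmp? folds case across Unicode, so the two strings are considered equal.
puts LOWER_UMLAUTS.casecmp?(UPPER_UMLAUTS)

# casecmp only folds A-Z/a-z, so the same pair does not compare as 0.
puts LOWER_UMLAUTS.casecmp(UPPER_UMLAUTS) == 0

# For pure ASCII input both approaches agree.
puts "Connection".casecmp?("connection")
puts "Connection".casecmp("connection") == 0
```

The extra folding work is what casecmp? pays for on every call, even when the input turns out to be plain ASCII.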

However, there are cases where a fast case-insensitive equality check of known ASCII strings is useful, for instance when matching HTTP headers.
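As a rough sketch of the HTTP header use case, a lookup might look like the following. The helper name find_header and the sample headers are hypothetical and only serve to show where casecmp? ends up on a hot path.

```ruby
# Hypothetical helper: returns the value of the first header whose name
# matches `name` case-insensitively, or nil if there is none.
def find_header(headers, name)
  pair = headers.find { |key, _value| key.casecmp?(name) }
  pair && pair.last
end

# Sample data invented for this example.
sample_headers = {
  "Content-Type" => "text/html",
  "CONNECTION"   => "keep-alive",
}

puts find_header(sample_headers, "connection").inspect
puts find_header(sample_headers, "Accept").inspect
```

Header names are ASCII in practice, so every comparison here would take the fast path the patch proposes.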

This patch checks whether both strings use a single-byte encoding and, if so, does a simple iterative comparison with TOLOWER(). This makes casecmp? slightly faster than casecmp == 0 when both strings are ASCII.
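The idea can be sketched in pure Ruby as follows. This is not the patch itself, which is written in C inside the String implementation; the method names ascii_casecmp? and ascii_downcase_byte are hypothetical, and ascii_only? is used here as a stand-in for the patch's single-byte check.

```ruby
# Hypothetical stand-in for C's TOLOWER(): lowercases A-Z, leaves other bytes alone.
def ascii_downcase_byte(byte)
  (65..90).cover?(byte) ? byte + 32 : byte
end

# Hypothetical sketch of the fast path described above.
def ascii_casecmp?(a, b)
  # Anything that is not pure ASCII goes through the Unicode-aware method.
  return a.casecmp?(b) unless a.ascii_only? && b.ascii_only?

  # Strings of different length can never be equal.
  return false unless a.bytesize == b.bytesize

  # Compare byte by byte, lowercasing each side.
  a.each_byte.with_index do |byte, i|
    return false unless ascii_downcase_byte(byte) == ascii_downcase_byte(b.getbyte(i))
  end
  true
end

puts ascii_casecmp?("Connection", "connection")
puts ascii_casecmp?("Content-Type", "Content-Length")
```

A Ruby-level loop like this will not be faster than the built-in method; it only shows the control flow that the C change follows.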

|                        |compare-ruby|built-ruby|
|:-----------------------|-----------:|---------:|
|casecmp-1               |     11.618M|   10.757M|
|                        |       1.08x|         -|
|casecmp-10              |      1.849M|    1.723M|
|                        |       1.07x|         -|
|casecmp-100             |    204.490k|  186.798k|
|                        |       1.09x|         -|
|casecmp-1000            |     20.413k|   20.184k|
|                        |       1.01x|         -|
|casecmp-nonascii1       |     19.541M|   20.100M|
|                        |           -|     1.03x|
|casecmp-nonascii10      |     19.489M|   19.914M|
|                        |           -|     1.02x|
|casecmp-nonascii100     |     19.479M|   20.155M|
|                        |           -|     1.03x|
|casecmp-nonascii1000    |     19.462M|   20.064M|
|                        |           -|     1.03x|
|casecmp_p-1             |      2.214M|   12.030M|
|                        |           -|     5.43x|
|casecmp_p-10            |      1.373M|    2.150M|
|                        |           -|     1.57x|
|casecmp_p-100           |    249.292k|  231.041k|
|                        |       1.08x|         -|
|casecmp_p-1000          |     16.173k|   23.592k|
|                        |           -|     1.46x|
|casecmp_p-nonascii1     |    651.921k|  650.572k|
|                        |       1.00x|         -|
|casecmp_p-nonascii10    |    108.253k|  109.006k|
|                        |           -|     1.01x|
|casecmp_p-nonascii100   |     11.749k|   11.889k|
|                        |           -|     1.01x|
|casecmp_p-nonascii1000  |      1.140k|    1.138k|
|                        |       1.00x|         -|

Updated by byroot (Jean Boussier) over 5 years ago #1

Description updated (diff)

Updated by Dan0042 (Daniel DeLorme) over 5 years ago #2 [ruby-core:99588]

In the benchmark you'd need to change the regexp from /\Afoo\Z/i to /\Aconnection\z/i; if you do so you'll find the regexp performance is similar to casecmp?
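A minimal sketch of the adjusted comparison, using only the standard library's Benchmark module rather than benchmark-ips, could look like this. The constant names and the iteration count are arbitrary choices for this example, and no timings are claimed here.

```ruby
require "benchmark"

# Hypothetical constants for this sketch.
STR = "Connection"
CMP = "connection"
MATCHING_PATTERN = /\Aconnection\z/i # regexp suggested above
ITERATIONS = 100_000                 # arbitrary count for a quick run

CHECKS = {
  "regexp"   => -> { MATCHING_PATTERN.match?(STR) },
  "casecmp?" => -> { CMP.casecmp?(STR) },
  "casecmp"  => -> { CMP.casecmp(STR) == 0 },
}.freeze

# With the matching regexp, all three checks return true, so they do comparable work.
CHECKS.each { |label, check| puts "#{label}: #{check.call}" }

# Time each check over the same number of iterations.
CHECKS.each do |label, check|
  seconds = Benchmark.realtime { ITERATIONS.times { check.call } }
  puts format("%-10s %.4fs", label, seconds)
end
```

With /\Afoo\Z/i the regexp can reject the string as soon as the first character fails to match, which is why it is not a like-for-like comparison.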

+1 for special-casing ASCII strings though.

Related: #13750, #14055

Updated by sawa (Tsuyoshi Sawada) over 5 years ago #3

Description updated (diff)
