Project

General

Profile

Actions

Feature #19102

closed

Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines

Added by k0kubun (Takashi Kokubun) about 2 years ago. Updated about 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:110603]

Description

Proposal

Change the behavior of ERB::Util.html_escape in the following two parts:

  1. Skip converting an argument with #to_s if the argument is already a T_STRING.
  2. Do not allocate and return a new String when nothing needs to be escaped.

Background

The current ERB::Util.html_escape is implemented as CGI.escapeHTML(s.to_s). So the performance is almost equal to CGI.escapeHTML except for the to_s call. Because it's common to embed non-String expressions in template engines, a template engine typically calls to_s to convert non-String expressions to a String before escaping it, which is why the difference exists. Proposal (1) is useful for optimizing the case that something that's already a String is embedded. We ignore the extreme case that String#to_s is weirdly monkey-patched.

As to proposal (2), my original implementation of CGI.escapeHTML https://github.com/ruby/ruby/pull/1164 was not calling rb_str_dup for that case. However, because [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility with the old gsub-based implementation, we added the unneeded rb_str_dup call and the performance for that case has been compromised. This behavior is completely unnecessary for template engines. On the other hand, because ERB::Util.html_escape is a helper for ERB, we should not need to consider any backward compatibility that is not relevant to ERB or any template engines. So proposal (2) should be possible in ERB::Util.html_escape unlike CGI.escapeHTML.

Benchmark

Implementation: https://github.com/ruby/erb/pull/27

require 'benchmark/ips'
require 'erb'

class << ERB::Util
  def html_escape_old(s)
    CGI.escapeHTML(s.to_s)
  end
end

Benchmark.ips do |x|
  s = 'hello world'
  x.report('before') { ERB::Util.html_escape_old(s) }
  x.report('after')  { ERB::Util.html_escape(s) }
  x.compare!
end
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
              before     1.066M i/100ms
               after     1.879M i/100ms
Calculating -------------------------------------
              before     10.615M (± 0.3%) i/s -     53.320M in   5.023083s
               after     18.742M (± 0.4%) i/s -     93.929M in   5.011847s

Comparison:
               after: 18741747.6 i/s
              before: 10615137.1 i/s - 1.77x  (± 0.00) slower

Related issues 1 (0 open1 closed)

Related to Ruby master - Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTMLClosedActions
Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0