Feature #19090: Do not duplicate an unescaped string in CGI.escapeHTML - Ruby master - Ruby Issue Tracking System

Actions

Copy link

Feature #19090

closed

Do not duplicate an unescaped string in CGI.escapeHTML

Added by k0kubun (Takashi Kokubun) over 1 year ago. Updated over 1 year ago.

Status:

Closed

Assignee:

Target version:

[ruby-core:110542]

Description

Proposal¶

Stop guaranteeing that GGI.escapeHTML returns a new string even if there's nothing to be escaped.

More specifically, stop calling this rb_str_dup https://github.com/ruby/cgi/blob/v0.3.3/ext/cgi/escape/escape.c#L72 for the case that nothing needs to be escaped.

Background¶

My original implementation https://github.com/ruby/ruby/pull/1164 was not calling it. The reason why rb_str_dup was added was that [Bug #11858] claimed returning the argument object for non-escaped cases is a backward incompatibility because the original gsub-based implementation always returns a new object. As a result, even while many people use CGI.escapeHTML as an optimized implementation for escaping HTML today, it ended up having a compromised performance.

Motivation¶

The motivation is to improve performance. By just doing so, escaping a pre-allocated "string" becomes 1.34x faster on my machine https://gist.github.com/k0kubun/f66d6fe1e6ba821e4263257e504ba28f.

The most major use case of CGP.escapeHTML is to safely embed a user input. When the result is just embedded in another string, the allocated new object will be just wasted. It's pretty common that an embedded string fragment doesn't contain any of '"&<> characters. So we should stop wasting that to optimize that case.

[Bug #11858] wasn't really a use case but just "I think this is backward incompatibility" based on frozen Hello World. Unlike user input, you usually don't need to escape your own string literal. It feels like the ticket addressed a problem that doesn't exist in actual applications. It should have cited existing code that could be broken by that, and I can't find such code with gem-codesearch today.

The only reason to maintain the current behavior would be to allow using a return value of CGI.escapeHTML as a buffer for creating another longer string starting with the escaped value, but using CGI.escapeHTML to initialize a string buffer feels like an abuse. Relying on the behavior never makes sense as an "optimization" either because it makes all other cases (the result is not used as a string buffer) suboptimal.

Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?¶

Two reasons:

The non-dup behavior should be used 99.999..9% of the time. We shouldn't make code using CGI.escapeHTML less readable just for maintaining a use case that doesn't exist.
Passing keyword arguments to a C extension is unfortunately slow, and it defeats the optimization purpose. In core classes, we could use Primitive to address that, but this is a default gem and we can't use that.
- We could workaround that if we choose CGI.escapeHTML(str, false), but again it'd spoil the readability for maintaining an invalid use case.

Why not a new method?¶

It's a good idea actually, but with escapeHTML, escape_html, and h aliased to it already, I can't think of a good name for it. And again, not calling it escapeHTML or escape_html would spoil the readability for no valid reason.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Also available in: Atom PDF

Like0

Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0

Project

General

Profile

Ruby » Ruby master

Custom queries

Feature #19090

Do not duplicate an unescaped string in CGI.escapeHTML

Proposal¶

Background¶

Motivation¶

Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?¶

Why not a new method?¶

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Project

General

Profile

Ruby » Ruby master

Custom queries

Feature #19090

Do not duplicate an unescaped string in CGI.escapeHTML

Proposal¶

Background¶

Motivation¶

Why not an optional flag like CGI.escapeHTML(str, dup: false)?¶

Why not a new method?¶

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Dan0042 (Daniel DeLorme) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Updated by k0kubun (Takashi Kokubun) over 1 year ago

Updated by Eregon (Benoit Daloze) over 1 year ago

Why not an optional flag like `CGI.escapeHTML(str, dup: false)`?¶