https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112022-11-04T07:04:14ZRuby Issue Tracking SystemRuby master - Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engineshttps://bugs.ruby-lang.org/issues/19102?journal_id=999342022-11-04T07:04:14Zk0kubun (Takashi Kokubun)takashikkbn@gmail.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/99934/diff?detail_id=63428">diff</a>)</li></ul> Ruby master - Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engineshttps://bugs.ruby-lang.org/issues/19102?journal_id=999352022-11-04T07:13:59Zk0kubun (Takashi Kokubun)takashikkbn@gmail.com
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="[ruby/erb] Optimize away to_s if it's already T_STRING [Feature #19102]https://github.com/ruby/e..." href="https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/20efeaddbe246f3b2eaee4f17f54a814777176a8">git|20efeaddbe246f3b2eaee4f17f54a814777176a8</a>.</p>
<hr>
<p>[ruby/erb] Optimize away to_s if it's already T_STRING</p>
<p>[Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines (Closed)" href="https://bugs.ruby-lang.org/issues/19102">#19102</a>]<a href="https://github.com/ruby/erb/commit/38c6e182fb" class="external">https://github.com/ruby/erb/commit/38c6e182fb</a></p> Ruby master - Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engineshttps://bugs.ruby-lang.org/issues/19102?journal_id=1002722022-11-26T10:22:07ZEregon (Benoit Daloze)
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/19090">Feature #19090</a>: Do not duplicate an unescaped string in CGI.escapeHTML</i> added</li></ul> Ruby master - Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engineshttps://bugs.ruby-lang.org/issues/19102?journal_id=1002752022-11-26T13:03:13ZEregon (Benoit Daloze)
<ul></ul><p>I think it is unfortunate to add a C extension for ERB for that, ERB was always pure-Ruby and that was nice.</p>
<p>Also the C extension is slower on TruffleRuby, the Regexp is actually JIT-compiled and can use vectorization, unlike that C code. Also part of it is RSTRING_PTR() basically forces a copy from managed memory (byte[]) to native memory (char*) on TruffleRuby.</p>
<pre><code>truffleruby 23.0.0-dev-57e53f8a, like ruby 3.1.2, GraalVM CE Native [x86_64-linux]
CGI.escapeHTML 31.985M (± 1.2%) i/s - 160.093M in 5.006001s
ERB::Util.html_escape 7.427M (± 3.3%) i/s - 37.162M in 5.009721s
</code></pre>
<p>and CRuby 3.1 is:</p>
<pre><code>ERB::Util.html_escape 14.551M (± 0.8%) i/s - 73.308M in 5.038335s
CGI.escapeHTML 10.065M (± 0.6%) i/s - 51.054M in 5.072629s
</code></pre>
<p>Given those results, could you build the C extension only for CRuby?</p>
<p>I think it would also be much nicer to keep the optimized HTML escape in CGI which is in stdlib, so it can be used by all templates engines.</p>
<p>In <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Do not duplicate an unescaped string in CGI.escapeHTML (Closed)" href="https://bugs.ruby-lang.org/issues/19090">#19090</a> I did not expect <code>rb_str_dup()</code> is so costly on CRuby, I guess the allocation is slow and of course CRuby can't escape-analyze it.<br>
A new method in CGI sounds best, or probably nicer an optional argument whether to always return a copy.<br>
It seems tricky to change the existing method to return a mutable string as-is, I guess the person who found <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: CGI.escapeHTML should NOT return frozen string (Closed)" href="https://bugs.ruby-lang.org/issues/11858">#11858</a> actually found it due to that incompatibility.<br>
Ruby users seem to assume in general if a core/stdlib method returns a string either it's frozen or they are free to modify it, but here it would actually also mutate the original String passed to <code>CGI.escapeHTML</code>.</p>
<p><code>String#to_s</code> really sounds like something every Ruby JIT/VM should be able to trivially optimize.<br>
So that I think should be an insignificant cost, even on CRuby (and if it isn't it should be easy to fix).</p> Ruby master - Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engineshttps://bugs.ruby-lang.org/issues/19102?journal_id=1002782022-11-26T21:56:28Zk0kubun (Takashi Kokubun)takashikkbn@gmail.com
<ul></ul><p>Would you mind reviewing <a href="https://github.com/ruby/erb/pull/39" class="external">https://github.com/ruby/erb/pull/39</a>? I skipped this ticket's change for truffleruby in the PR.</p>