Feature #18822
Closed
Ruby lacks a proper method to percent-encode strings for URIs (RFC 3986)
Description
Context
There are two fairly similar encoding methods that are often confused: application/x-www-form-urlencoded, which is how form data is encoded, and "percent-encoding" as defined by RFC 3986.
AFAIK, the only way they differ is that "form encoding" escapes space characters as +, while RFC 3986 escapes them as %20. Most of the time it doesn't matter, but sometimes it does.
Ruby form and URL escape methods
- URI.escape(" ") # => "%20", but it was deprecated and removed (in 3.0?).
- ERB::Util.url_encode(" ") # => "%20", but it's implemented with a gsub and isn't very performant. It's also awkward to have to reach for ERB.
- CGI.escape(" ") # => "+"
- URI.encode_www_form_component(" ") # => "+"
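To make the comparison above concrete, here is a small sketch that runs the three remaining methods on one input containing both a space and a literal plus sign. The sample string is made up for illustration. URI.escape is left out because it no longer exists in current Ruby.

```ruby
require "cgi"
require "erb"
require "uri"

# Sample input (illustrative): one space and one literal "+".
input = "foo bar+baz"

# Form encoding: the space becomes "+", the literal "+" becomes "%2B".
form_cgi = CGI.escape(input)                     # => "foo+bar%2Bbaz"
form_uri = URI.encode_www_form_component(input)  # => "foo+bar%2Bbaz"

# Percent-encoding: the space becomes "%20" instead.
percent = ERB::Util.url_encode(input)            # => "foo%20bar%2Bbaz"

puts form_cgi, form_uri, percent
```

All three escape a literal plus the same way; only the treatment of the space differs.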
Unescape methods
For unescaping, it's even more clear-cut since URI.unescape was removed. So there's no available method that treats an unescaped + as simply +.
e.g. in Javascript: decodeURIComponent("foo+bar") #=> "foo+bar".
If one were to use CGI.unescape, the string might be improperly decoded: CGI.unescape("foo+bar") #=> "foo bar".
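One possible workaround, sketched below, is to protect literal plus signs before handing the string to CGI.unescape. The helper name percent_decode is hypothetical and not part of any Ruby library. It relies on the assumption that the input is RFC 3986 percent-encoded, so any + it contains is meant literally.

```ruby
require "cgi"

# Hypothetical helper (not a Ruby API): decode an RFC 3986 percent-encoded
# string while keeping any unescaped "+" as a literal plus sign.
def percent_decode(str)
  # Rewrite "+" to "%2B" first so CGI.unescape cannot turn it into a space.
  CGI.unescape(str.gsub("+", "%2B"))
end

CGI.unescape("foo+bar")      # => "foo bar"  (form decoding)
percent_decode("foo+bar")    # => "foo+bar"  (plus kept literal)
percent_decode("foo%20bar")  # => "foo bar"
```

This costs an extra pass over the string, which is part of why a dedicated method would be preferable.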
Other languages
- Javascript encodeURI and encodeURIComponent use %20.
- PHP has urlencode using + and rawurlencode using %20.
- Python has urllib.parse.quote using %20 and urllib.parse.quote_plus using +.
Proposal
Since CGI already has a very performant encoder for application/x-www-form-urlencoded, I think it would make sense for it to provide another method for RFC 3986.
I propose:
- CGI.url_encode(" ") # => "%20"
  - Or CGI.encode_url.
- Alias CGI.escape as CGI.encode_www_form_component
- Clarify the documentation of CGI.escape.
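Until such a method exists, the proposed behavior can be approximated on top of CGI.escape. The sketch below is only an illustration: the module name PercentEncoding is hypothetical, and its url_encode merely mirrors the proposed name rather than any actual implementation. It works because CGI.escape already emits a literal plus as %2B, so every + left in its output stands for a space.

```ruby
require "cgi"

# Hypothetical module, for illustration only; not part of Ruby.
module PercentEncoding
  module_function

  # Approximates the proposed CGI.url_encode by reusing CGI.escape.
  # Any "+" in CGI.escape's output came from a space, so it is safe
  # to rewrite every "+" as "%20".
  def url_encode(str)
    CGI.escape(str.to_s).gsub("+", "%20")
  end
end

PercentEncoding.url_encode(" ")      # => "%20"
PercentEncoding.url_encode("a+b c")  # => "a%2Bb%20c"
```

The extra gsub pass is exactly the overhead a native method in CGI would avoid.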