Feature #18822

closed

Ruby lacks a proper method to percent-encode strings for URIs (RFC 3986)

Added by byroot (Jean Boussier) over 2 years ago. Updated about 1 year ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:108822]

Description

Context

There are two fairly similar encoding methods that are often confused: application/x-www-form-urlencoded, which is how form data is encoded, and "percent-encoding" as defined by RFC 3986.

AFAIK, the only way they differ is that "form encoding" escapes space characters as +, while RFC 3986 escapes them as %20. Most of the time it doesn't matter, but sometimes it does.

Ruby form and URL escape methods

  • URI.escape(" ") # => "%20" but it was deprecated and then removed in Ruby 3.0.
  • ERB::Util.url_encode(" ") # => "%20" but it's implemented with a gsub and isn't very performant. It's also awkward to have to reach for ERB.
  • CGI.escape(" ") # => "+"
  • URI.encode_www_form_component(" ") # => "+"
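As a quick illustration of the difference between the methods listed above, here is a small sketch. The sample string and variable names are only illustrative.

```ruby
require "cgi"
require "erb"
require "uri"

input = "a b&c" # illustrative sample containing a space and a reserved character

# Form encoding (application/x-www-form-urlencoded): space becomes "+".
form_cgi = CGI.escape(input)
form_uri = URI.encode_www_form_component(input)

# Percent-encoding in the RFC 3986 style: space becomes "%20".
percent = ERB::Util.url_encode(input)

puts form_cgi # a+b%26c
puts form_uri # a+b%26c
puts percent  # a%20b%26c
```

The only difference in the output is how the space is represented; the "&" is escaped as %26 in all three cases.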

Unescape methods

For unescaping, it's even more clear-cut since URI.unescape was removed. So there's no available method that leaves an unescaped + as a literal +.

e.g. in JavaScript: decodeURIComponent("foo+bar") #=> "foo+bar".

If one were to use CGI.unescape, the string might be improperly decoded: CGI.unescape("foo+bar") #=> "foo bar".
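A sketch of the decoding problem and one possible workaround follows. The helper name percent_decode is hypothetical and not part of any library; it simply protects literal + characters before delegating to CGI.unescape.

```ruby
require "cgi"
require "uri"

encoded = "foo+bar%20baz" # illustrative percent-encoded input with a literal "+"

# Both form decoders turn "+" into a space.
cgi_decoded = CGI.unescape(encoded)
uri_decoded = URI.decode_www_form_component(encoded)

# Hypothetical helper: rewrite "+" as "%2B" first so that it survives
# form decoding as a literal plus sign.
def percent_decode(str)
  CGI.unescape(str.gsub("+", "%2B"))
end

puts cgi_decoded             # foo bar baz
puts uri_decoded             # foo bar baz
puts percent_decode(encoded) # foo+bar baz
```

This workaround costs an extra pass over the string, which is part of why a dedicated method would be preferable.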

Other languages

  • JavaScript encodeURI and encodeURIComponent use %20.
  • PHP has urlencode using + and rawurlencode using %20.
  • Python has urllib.parse.quote using %20 and urllib.parse.quote_plus using +.

Proposal

Since CGI already has a very performant encoder for application/x-www-form-urlencoded, I think it would make sense for it to provide another method for RFC 3986.

I propose:

  • CGI.url_encode(" ") # => "%20"
  • Or CGI.encode_url.
  • Alias CGI.escape as CGI.encode_www_form_component
  • Clarify the documentation of CGI.escape.
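
To make the intended behavior concrete, here is a rough pure-Ruby sketch of what such a method might return. The module name PercentEncoding is hypothetical and only stands in for the proposed CGI.url_encode; a real implementation would presumably live in CGI's native encoder rather than rely on gsub.

```ruby
require "cgi"

# Hypothetical stand-in for the proposed CGI.url_encode.
module PercentEncoding
  # CGI.escape encodes a literal "+" as "%2B", so any "+" left in its
  # output can only come from a space and is safe to rewrite as "%20".
  def self.url_encode(str)
    CGI.escape(str.to_s).gsub("+", "%20")
  end
end

puts PercentEncoding.url_encode(" ")     # %20
puts PercentEncoding.url_encode("a b+c") # a%20b%2Bc
```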