Feature #18822

Updated by byroot (Jean Boussier) almost 2 years ago

### Context 

 There are two fairly similar encoding methods that are often confused: 
 `application/x-www-form-urlencoded`, which is how form data is encoded, and "percent-encoding" as defined by [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986). 

 AFAIK, the only way they differ is that "form encoding" escapes space characters as `+`, while RFC 3986 escapes them as `%20`. Most of the time it doesn't matter, but sometimes it does. 

 ### Ruby form and URL escape methods 

   - `URI.escape(" ") # => "%20"` but it was deprecated and removed (in 3.0 ?). 
   - `ERB::Util.url_encode(" ") # => "%20"` but it's implemented with a `gsub` and isn't very performant. It's also awkward to have to reach for `ERB`. 
   - `CGI.escape(" ") # => "+"` 
   - `URI.encode_www_form_component(" ") # => "+"` 
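 To make the difference concrete, here is a small sketch comparing the methods listed above that still exist. The sample string `"a b&c"` is just an illustration, not taken from any real use case. 

```ruby
require "cgi"
require "erb"
require "uri"

input = "a b&c" # illustrative sample input

# Form encoding (application/x-www-form-urlencoded): space becomes "+".
form_cgi = CGI.escape(input)                     # => "a+b%26c"
form_uri = URI.encode_www_form_component(input)  # => "a+b%26c"

# Percent-encoding (RFC 3986): space becomes "%20".
percent_erb = ERB::Util.url_encode(input)        # => "a%20b%26c"

puts form_cgi, form_uri, percent_erb
```

 Only the handling of the space differs here; `&` is escaped as `%26` by all three. 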

 ### Unescape methods 

 For unescaping, the situation is even more clear-cut since `URI.unescape` was removed. There's no available method that will treat an unescaped `+` as simply `+`. 

 E.g. in JavaScript: `decodeURIComponent("foo+bar") #=> "foo+bar"`. 

 If one were to use `CGI.unescape`, the string might be improperly decoded: `CGI.unescape("foo+bar") #=> "foo bar"`.  
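 The following sketch shows the problem and one possible workaround with the existing API. `percent_decode` is a hypothetical helper name, not an existing Ruby method; it protects literal `+` characters by rewriting them to `%2B` before handing the string to `CGI.unescape`. 

```ruby
require "cgi"
require "uri"

encoded = "foo+bar%20baz" # illustrative percent-encoded input with a literal "+"

# Both form decoders turn the literal "+" into a space.
CGI.unescape(encoded)                   # => "foo bar baz"
URI.decode_www_form_component(encoded)  # => "foo bar baz"

# Hypothetical helper (an assumption, not part of Ruby): escape "+" first
# so that CGI.unescape restores it as "+" instead of converting it to " ".
def percent_decode(str)
  CGI.unescape(str.gsub("+", "%2B"))
end

puts percent_decode(encoded)            # => "foo+bar baz"
```

 The extra `gsub` pass costs an additional string scan, which is part of why a dedicated method would be preferable. 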

 ### Other languages 

   - JavaScript `encodeURI` and `encodeURIComponent` use `%20`. 
   - PHP has `urlencode` using `+` and `rawurlencode` using `%20`. 
   - Python has `urllib.parse.quote` using `%20` and `urllib.parse.quote_plus` using `+`. 

 ### Proposal 

 Since `CGI` already has a very performant encoder for `application/x-www-form-urlencoded`, I think it would make sense for it to provide another method for RFC 3986. 

 I propose: 

    - `CGI.url_encode(" ") # => "%20"` 
    - Or `CGI.encode_url`. 
    - Alias `CGI.escape` as `CGI.encode_www_form_component` 
    - Clarify the documentation of `CGI.escape`. 
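
 As a rough illustration of the intended behavior only, the proposed encoder could be approximated today on top of `CGI.escape`. The `PercentEncoding` module and its `url_encode` method are hypothetical names used for this sketch; this is not the proposed implementation, and its extra `gsub` pass would not have the performance of a native method. 

```ruby
require "cgi"

# Hypothetical module for illustration; not part of Ruby or of this proposal.
module PercentEncoding
  module_function

  # CGI.escape encodes a literal "+" as "%2B", so every "+" left in its
  # output stands for a space and can safely be rewritten as "%20".
  def url_encode(str)
    CGI.escape(str.to_s).gsub("+", "%20")
  end
end

puts PercentEncoding.url_encode(" ")      # => "%20"
puts PercentEncoding.url_encode("a+b c")  # => "a%2Bb%20c"
```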
