Bug #3457
URI.encode does not escape square brackets
| Status: | Rejected | Start date: | 06/20/2010 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | lib | | |
| Target version: | - | | |
| ruby -v: | ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10.0.0] | | |
Description
According to RFC 3986, URIs may only contain square brackets to enclose IPv6 addresses within the host part of the URI. In other parts of the URI they are not allowed and need to be escaped.
> A host identified by an Internet Protocol literal address, version 6
> [RFC3513] or later, is distinguished by enclosing the IP literal
> within square brackets ("[" and "]"). This is the only place where
> square bracket characters are allowed in the URI syntax.
> Square brackets are now specified as reserved within the
> authority component and are not allowed outside their use as
> delimiters for an IP literal within host.
The attached test case should test the desired behavior. I've tested it on 1.8.7, but I would guess that it is failing for all major Ruby versions and interpreters.
I'm not sure how this should be implemented while keeping the API of URI.escape intact. The escaping needs to use two groups of safe characters, one for the host part and one for the rest. This would result in two optional parameters for URI.escape to provide custom safe characters for each part.
Related issues
| duplicated by Backport87 - RubySpec #3692: URI should escape or parse [ and ] | Rejected | 08/14/2010 |
History
Updated by Yui NARUSE over 1 year ago
- Status changed from Open to Rejected
URI.encode doesn't parse the structure of the given URI. So the current behavior is spec.
Updated by Gregor Schmidt over 1 year ago
I am aware that URI.encode currently does not analyze the URI's structure, but how is this related to being spec compliant? According to the RFC, square brackets are not allowed in most parts of the URI, but they currently are not escaped. I would expect URI.escape to replace all illegal characters with a safe escape sequence, just like the RDoc states: 'Escapes the string, replacing all unsafe characters with codes'.
Updated by Roger Neel over 1 year ago
I want to throw my hat in the ring that this should be reopened. Leaving 'unsafe' characters unhandled is not the expected behavior of URI.encode, as laid out here: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
I just had a bad production bug where IE and Chrome could not download files with [ and ] in their names. Our solution was to hack in a .gsub(/(\[|\])/,'') to remove the square brackets after encoding the URL path (not the host). Also, if not removing square brackets from the host & path is not expected in IPv4, wouldn't that make this *not* IPv4 compliant in lieu of the changes in IPv6?
Updated by Yusuke Endoh over 1 year ago
Hi,
2010/7/31 Roger Neel <redmine@ruby-lang.org>:
> I want to throw my hat in the ring that this should be reopened. It's not the expected behavior of URI.encode to not handle 'unsafe' characters as laid out here:
I do see your point, but we can't change the behavior for compatibility reasons. I recommend that you propose a new method for the new behavior, or wait for 2.0.
--
Yusuke Endoh <mame@tsg.ne.jp>
Updated by Akira Tanaka over 1 year ago
2010/7/31 Roger Neel <redmine@ruby-lang.org>:
> Issue #3457 has been updated by Roger Neel.
>
> I want to throw my hat in the ring that this should be reopened. It's not the expected behavior of URI.encode to not handle 'unsafe' characters as laid out here:
> http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
Don't use URI.encode. Which characters should be encoded is context sensitive, but URI.encode is context insensitive. "[" should be percent-encoded except in IPv6 addresses. A URI should be composed by concatenating encoded components with delimiters. URI.encode, which encodes a whole URI, is just the wrong way to compose a URI. Don't use it.
--
Tanaka Akira
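The component-wise composition described above can be sketched roughly as follows. This is only an illustration, not an API proposed in this thread: `escape_component` and `build_url` are hypothetical helper names, and the choice to percent-encode every byte outside the RFC 3986 unreserved set is an assumption made for simplicity. `URI.encode_www_form` is the existing standard-library call for query strings.

```ruby
require "uri"

# Hypothetical helper: percent-encode every byte that is not an
# RFC 3986 "unreserved" character (ALPHA / DIGIT / "-" / "." / "_" / "~").
def escape_component(str)
  str.to_s.b
     .gsub(/[^A-Za-z0-9\-._~]/) { |byte| format("%%%02X", byte.ord) }
     .force_encoding(Encoding::US_ASCII)
end

# Hypothetical helper: escape each component on its own, then join the
# already-escaped pieces with the delimiters. The host is passed through
# untouched so that an IPv6 literal keeps its square brackets.
def build_url(scheme:, host:, segments:, query: {})
  path = segments.map { |segment| escape_component(segment) }.join("/")
  url = "#{scheme}://#{host}/#{path}"
  url += "?" + URI.encode_www_form(query) unless query.empty?
  url
end

url = build_url(
  scheme: "http",
  host: "[::1]",
  segments: ["files", "report [draft].txt"],
  query: { "tag" => "a[b]" }
)
puts url
```

With this approach the brackets around the IPv6 address survive, while the brackets in the file name and in the query value are percent-encoded.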
Updated by Shyouhei Urabe over 1 year ago
> Don't use URI.encode.
Then should it issue a warning at least?
Updated by Akira Tanaka over 1 year ago
2010/8/2 Shyouhei Urabe <redmine@ruby-lang.org>:
>
>> Don't use URI.encode.
>
> Then should it issue a warning at least?
% ruby -ruri -ve 'URI.encode("a")'
ruby 1.9.3dev (2010-06-10 trunk 28257) [i686-linux]
-e:1:in `<main>': warning: URI.escape is obsolete
--
Tanaka Akira
Updated by Martin Dürst over 1 year ago
On 2010/08/02 11:47, Tanaka Akira wrote:
> 2010/8/2 Shyouhei Urabe<redmine@ruby-lang.org>:
>>
>>> Don't use URI.encode.
>>
>> Then should it issue a warning at least?
>
> % ruby -ruri -ve 'URI.encode("a")'
> ruby 1.9.3dev (2010-06-10 trunk 28257) [i686-linux]
> -e:1:in `<main>': warning: URI.escape is obsolete
Could this be expanded to say something like "warning: URI.escape is
obsolete, please use FOO"?
Regards, Martin.
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Updated by Akira Tanaka over 1 year ago
2010/8/2 "Martin J. Dürst" <duerst@it.aoyama.ac.jp>:
> Could this be expanded to say something like "warning: URI.escape is
> obsolete, please use FOO"?
Do you have an idea for FOO?
--
Tanaka Akira
Updated by Alex Neth 9 months ago
This is quite confusing. If one has, for instance, a filename with square brackets, that must be encoded in the URI. URI.encode seems to be the way to do this, yet it generates an invalid URI if performed on the filename. It seems there should be a method that escapes all reserved characters in addition to invalid ones, perhaps like the JavaScript encodeURIComponent, something like encode_component.
Updated by Alex Neth 9 months ago
For those who wish to encode a URI segment, the way to do this is:
URI.encode("some weird filename with []", /[^#{URI::REGEXP::PATTERN::UNRESERVED}]/)
This uses a custom regexp which encodes every character that is not unreserved.
The default encodes only characters that are NEITHER unreserved NOR reserved, which is pretty useless behavior unless you want to do really sloppy encoding.
Note that this will also encode slashes, so it will not behave well with full paths.
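For full paths, one workaround is to split on the slashes and escape each segment separately. The sketch below is only an illustration: `escape_path_segment` and `escape_path` are hypothetical names, and it relies on `URI.encode_www_form_component` (which writes a space as "+") followed by rewriting "+" to "%20". That rewrite is safe because a literal "+" in the input has already become "%2B".

```ruby
require "uri"

# Hypothetical helper: escape a single path segment. Form encoding turns a
# space into "+", so rewrite it to "%20" for use inside a path.
def escape_path_segment(segment)
  URI.encode_www_form_component(segment).gsub("+", "%20")
end

# Hypothetical helper: escape segment by segment so the "/" delimiters stay.
# The -1 limit keeps leading and trailing empty segments.
def escape_path(path)
  path.split("/", -1).map { |segment| escape_path_segment(segment) }.join("/")
end

puts escape_path("/downloads/some weird filename with [].txt")
```

A "/" that is part of a file name rather than a delimiter cannot be handled this way; such a name has to be escaped on its own with `escape_path_segment` before the path is joined.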