Bug #12562

URI merge removes empty segment contrary to RFC 3986

Added by john_elrick (John Elrick) about 4 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
[ruby-core:76281]

Description

Background

This bug surfaced while testing against a server whose router expects an empty segment. The server's routing is not correct; however, the result exposed a discrepancy between URI's behavior and the specification in RFC 3986.

Description

URI.parse('https://www.example.com').merge('/foo//bar')

According to RFC 3986, this should result in:

https://www.example.com/foo//bar

However, the actual result of the operation is:

https://www.example.com/foo/bar
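A minimal reproduction sketch, for checking which behavior a given Ruby installation shows. The constant names (BASE, MERGED, EXPECTED) are illustrative only, and the printed result depends on the Ruby version in use: this report describes the collapsed form.

```ruby
require 'uri'

# Illustrative names; the URLs are taken from the report above.
BASE     = URI.parse('https://www.example.com')
EXPECTED = 'https://www.example.com/foo//bar'

# Merge a reference whose path contains an empty segment.
MERGED = BASE.merge('/foo//bar').to_s

puts MERGED
if MERGED == EXPECTED
  puts 'empty segment preserved'
else
  puts 'empty segment removed (the behavior described in this report)'
end
```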

The cause of the problem appears to be in uri/generic.rb at line 1080:

    def split_path(path)
      path.split(%r{/+}, -1)
    end

The + quantifier makes the split treat a run of consecutive solidi as a single separator, so the second solidus is dropped instead of being reported as an empty segment. Normally a server ignores the extra solidus; in this particular case, however, its absence resulted in a 500 error.

RFC 3986

Appendix B of RFC 3986 gives the following regular expression for parsing a URI reference (RFC 3986 Page 51):

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

Applying this expression to the inputs below demonstrates the difference between the specification and the behavior of URI::Generic#split_path:

[1] pry(#<URI::HTTPS>)> path
=> "/foo//bar"
[2] pry(#<URI::HTTPS>)> path.split(%r{/+}, -1)
=> ["", "foo", "bar"]
[3] pry(#<URI::HTTPS>)> path
=> "/foo//bar"
[4] pry(#<URI::HTTPS>)> path.scan(%r{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?})
=> [[nil, nil, nil, nil, "/foo//bar", nil, nil, nil, nil]]
[5] pry(#<URI::HTTPS>)> path = 'https://gollum:precious@www.hobbits.com/bilbo/baggins//frodo/baggins'
=> "https://gollum:precious@www.hobbits.com/bilbo/baggins//frodo/baggins"
[6] pry(#<URI::HTTPS>)> path.scan(%r{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?})
=> [["https:", "https", "//gollum:precious@www.hobbits.com", "gollum:precious@www.hobbits.com", "/bilbo/baggins//frodo/baggins", nil, nil, nil, nil]]

In both cases, the double solidus is retained.

Additionally, the RFC ABNF syntax allows for an empty segment:

   segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
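As a rough illustration of the grammar above, the following sketch translates pchar, segment and segment-nz into Ruby regular expressions and checks each segment of a path. The constant and method names (PCHAR, SEGMENT, SEGMENT_NZ, valid_path_segments?) are hypothetical and not part of the URI library; the character sets for unreserved and sub-delims follow RFC 3986.

```ruby
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
#   pct-encoded = "%" HEXDIG HEXDIG
PCHAR = /(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})/

SEGMENT    = /\A#{PCHAR}*\z/   # segment    = *pchar  (may be empty)
SEGMENT_NZ = /\A#{PCHAR}+\z/   # segment-nz = 1*pchar (must be non-empty)

# Hypothetical helper: true if every "/"-delimited piece is a valid segment.
def valid_path_segments?(path)
  path.split('/', -1).all? { |seg| SEGMENT.match?(seg) }
end

p SEGMENT.match?('')                 # => true, an empty segment is allowed
p SEGMENT_NZ.match?('')              # => false
p valid_path_segments?('/foo//bar')  # => true
```

Under this reading of the grammar, "/foo//bar" is a valid path whose third segment is empty.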

Recommendation

Modify uri/generic.rb to read:

    def split_path(path)
      path.split(%r{/}, -1)
    end
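The effect of the proposed change can be sketched in isolation, outside the URI library. The two method names below (split_path_current, split_path_proposed) are made up for this comparison; only the regular expressions come from the report.

```ruby
# Current behavior: a run of slashes is treated as one separator.
def split_path_current(path)
  path.split(%r{/+}, -1)
end

# Proposed behavior: every slash is a separator, so empty segments survive.
def split_path_proposed(path)
  path.split(%r{/}, -1)
end

path = '/foo//bar'

p split_path_current(path)                     # => ["", "foo", "bar"]
p split_path_proposed(path)                    # => ["", "foo", "", "bar"]

# Only the proposed split round-trips back to the original path.
p split_path_current(path).join('/') == path   # => false
p split_path_proposed(path).join('/') == path  # => true
```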

Related issues

Is duplicate of Ruby master - Bug #8352: URI squeezes a sequence of slashes in merging paths when it shouldn't (Closed, naruse (Yui NARUSE))
#1

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Is duplicate of Bug #8352: URI squeezes a sequence of slashes in merging paths when it shouldn't added
#2

Updated by jeremyevans0 (Jeremy Evans) about 1 year ago

  • Status changed from Open to Closed
