This is not a bug, and it is not related to validation. The behavior arises because URI.parse uses an RFC 3986 parser, while URI::HTTPS.build uses an RFC 2396 parser. If you call URI::HTTPS.new with an RFC 3986 parser and ask it to validate the components, you get a valid URI:
URI::HTTPS.new(
  *URI::RFC3986_PARSER.split(
    "https://-._~%2C!$&'()*+,;=:@-._~%2C!$&'()*+,;=:/foo?/-._~%2C!$&'()*+,;=:@/?"),
  URI::RFC3986_PARSER, true)
The issue here is that the hostname you provide in the URI is invalid in RFC 2396 but valid in RFC 3986.
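The disagreement between the two parsers can be reproduced on a smaller input. The sketch below is only an illustration: the host foo_bar.example.com and the helper host_or_error are made up for this example and are not part of the original report. An underscore is not allowed in an RFC 2396 hostname but is an unreserved character in RFC 3986, so the RFC 3986 parser should return the host, while the RFC 2396 parser should report a URI error (the exact error class may vary between Ruby versions).

```ruby
require "uri"

# Hypothetical helper: parse +str+ with +parser+ and return the host,
# or the class of the URI error if parsing fails.
def host_or_error(parser, str)
  parser.parse(str).host
rescue URI::Error => e
  e.class
end

# Assumed example input: "_" is not allowed in an RFC 2396 hostname,
# but it is an "unreserved" character in an RFC 3986 reg-name.
uri = "https://foo_bar.example.com/"

rfc3986 = URI::RFC3986_PARSER      # the parser constant used in the snippet above
rfc2396 = URI::RFC2396_Parser.new  # an explicit RFC 2396 parser instance

puts "RFC 3986: #{host_or_error(rfc3986, uri).inspect}"
puts "RFC 2396: #{host_or_error(rfc2396, uri).inspect}"
```

Both parsers should agree on an ordinary host such as example.com; they only diverge on characters that one grammar allows and the other does not.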
RFC 2396 ABNF:
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
RFC 3986 ABNF:
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
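The two grammars can be compared directly by translating them into regular expressions. This is a hand-written sketch of the ABNF rules quoted above, not the regular expressions the uri library uses internally; the constant names are chosen for this example, and the IPv4address and IP-literal alternatives are left out because they do not matter for the host in question.

```ruby
# RFC 2396: hostname = *( domainlabel "." ) toplabel [ "." ]
DOMAINLABEL = /[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
TOPLABEL    = /[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
RFC2396_HOSTNAME = /\A(?:#{DOMAINLABEL}\.)*#{TOPLABEL}\.?\z/

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
UNRESERVED  = /[A-Za-z0-9\-._~]/
PCT_ENCODED = /%\h\h/
SUB_DELIMS  = /[!$&'()*+,;=]/
RFC3986_REG_NAME =
  /\A(?:#{Regexp.union(UNRESERVED, PCT_ENCODED, SUB_DELIMS)})*\z/

# Host taken from the URI in the report, plus an ordinary host for contrast.
["-._~%2C!$&'()*+,;=", "example.com"].each do |host|
  puts "#{host.inspect}: " \
       "RFC 2396 hostname? #{RFC2396_HOSTNAME.match?(host)}, " \
       "RFC 3986 reg-name? #{RFC3986_REG_NAME.match?(host)}"
end
```

The RFC 2396 rule requires every label to start and end with an alphanumeric character, which the reported host fails on its first character; the RFC 3986 rule places no constraint on position, only on which characters may appear.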
With the URI provided, the host is -._~%2C!$&'()*+,;= which is valid according to the RFC 3986 ABNF:
- : unreserved
. : unreserved
_ : unreserved
~ : unreserved
%2C : pct-encoded
! : sub-delims
$ : sub-delims
& : sub-delims
' : sub-delims
( : sub-delims
) : sub-delims
* : sub-delims
+ : sub-delims
, : sub-delims
; : sub-delims
= : sub-delims
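The classification above can be reproduced mechanically. The following is a minimal sketch that walks the host with StringScanner and labels each token with the first RFC 3986 rule that matches; REG_NAME_RULES and classify_reg_name are names invented for this example, and the patterns are a hand translation of the ABNF rather than anything taken from the uri library.

```ruby
require "strscan"

# Rule name => pattern, hand-translated from the RFC 3986 ABNF.
REG_NAME_RULES = {
  "unreserved"  => /[A-Za-z0-9\-._~]/,
  "pct-encoded" => /%\h\h/,
  "sub-delims"  => /[!$&'()*+,;=]/,
}.freeze

# Returns [[token, rule_name], ...], or nil if some part of +host+
# matches none of the reg-name rules.
def classify_reg_name(host)
  scanner = StringScanner.new(host)
  tokens = []
  until scanner.eos?
    rule, = REG_NAME_RULES.find { |_name, pattern| scanner.scan(pattern) }
    return nil unless rule
    tokens << [scanner.matched, rule]
  end
  tokens
end

classify_reg_name("-._~%2C!$&'()*+,;=").each do |token, rule|
  puts "#{token} : #{rule}"
end
```

A nil result means the string contains something outside the reg-name grammar, for example a space or a square bracket.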
As for why RFC 3986 is used in some places (parse/join/split) and RFC 2396 everywhere else, I believe it is related to backwards compatibility. Previously, there were some issues with [ and ] not being allowed in query parts in RFC 3986 (#10402), but those are now worked around. However, URI::RFC2396_Parser and URI::RFC3986_Parser are not API compatible, so you cannot simply swap one for the other without breaking things.
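One rough way to see where the two classes differ is to compare the public instance methods each one defines. This is only a sketch: the result depends on the Ruby and uri versions in use, so no particular output is shown here, and a difference in method names is only part of what API compatibility means (return values and accepted arguments can differ as well).

```ruby
require "uri"

# Public instance methods defined directly on each parser class.
rfc2396_methods = URI::RFC2396_Parser.public_instance_methods(false)
rfc3986_methods = URI::RFC3986_Parser.public_instance_methods(false)

shared       = (rfc2396_methods & rfc3986_methods).sort
only_in_2396 = (rfc2396_methods - rfc3986_methods).sort
only_in_3986 = (rfc3986_methods - rfc2396_methods).sort

puts "shared:           #{shared.inspect}"
puts "only in RFC 2396: #{only_in_2396.inspect}"
puts "only in RFC 3986: #{only_in_3986.inspect}"
```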
In case you or someone else is interested in changing the default parser, attached is a minimal patch to make the RFC 3986 parser the default. It passes the URI tests, but I haven't done any testing beyond that. Hopefully it provides a decent starting point.