Bug #20686

URI::HTTPS can build URI with blank, invalid host

Added by ronricardo (Roniece Ricardo) 5 months ago. Updated 5 months ago.

Status: Open
Assignee: -
Target version: -
[ruby-core:118894]

Description

In Ruby 3.4.0+, calling URI::HTTPS.build(host: "") does not raise URI::InvalidComponentError as expected. Instead, it returns #<URI::HTTPS https://>

I think this was introduced in this PR.

Steps to Reproduce

1. Environment:

  • Ruby Version: 3.4.0+

2. Steps:

  • Open an IRB session.

  • Run:

    URI::HTTPS.build(host: "")
    

3. Expected Behavior:

  • URI::InvalidComponentError should be raised due to the invalid empty host component.

4. Actual Behavior:

  • Returns #<URI::HTTPS https://> without raising an error.

Ruby 3.1.4:

irb(main):008:0> RUBY_VERSION
=> "3.1.4"
irb(main):009:0> URI::HTTPS.build(host:"")
/home/vscode/.rbenv/versions/3.1.4/lib/ruby/3.1.0/uri/generic.rb:601:in `check_host': bad component(expected host component):  (URI::InvalidComponentError)

Ruby 3.4.0:

irb():015> RUBY_VERSION
=> "3.4.0"
irb(...):016> URI::HTTPS.build(host:"")
=> #<URI::HTTPS https://>
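
The two transcripts above can be folded into a single check that runs on any Ruby version. This is only a minimal sketch: `probe_empty_host` is a hypothetical helper name, not part of the uri library, and it reports whatever the running Ruby does rather than assuming either outcome.

```ruby
require "uri"

# Hypothetical helper: reports how the running Ruby handles an empty host.
# Returns [:built, string] if a URI object comes back,
# or [:raised, message] if the build is rejected.
def probe_empty_host
  uri = URI::HTTPS.build(host: "")
  [:built, uri.to_s]
rescue URI::Error => e
  # URI::InvalidComponentError is a subclass of URI::Error.
  [:raised, e.message]
end

outcome, detail = probe_empty_host
puts "Ruby #{RUBY_VERSION}: #{outcome} (#{detail})"
```

Per the report, this would be expected to print a `raised` line on 3.1.4 and a `built` line on 3.4.0.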

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

ronricardo (Roniece Ricardo) wrote:

In Ruby 3.4.0+, calling URI::HTTPS.build(host: "") does not raise URI::InvalidComponentError as expected. Instead, it returns #<URI::HTTPS https://>

I think this was introduced in this PR.

That PR only affects #to_s, not .build, and is unrelated. This was caused by the RFC 2396 -> RFC 3986 parser change:

URI::HTTPS.new(nil, nil, "", nil, nil, nil, nil, nil, nil, URI::RFC3986_PARSER, true) 
# => #<URI::HTTPS //>

URI::HTTPS.new(nil, nil, "", nil, nil, nil, nil, nil, nil, URI::RFC2396_PARSER, true)
# /home/jeremy/tmp/uri/lib/uri/generic.rb:601:in `check_host': bad component(expected host component):  (URI::InvalidComponentError)

It appears RFC 3986 allows empty hosts (https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2: reg-name = *( unreserved / pct-encoded / sub-delims )), so I think this is not a bug, but an expected behavior change.

Updated by jhawthorn (John Hawthorn) 5 months ago

jeremyevans0 (Jeremy Evans) wrote in #note-1:

It appears RFC 3986 allows empty hosts (https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2: reg-name = *( unreserved / pct-encoded / sub-delims )), so I think this is not a bug, but an expected behavior change.

It's allowed by the ABNF, but the next paragraph states that it isn't valid for HTTP/HTTPS URIs:

If the URI scheme defines a default for host, then that default
applies when the host subcomponent is undefined or when the
registered name is empty (zero length). For example, the "file" URI
scheme is defined so that no authority, an empty host, and
"localhost" all mean the end-user's machine, whereas the "http"
scheme considers a missing authority or empty host invalid.

Updated by jhawthorn (John Hawthorn) 5 months ago

Interestingly, RFC2396_PARSER seems to allow nil for a host but not an empty string, so the newer behaviour is at least more consistent. It does seem like we are missing some expected validation here, though.

>> URI::HTTPS.new(nil, nil, "", nil, nil, nil, nil, nil, nil, URI::RFC2396_Parser.new, true)
/Users/jhawthorn/.rubies/ruby-3.3.2/lib/ruby/3.3.0/uri/generic.rb:601:in `check_host': bad component(expected host component):  (URI::InvalidComponentError)
        from /Users/jhawthorn/.rubies/ruby-3.3.2/lib/ruby/3.3.0/uri/generic.rb:640:in `host='
        ...
>> URI::HTTPS.new(nil, nil, nil, nil, nil, nil, nil, nil, nil, URI::RFC2396_Parser.new, true)
=> #<URI::HTTPS >
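
The nil versus empty-string comparison can be run against both parsers in one go. The following is a rough sketch under stated assumptions: `probe_host` is a hypothetical helper, and the script prints what the running Ruby does instead of hard-coding results, since they differ between versions.

```ruby
require "uri"

# Hypothetical helper: tries URI::HTTPS.new with the given host and parser,
# with argument checking enabled (the final `true`).
def probe_host(host, parser)
  URI::HTTPS.new(nil, nil, host, nil, nil, nil, nil, nil, nil, parser, true)
  :accepted
rescue URI::Error
  # Covers URI::InvalidComponentError raised by check_host.
  :rejected
end

parsers = {
  "RFC2396" => URI::RFC2396_Parser.new,
  "RFC3986" => URI::RFC3986_Parser.new,
}

parsers.each do |name, parser|
  [nil, ""].each do |host|
    puts "#{name} host=#{host.inspect}: #{probe_host(host, parser)}"
  end
end
```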

Updated by jeremyevans0 (Jeremy Evans) 5 months ago

jhawthorn (John Hawthorn) wrote in #note-2:

jeremyevans0 (Jeremy Evans) wrote in #note-1:

It appears RFC 3986 allows empty hosts (https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2: reg-name = *( unreserved / pct-encoded / sub-delims )), so I think this is not a bug, but an expected behavior change.

It's allowed by the ABNF, but the next paragraph states that it isn't valid for HTTP/HTTPS URIs:

If the URI scheme defines a default for host, then that default
applies when the host subcomponent is undefined or when the
registered name is empty (zero length). For example, the "file" URI
scheme is defined so that no authority, an empty host, and
"localhost" all mean the end-user's machine, whereas the "http"
scheme considers a missing authority or empty host invalid.

Thank you for pointing that out. I obviously should have read a little further. I submitted a pull request to reject empty host for URI::HTTP{,S}: https://github.com/ruby/uri/pull/116
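
Until a release with that validation is available, calling code could guard against the empty host itself. This is a hedged sketch, not the implementation from the pull request (which is not shown in this thread): `build_https_strict` is a hypothetical wrapper, and rejecting a nil host as well as an empty one is an assumption based on the RFC 3986 text quoted above.

```ruby
require "uri"

# Hypothetical wrapper, not part of the uri library: rejects a missing or
# empty host before delegating to URI::HTTPS.build.
def build_https_strict(host:, **components)
  # Assumption: treat nil the same as "", following the RFC 3986 wording
  # that "http" considers a missing authority or empty host invalid.
  if host.nil? || host.to_s.empty?
    raise URI::InvalidComponentError,
          "bad component(expected host component): #{host.inspect}"
  end
  URI::HTTPS.build(host: host, **components)
end

uri = build_https_strict(host: "example.com", path: "/status")
puts uri
```

Because the check runs before `URI::HTTPS.build`, the wrapper behaves the same regardless of which parser the installed uri version uses by default.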
