Project

General

Profile

Actions

Bug #3788

closed

URI cannot parse IPv6 addresses propertly

Added by adamm (Adam Majer) over 13 years ago. Updated almost 13 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
Backport:
[ruby-core:32056]

Description

=begin
require 'uri'

u = URI::parse( 'http://[::1]:8080/test' )
u.host
=> "[::1]"

**** THIS SHOULD READ ::1 ****

irb(main):007:0> u.host = '127.0.0.1'
=> "127.0.0.1"
irb(main):008:0> u.host = '::1'
URI::InvalidComponentError: bad component(expected host component): ::1
from /usr/lib/ruby/1.9.1/uri/generic.rb:388:in check_host' from /usr/lib/ruby/1.9.1/uri/generic.rb:402:in host='
from (irb):8
from /usr/bin/irb1.9.1:12:in `'
irb(main):009:0> u.host = 'localhost'
=> "localhost"

If settings host to ::1 doesn't work, then setting it to [::1] should produce ::1 hostname. [::1] clearly would not resolve and hostnames should resolve.
=end

Actions #1

Updated by naruse (Yui NARUSE) over 13 years ago

  • Status changed from Open to Rejected

=begin
uri lib's structure follows URI RFC; your expectation is wrong.
=end

Actions #2

Updated by darix (Marcus Rückert) over 13 years ago

=begin
require 'uri'
require 'net/http'
url = URI.parse( 'http://[::1]:8080/test/Test' )
Net::HTTP.get( url )

I think something like that should work then, but it doesnt. it gives:

/usr/lib64/ruby/1.9.1/net/http.rb:644:in initialize': getaddrinfo: Name or service not known (SocketError) /usr/lib64/ruby/1.8/net/http.rb:560:in initialize': getaddrinfo: Name or service not known (SocketError)

Or do we expect users to manually mangle the hostname after the the uri lib parsed it?
=end

Actions #3

Updated by naruse (Yui NARUSE) over 13 years ago

=begin
The machine disables IPv6.
=end

Actions #4

Updated by darix (Marcus Rückert) over 13 years ago

=begin
no it doesnt.

if i use ipv6-localhost instead of [::1] it works.

$ cat tt.rb
#!/usr/bin/ruby

vim: set sw=2 sts=2 et tw=80 :

require 'uri'
require 'net/http'
%w{[::1] ipv6-localhost}.each do |hostname|
begin
test_url = 'http://' + hostname + ':80/test/Test'
puts test_url
url = URI.parse( test_url )
puts url.host
puts Net::HTTP.get( url )
rescue Exception => ex
p ex
end
end

$ ruby -v tt.rb
ruby 1.8.7 (2010-06-23 patchlevel 299) [x86_64-linux]
http://[::1]:80/test/Test
[::1]
#<SocketError: getaddrinfo: Name or service not known>
http://ipv6-localhost:80/test/Test
ipv6-localhost

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
  <body>
  <h1>404 - Not Found</h1>
 </body>
</html>

same result with ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]

=end

Actions #5

Updated by naruse (Yui NARUSE) over 13 years ago

=begin
Hmm, your point has a reson. I consider this.
http://msdn.microsoft.com/en-us/library/system.uri.dnssafehost.aspx
=end

Actions #6

Updated by darix (Marcus Rückert) over 13 years ago

=begin
so can we reopen the bug so it wont get lost?:)
=end

Actions #7

Updated by adamm (Adam Majer) over 13 years ago

=begin
The original argument is quite simple.

  1. URI RFC defines structure of URI (text) so it can be parsed by Ruby's URI (class) and other implementation. RFC define the interop.

  2. Purpose of URI Ruby class is to permit access to read, edit and use such URI transparently. URI class should deal with any encoding/unencoding issues.

  3. Not handling special escaping of the IPv6 literal internally in the URI Ruby class will result in spaghetti code and code duplication elsewhere. Marcus' example is exactly what happens not only inside Ruby codebase, but in other applications. The expectation is a hostname can be passed to other classes. For example,

url = URI.parse( 'http://[::1]:8080/test/Test' )
TCPSocket.new( url.host, url.port )

So where should this be fixed? In every piece of code that uses URI to connect or in URI helper class?

Here's another library (C++) that does this the way I expected it to,
http://doc.trolltech.com/4.6/qurl.html#host

QUrl u( "http://[::1]:8080/test" );
u.host() => "::1"

In my opinion, adding dnssafehost to URI is bad design. Instead that should be the job the DNS class. "DNS Safe" has impact on protocol level, not API level.

My expectation was that URI.host should return unescaped hostname/IP and URI.host= should take unescaped as argument and escape them to produce a valid URI. This interface could then be easily extended to work with IRI (International domains).

maybe the solution is to rename current members host and host= to host_rfc3986 and host_rfc3986=. Similar syntax already exists in other classes like Time.

Alternatively, have URL.host return a new Host class (non-string) that deals with encoding/unencoding per URI RFC.

Of course, it is up to you to decide how you chose to design this. Again, RFC does not define what URL.host returns. Instead it defines host - a specific part of URL string.

=end

Actions #8

Updated by naruse (Yui NARUSE) over 13 years ago

  • Status changed from Rejected to Open

=begin

=end

Actions #9

Updated by akr (Akira Tanaka) over 13 years ago

=begin
2010/9/8 Adam Majer :

Issue #3788 has been updated by Adam Majer.

In my opinion, adding dnssafehost to URI is bad design.

I think adding a method is a good design.

My expectation was that URI.host should return unescaped hostname/IP and URI.host= should take unescaped as argument and escape them to produce a valid URI. This interface could then be easily extended to work with IRI (International domains).

maybe the solution is to rename current members host and host= to host_rfc3986 and host_rfc3986=. Similar syntax already exists in other classes like Time.

It introduces an incompatibility.
Also, the primitive method to get/set the host part will be version dependent.

I think adding a new method for deal brackets is better.

% svn diff --diff-cmd diff -x '-u -p'
Index: lib/uri/generic.rb

--- lib/uri/generic.rb (revision 29195)
+++ lib/uri/generic.rb (working copy)
@@ -412,6 +412,38 @@ module URI
v
end

  • extract the host part of the URI and unwrap brackets for IPv6 addresses.

  • This method is same as URI::Generic#host except

  • brackets for IPv6 (andn future IP) addresses are removed.

  • u = URI("http://[::1]/bar")

  • p u.hostname #=> "::1"

  • p u.host #=> "[::1]"

  • def hostname
  •  v = self.host
    
  •  /\A\[(.*)\]\z/ =~ v ? $1 : v
    
  • end
  • set the host part of the URI as the argument with brackets for

IPv6 addresses.

  • This method is same as URI::Generic#host= except

  • the argument can be bare IPv6 address.

  • u = URI("http://foo/bar")

  • p u.to_s #=> "http://foo/bar"

  • u.hostname = "::1"

  • p u.to_s #=> "http://[::1]/bar"

  • If the arugument seems IPv6 address,

  • it is wrapped by brackets.

  • def hostname=(v)
  •  v = "[#{v}]" if /\A\[.*\]\z/ !~ v && /:/ =~ v
    
  •  self.host = v
    
  • end
  • def check_port(v)
    return v unless v

http://redmine.ruby-lang.org/issues/show/3788
--
Tanaka Akira

=end

Actions #10

Updated by darix (Marcus Rückert) over 13 years ago

=begin
On 2010-09-10 06:44:10 +0900, Tanaka Akira wrote:

2010/9/8 Adam Majer :

Issue #3788 has been updated by Adam Majer.

In my opinion, adding dnssafehost to URI is bad design.

I think adding a method is a good design.

My expectation was that URI.host should return unescaped hostname/IP
and URI.host= should take unescaped as argument and escape them to
produce a valid URI. This interface could then be easily extended to
work with IRI (International domains).

maybe the solution is to rename current members host and host= to
host_rfc3986 and host_rfc3986=. Similar syntax already exists in
other classes like Time.

It introduces an incompatibility.
Also, the primitive method to get/set the host part will be version dependent.

I think adding a new method for deal brackets is better.

doesnt that mean you will need to fix all of the users of the function?
with my net/http example above you can somewhat control it.

but e.g. ruby -r open-uri 'p open("http://[::1]:8080/test/test")'
doesnt give me control over which function is used.

i think the better approach might be to let url.host return "::1".
and just encode it when constructing a new url.

another option might be monkeypatching the string object returned by
url.host and add a "dnssafe" method like shown in the msdn article.

then the resolv class could get the dnssafe version when needed. but it
still might lead to trouble when the hostname is passed to other C
extensions.

 darix

--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org

=end

Actions #11

Updated by akr (Akira Tanaka) over 13 years ago

=begin
2010/9/10 Marcus Rueckert :

doesnt that mean you will need to fix all of the users of the function?
with my net/http example above you can somewhat control it.

All occurrences of URI::Generic#host should be examined.

I guess there may be usages which needs the brackets.
At least, one of the use in URI::Generic#find_proxy (defined in open-uri.rb)
is not clear which is suitable between the brackets are removed or not.

% svn diff --diff-cmd diff -x '-u -p'
Index: lib/open-uri.rb

--- lib/open-uri.rb (revision 29207)
+++ lib/open-uri.rb (working copy)
@@ -263,17 +263,17 @@ module OpenURI
# HTTP or HTTPS
if proxy
if proxy_user && proxy_pass

  •      klass = Net::HTTP::Proxy(proxy_uri.host, proxy_uri.port,
    

proxy_user, proxy_pass)

  •      klass = Net::HTTP::Proxy(proxy_uri.hostname,
    

proxy_uri.port, proxy_user, proxy_pass)
else

  •      klass = Net::HTTP::Proxy(proxy_uri.host, proxy_uri.port)
    
  •      klass = Net::HTTP::Proxy(proxy_uri.hostname, proxy_uri.port)
       end
     end
    
  •  target_host = target.host
    
  •  target_host = target.hostname
     target_port = target.port
     request_uri = target.request_uri
    
    else
    # FTP over HTTP proxy
  •  target_host = proxy_uri.host
    
  •  target_host = proxy_uri.hostname
     target_port = proxy_uri.port
     request_uri = target.to_s
     if proxy_user && proxy_pass
    

@@ -736,10 +736,10 @@ module URI
proxy_uri = ENV[name] || ENV[name.upcase]
end

  •  if proxy_uri && self.host
    
  •  if proxy_uri && self.hostname
       require 'socket'
       begin
    
  •      addr = IPSocket.getaddress(self.host)
    
  •      addr = IPSocket.getaddress(self.hostname)
         proxy_uri = nil if /\A127\.|\A::1\z/ =~ addr
       rescue SocketError
       end
    

@@ -804,7 +804,7 @@ module URI

    # The access sequence is defined by RFC 1738
    ftp = Net::FTP.new
  •  ftp.connect(self.host, self.port)
    
  •  ftp.connect(self.hostname, self.port)
     ftp.passive = true if !options[:ftp_active_mode]
     # todo: extract user/passwd from .netrc.
     user = 'anonymous'
    

Index: lib/net/http.rb

--- lib/net/http.rb (revision 29207)
+++ lib/net/http.rb (working copy)
@@ -64,7 +64,7 @@ module Net #:nodoc:
# require 'uri'
#
# url = URI.parse('http://www.example.com/index.html')

  • res = Net::HTTP.start(url.host, url.port) {|http|

  • res = Net::HTTP.start(url.hostname, url.port) {|http|

    http.get('/index.html')

    }

    puts res.body

@@ -75,7 +75,7 @@ module Net #:nodoc:
#
# url = URI.parse('http://www.example.com/index.html')
# req = Net::HTTP::Get.new(url.path)

  • res = Net::HTTP.start(url.host, url.port) {|http|

  • res = Net::HTTP.start(url.hostname, url.port) {|http|

    http.request(req)

    }

    puts res.body

@@ -101,7 +101,7 @@ module Net #:nodoc:
# req = Net::HTTP::Post.new(url.path)
# req.basic_auth 'jack', 'pass'
# req.set_form_data({'from' => '2005-01-01', 'to' => '2005-03-31'}, ';')

  • res = Net::HTTP.new(url.host, url.port).start {|http|

http.request(req) }

  • res = Net::HTTP.new(url.hostname, url.port).start {|http|

http.request(req) }
# case res
# when Net::HTTPSuccess, Net::HTTPRedirection
# # OK
@@ -390,7 +390,7 @@ module Net #:nodoc:
}
else
uri = uri_or_host

  •    new(uri.host, uri.port).start {|http|
    
  •    new(uri.hostname, uri.port).start {|http|
         return http.request_get(uri.request_uri, &block)
       }
     end
    

@@ -415,7 +415,7 @@ module Net #:nodoc:
req = Post.new(url.path)
req.form_data = params
req.basic_auth url.user, url.password if url.user

  •  new(url.host, url.port).start {|http|
    
  •  new(url.hostname, url.port).start {|http|
       http.request(req)
     }
    
    end
    --
    Tanaka Akira

=end

Actions #12

Updated by naruse (Yui NARUSE) over 13 years ago

=begin
(2010/09/10 6:44), Tanaka Akira wrote:

2010/9/8 Adam Majer:

Issue #3788 has been updated by Adam Majer.

In my opinion, adding dnssafehost to URI is bad design.

I think adding a method is a good design.

I agree with akr's patch; but URI#host should have a description that
it doesn't remove brackets.

--
NARUSE, Yui

=end

Actions #13

Updated by akr (Akira Tanaka) over 13 years ago

=begin
2010/9/12 NARUSE, Yui :

I agree with akr's patch; but URI#host should have a description that
it doesn't remove brackets.

I see.

Index: lib/uri/generic.rb

--- lib/uri/generic.rb (revision 29207)
+++ lib/uri/generic.rb (working copy)
@@ -205,8 +205,31 @@
self.set_path('') if @path && @opaque # (see RFC2396 Section 5.2)
self.set_port(self.default_port) if self.default_port && @port
end
+
attr_reader :scheme
+

TCPSocket.open.

  • If unwrapped host names are required, use "hostname" method.

  • URI("http://[::1]/bar/baz").host #=> "[::1]"

  • URI("http://[::1]/bar/baz").hostname #=> "::1"

  • attr_reader :host

  • attr_reader :port
    attr_reader :registry
    attr_reader :path
    @@ -412,6 +435,38 @@
    v
    end

  • extract the host part of the URI and unwrap brackets for IPv6 addresses.

  • This method is same as URI::Generic#host except

  • brackets for IPv6 (andn future IP) addresses are removed.

  • u = URI("http://[::1]/bar")

  • p u.hostname #=> "::1"

  • p u.host #=> "[::1]"

  • def hostname

  •  v = self.host
    
  •  /\A\[(.*)\]\z/ =~ v ? $1 : v
    
  • end

  • set the host part of the URI as the argument with brackets for

IPv6 addresses.

  • This method is same as URI::Generic#host= except

  • the argument can be bare IPv6 address.

  • u = URI("http://foo/bar")

  • p u.to_s #=> "http://foo/bar"

  • u.hostname = "::1"

  • p u.to_s #=> "http://[::1]/bar"

  • If the arugument seems IPv6 address,

  • it is wrapped by brackets.

  • def hostname=(v)
  •  v = "[#{v}]" if /\A\[.*\]\z/ !~ v && /:/ =~ v
    
  •  self.host = v
    
  • end
  • def check_port(v)
    return v unless v

--
Tanaka Akira

=end

Actions #14

Updated by shyouhei (Shyouhei Urabe) over 13 years ago

=begin
Hi, why is this issue still opened?
=end

Actions #15

Updated by naruse (Yui NARUSE) over 13 years ago

=begin
I fully agree with akr's last patch, please commit it.
=end

Actions #16

Updated by akr (Akira Tanaka) over 13 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r29416.
Adam, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0