Feature #4270

Resolv does not handle UTF8 domain names.

Added by Hal Brodigan over 3 years ago. Updated about 1 year ago.

[ruby-core:34394]
Status:Closed
Priority:Normal
Assignee:Akira Tanaka
Category:lib
Target version:next minor

Description

=begin
Resolv.getaddress(es) cannot handle UTF8 domain names:

Steps to reproduce error:

 Resolv.getaddress('∞.com')

Expected result:

 174.132.17.93

Actual result:

 Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:757:in `[]='
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:757:in `sender'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:504:in `block in each_resource'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:1000:in `block (3 levels) in resolv'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:998:in `each'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:998:in `block (2 levels) in resolv'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:997:in `each'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:997:in `block in resolv'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:995:in `each'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:995:in `resolv'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:498:in `each_resource'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:391:in `each_address'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:115:in `block in each_address'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:114:in `each'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:114:in `each_address'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:92:in `getaddress'
from /home/hal/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/resolv.rb:43:in `getaddress'

=end
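The error class in the trace above is what Ruby raises when two strings with different encodings, both holding non-ASCII bytes, are combined. The following is a minimal illustration of that general behaviour. It is not the code in resolv.rb; the variable names and the sample value `0xBEEF` are made up for the example.

```ruby
# Illustration only; this is not taken from resolv.rb.
utf8_name = '∞.com'             # UTF-8 string containing a non-ASCII character
binary_id = [0xBEEF].pack('n')  # two raw bytes, tagged ASCII-8BIT (binary)

error = nil
begin
  # Splicing raw high bytes into a non-ASCII UTF-8 string is rejected.
  utf8_name[0, 0] = binary_id
rescue Encoding::CompatibilityError => e
  error = e
end

puts error.class
```

An ASCII-only name such as 'xn--59g.com' does not hit this, because an ASCII-only string is compatible with binary data.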

History

#1 Updated by Shota Fukumori over 3 years ago

  • Status changed from Open to Rejected

=begin
Internationalized domain names are sent over DNS in Punycode form, so you should encode the UTF-8 domain name to its Punycode equivalent before resolving it.

For example:

Resolv.getaddress('xn--59g.com')
=end
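As a rough sketch of the encoding step suggested above, the following implements only the RFC 3492 Punycode encoder and adds the "xn--" prefix per label. The module name `PunycodeSketch` and its methods are hypothetical and not part of resolv.rb. It assumes the input is already normalized: it skips the nameprep step (RFC 3491), case folding, and label length checks that a full IDNA ToASCII requires.

```ruby
# Hypothetical helper, not part of resolv.rb: bare RFC 3492 Punycode encoding.
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128
  DIGITS = [*'a'..'z', *'0'..'9'].freeze

  module_function

  # Bias adaptation function from RFC 3492, section 6.1.
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Encodes one label; ASCII-only labels are returned unchanged.
  def encode_label(label)
    code_points = label.codepoints
    basic = code_points.select { |c| c < INITIAL_N }
    return label if basic.size == code_points.size

    output = basic.map(&:chr).join
    output << '-' unless basic.empty?
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS
    handled = basic.size

    while handled < code_points.size
      m = code_points.select { |c| c >= n }.min
      delta += (m - n) * (handled + 1)
      n = m
      code_points.each do |c|
        delta += 1 if c < n
        next unless c == n

        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = (k - bias).clamp(TMIN, TMAX)
          break if q < t

          output << DIGITS[t + (q - t) % (BASE - t)]
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << DIGITS[q]
        bias = adapt(delta, handled + 1, handled == basic.size)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    "xn--#{output}"
  end

  # Encodes each dot-separated label of a domain name.
  def to_ascii(domain)
    domain.split('.').map { |label| encode_label(label) }.join('.')
  end
end

puts PunycodeSketch.to_ascii('∞.com') # prints "xn--59g.com"
```

The result can then be passed to Resolv.getaddress as in the comment above.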

#2 Updated by Hal Brodigan over 3 years ago

=begin
Charles Nutter (@headius) suggested a way for the code to fail less loudly. https://gist.github.com/775696

If Resolv is going to explicitly not support UTF8 domain names, it should raise a descriptive ArgumentError.
=end
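One possible shape for the descriptive error requested above, shown as a wrapper rather than a patch to Resolv. The method name `checked_getaddress` and the message wording are hypothetical; this is not the content of the linked gist.

```ruby
require 'resolv'

# Hypothetical wrapper, not part of resolv.rb: reject non-ASCII names up
# front with a clear message instead of failing deep inside the resolver.
def checked_getaddress(name)
  unless name.ascii_only?
    raise ArgumentError,
          "non-ASCII domain name #{name.inspect}: encode it as Punycode first"
  end
  Resolv.getaddress(name)
end
```

With this in place, `checked_getaddress('∞.com')` raises ArgumentError before any DNS query is attempted.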

#3 Updated by Shota Fukumori over 3 years ago

=begin
Please file this again as a feature request.
=end

#4 Updated by Hiroshi Nakamura over 3 years ago

  • Status changed from Rejected to Open

=begin
Reopening since I moved this to 'Feature'. Isn't it enough?
=end

#5 Updated by Shota Fukumori over 3 years ago

=begin
I forgot to move it to Feature in Redmine.
=end

#6 Updated by Usaku NAKAMURA over 3 years ago

  • Category set to lib
  • Status changed from Open to Assigned
  • Assignee set to Akira Tanaka


#7 Updated by Yusuke Endoh over 1 year ago

  • Description updated (diff)
  • Target version set to next minor

#8 Updated by Barry Allard about 1 year ago

We've been using a monkey patch based on GNU libidn's functions for RFCs 3490, 3491, and 3492.

Here's an extract of the critical functions (ToASCII and ToUnicode); please feel free to hack/fork/comment/etc.: https://gist.github.com/5328637 (unit tests included).

=> Resolv::Unicode.to_ascii('一流大學.中国')
"xn--4gqt5y3xbky5a.xn--fiqs8s"
=>

PING xn--4gqt5y3xbky5a.xn--fiqs8s (158.125.1.208): 56 data bytes
64 bytes from 158.125.1.208: icmp_seq=0 ttl=46 time=168.616 ms
64 bytes from 158.125.1.208: icmp_seq=1 ttl=46 time=163.608 ms

#9 Updated by Akira Tanaka about 1 year ago

It is not appropriate for a bundled library such as resolv.rb to depend on an external library.

#10 Updated by Barry Allard about 1 year ago

That was a rough suggestion that works right now; it's definitely not perfect. It would make sense for someone to create an autotools patch that detects libidn, sets up the lib and include paths, and refactors the Ruby glue code to eliminate the idn-ruby dependency (not the older idn gem). The libidn code is very complicated (feel free to read the RFCs if you like), so it is better to link against it than to reimplement it in Ruby, because it's a known quantity.

I think it would first need a discussion to decide how and when to perform Unicode conversions with minimal breakage, while staying DRY and predictable.

#11 Updated by Barry Allard about 1 year ago

For now, I've rolled up some code into a gem: resolv-idn

#12 Updated by Akira Tanaka about 1 year ago

  • Status changed from Assigned to Closed

It seems this feature is now provided by a gem, so I am closing this issue.
