Bug #5831

URI.extract not properly extracting URIs with trailing slash followed by single quote

Added by Brian Cardarella almost 4 years ago. Updated almost 4 years ago.

ruby -v: 1.9.2-p290


I have example failing test cases here:


Here is my use case: I want to extract URIs from emails. Nokogiri has been recommended, and it works fine when the email is HTML, but it does not help when the email is plain text. IMO this is a bug in URI.extract's regexp.

I have tested this against 1.8.7, 1.9.2, and 1.9.3, and the problem exists in all three.
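For the plain-text use case described above, one possible workaround is to post-process whatever URI.extract returns. The sketch below is only an illustration: `extract_clean_uris` is a hypothetical helper, not part of the uri library, and the set of characters it strips is an assumption that may be too aggressive for URIs that legitimately end in punctuation.

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): extract http(s) URIs
# from plain text, then drop trailing quote and punctuation characters
# that URI.extract may keep as part of the match.
def extract_clean_uris(text)
  URI.extract(text, %w[http https])
     .map { |u| u.sub(/['",.;:!?]+\z/, '') }
     .uniq
end

body = "Visit 'http://example.com/' or http://example.org/path."
cleaned = extract_clean_uris(body)
p cleaned
```

Note that newer Ruby versions mark URI.extract as obsolete, so this is best read as a stopgap for the versions discussed in this report.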


#1 Updated by deshi xiao almost 4 years ago

I have been reading lib/uri/common.rb, and I found that URI.extract's behavior is to split URLs on whitespace, so I think what you report is not a bug. Here is a clue; please have a look.

    # Constructs the default Hash of Regexp's
    def initialize_regexp(pattern)
      ret = {}

      # for URI::split
      ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
      ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)
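
As a small illustration of why a trailing single quote ends up inside the match: a single quote is itself a permitted character in a URI path, so the parser has no reason to stop before it. The snippet below is a sketch using URI.parse rather than the regexps quoted above; the example URL is made up.

```ruby
require 'uri'

# The single quote is accepted as part of the path component,
# so it is treated as belonging to the URI rather than as a delimiter.
uri = URI.parse("http://example.com/'")
puts uri.host
puts uri.path
```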

#2 Updated by Yui NARUSE almost 4 years ago

  • Status changed from Open to Rejected

Sorry for the late reply.

As deshi says, that's not a bug, it's a feature.
