Bug #5831

URI.extract not properly extracting URIs with trailing slash followed by single quote

Added by Brian Cardarella over 2 years ago. Updated about 2 years ago.

[ruby-core:<unknown>]
Status:Rejected
Priority:Normal
Assignee:-
Category:lib
Target version:1.9.2
ruby -v:1.9.2-p290 Backport:

Description

I have example failing test cases here:

https://gist.github.com/1547904

Here is my use case: I am looking to extract URIs from emails. It has been recommended that I use Nokogiri, and that is fine if the email is HTML. But if the email is plain text, Nokogiri doesn't work. IMO this is a bug in URI.extract's regexp.

I have tested this against 1.8.7, 1.9.2, and 1.9.3 and it exists in all three.
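
For illustration, here is a minimal sketch of one possible workaround for the plain-text case. It assumes a trailing single quote is never meant to be part of the URI, which will not hold for every input. The helper names `strip_trailing_quote` and `extract_unquoted_uris` are hypothetical and not part of the uri library; only `URI.extract` itself is.

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): removes a single
# trailing single quote from each candidate string, if one is present.
def strip_trailing_quote(candidates)
  candidates.map { |candidate| candidate.sub(/'\z/, '') }
end

# Hypothetical wrapper: pulls http/https candidates out of plain text
# with URI.extract, then post-processes them with the helper above.
def extract_unquoted_uris(text)
  strip_trailing_quote(URI.extract(text, %w[http https]))
end

# Example call on a plain-text snippet with a quoted URI.
extract_unquoted_uris("visit 'http://example.com/' today")
```

The cleanup is kept separate from the extraction so it can be applied to the output of any extractor, not only `URI.extract`.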

History

#1 Updated by deshi xiao about 2 years ago

I have been reading lib/uri/common.rb and found that URI.extract's behavior is to split URLs on whitespace, so I think your report is not a bug. Here is a clue; please have a look.

    # Constructs the default Hash of Regexp's
    def initialize_regexp(pattern)
      ret = {}

      # for URI::split
      ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
      ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)
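
To make the anchoring in the quoted lines concrete, here is a small sketch that builds a regexp with the same `\A\s*` prefix and `\s*\z` suffix. The inner pattern returned by `simple_abs_uri_pattern` is a hypothetical, heavily simplified stand-in for `pattern[:X_ABS_URI]`, and the method names are made up for this example. It shows only how the anchors behave, not how URI.extract scans a longer text.

```ruby
# Wraps an inner pattern the same way the quoted source does: optional
# whitespace on both sides, anchored to the start and end of the string.
def anchored_regexp(inner)
  Regexp.new('\A\s*' + inner + '\s*\z', Regexp::EXTENDED)
end

# Hypothetical, simplified stand-in for pattern[:X_ABS_URI]:
# a scheme, a colon, then any run of non-whitespace characters.
def simple_abs_uri_pattern
  '[a-z][a-z0-9+.-]*:[^\s]+'
end

# True only when the whole string (ignoring surrounding whitespace)
# matches the simplified pattern.
def whole_string_uri?(str)
  anchored_regexp(simple_abs_uri_pattern).match(str) ? true : false
end

whole_string_uri?("  http://example.com/  ")   # surrounding whitespace is allowed
whole_string_uri?("see http://example.com/")   # leading text fails the \A anchor
```

Because of the `\A` and `\z` anchors, a regexp built this way accepts or rejects a string as a whole; it does not find URIs embedded in other text.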

#2 Updated by Yui NARUSE about 2 years ago

  • Status changed from Open to Rejected

Sorry for the late reply.

As deshi says, that's not a bug, it's a feature.
