Bug #5831
Closed
URI.extract not properly extracting URIs with trailing slash followed by single quote
Description
I have example failing test cases here:
https://gist.github.com/1547904
Here is my use case: I am looking to extract URIs from emails. It has been recommended to use Nokogiri, and that is fine if the email is in HTML. But if the email is plain text, Nokogiri doesn't work. IMO this is a bug in URI.extract's regexp.
I have tested this against 1.8.7, 1.9.2, and 1.9.3 and it exists in all three.
Updated by xds2000 (deshi xiao) almost 13 years ago
I have been reading lib/uri/common.rb, and I found that URI.extract's behavior is to split URLs on whitespace, so I think your report is not a bug. Here is a clue, please have a look.
# Constructs the default Hash of Regexp's
def initialize_regexp(pattern)
  ret = {}

  # for URI::split
  ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
  ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)
Updated by naruse (Yui NARUSE) almost 13 years ago
- Status changed from Open to Rejected
Sorry for the late reply.
As deshi says, that's not a bug, it's a feature.
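For readers with the plain-text email use case, one possible workaround is to scan with a stricter pattern instead of relying on URI.extract. The sketch below is not part of the uri library: the names URL_PATTERN and extract_urls are hypothetical, and it assumes that quotes, angle brackets, and whitespace never belong to a URL and that trailing sentence punctuation should be dropped. A single quote is a legal URI character, so this is a deliberate narrowing, not a drop-in replacement.

```ruby
# Hypothetical helper, not part of Ruby's uri library.
# Assumption: whitespace, angle brackets, and quotes always terminate a URL.
URL_PATTERN = %r{https?://[^\s<>"']+}

# Returns every http/https URL found in +text+, with trailing
# sentence punctuation (.,;:!?) and closing parentheses removed.
def extract_urls(text)
  text.scan(URL_PATTERN).map { |url| url.sub(/[.,;:!?)]+\z/, "") }
end

# Example usage with a quoted URL that ends in a slash.
puts extract_urls("See 'http://example.com/' for details")
```

Because the pattern only accepts http and https, other schemes such as mailto or ftp would need to be added explicitly.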