Bug #5831

closed

URI.extract not properly extracting URIs with trailing slash followed by single quote

Added by bcardarella (Brian Cardarella) almost 13 years ago. Updated almost 13 years ago.

Status: Rejected
Assignee: -
Target version:
ruby -v: 1.9.2-p290
Backport:
[ruby-core:<unknown>]

Description

I have some example failing test cases here:

https://gist.github.com/1547904

Here is my use case: I want to extract URIs from emails. Nokogiri has been recommended, and that works fine when the email is HTML, but when the email is plain text Nokogiri doesn't help. IMO this is a bug in URI.extract's regexp.

I have tested this against 1.8.7, 1.9.2, and 1.9.3 and it exists in all three.
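The reported behavior can be reproduced with a short script. This is a sketch, not the contents of the gist above: the two sample strings are hypothetical plain-text email lines, and the expected results assume URI.extract's RFC 2396 pattern, in which the single quote is a legal path character.

```ruby
require "uri"

# Hypothetical plain-text email lines (not taken from the linked gist).
with_slash    = "See 'http://example.com/' for details"
without_slash = "See 'http://example.com' for details"

# With a trailing slash, the closing quote is matched as a path segment.
p URI.extract(with_slash)     # => ["http://example.com/'"]

# Without it, the host pattern stops before the quote and no path follows.
p URI.extract(without_slash)  # => ["http://example.com"]
```

The quote only "leaks" into the result when a path is present, which is why the problem shows up specifically after a trailing slash.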

Updated by xds2000 (deshi xiao) almost 13 years ago

I have read lib/uri/common.rb and found that URI.extract splits URLs on whitespace, so I don't think your report is a bug. Here is a clue; please have a look:

    # Constructs the default Hash of Regexp's
    def initialize_regexp(pattern)
      ret = {}

      # for URI::split
      ret[:ABS_URI] = Regexp.new('\A\s*' + pattern[:X_ABS_URI] + '\s*\z', Regexp::EXTENDED)
      ret[:REL_URI] = Regexp.new('\A\s*' + pattern[:X_REL_URI] + '\s*\z', Regexp::EXTENDED)

      # ... (remaining entries omitted)
      ret
    end

Updated by naruse (Yui NARUSE) almost 13 years ago

  • Status changed from Open to Rejected

Sorry for the late reply.

As deshi says, that's not a bug; it's a feature.
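Because the single quote is one of RFC 2396's unreserved "mark" characters, URI.extract treats it as part of the path, so any trimming has to happen after extraction. Below is a minimal workaround sketch for the plain-text email case. The helper name extract_uris_from_text and the set of trailing characters it strips are assumptions, not part of the uri library, and stripping them will also truncate the rare URI that legitimately ends in one of those characters.

```ruby
require "uri"

# Hypothetical set of characters that usually end a sentence or close a
# quotation in prose rather than belonging to the URI itself.
TRAILING_PUNCTUATION = /['.,;:!?]+\z/

# Hypothetical helper (not part of the uri library): extract absolute URIs
# for the given schemes, then drop trailing quote/punctuation characters.
def extract_uris_from_text(text, schemes = %w[http https])
  URI.extract(text, schemes).map { |uri| uri.sub(TRAILING_PUNCTUATION, "") }
end

body = "Docs are at 'http://example.com/docs/' and 'https://example.org/'."
p extract_uris_from_text(body)
# => ["http://example.com/docs/", "https://example.org/"]
```

Restricting the schemes also keeps stray "word:" text in an email from being picked up as a URI.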
