Feature #2542

URI lib should be updated to RFC 3986

Added by Marc-Andre Lafortune over 5 years ago. Updated about 1 year ago.

[ruby-core:27360]
Status:Closed
Priority:Normal
Assignee:Yui NARUSE

Description

RFC 2396 has been obsolete for nearly 5 years now.

It was replaced by RFC 3986, which aims to clarify aspects that were not previously clear.


Related issues

Related to Ruby trunk - Bug #4110: When a hostname starts with a digit, the WEBrick tests raise an Error Closed 12/02/2010
Related to Ruby trunk - Bug #4673: URI::Generic registry is not properly set. Feedback 05/12/2011
Related to Ruby trunk - Bug #8352: uri squeezes a sequence of slashes in merging paths when it shouldn't Open 05/02/2013
Related to Ruby trunk - Bug #10402: URI regression in 2.2.0-preview1 (bad URI(is not URI?): URI::InvalidURIError) Closed
Related to Ruby trunk - Bug #9990: URI.parse and URI.encode use different RFCs Assigned 06/28/2014

Associated revisions

Revision 46491
Added by Yui NARUSE about 1 year ago

support RFC3986 [Feature #2542]

  • lib/uri/common.rb (URI::REGEXP): move to lib/uri/rfc2396_parser.rb.

  • lib/uri/common.rb (URI::Parser): ditto.

  • lib/uri/common.rb (URI.split): use RFC3986_Parser.

  • lib/uri/common.rb (URI.parse): ditto.

  • lib/uri/common.rb (URI.join): ditto.

  • lib/uri/common.rb (URI.extract): deprecated.

  • lib/uri/common.rb (URI.regexp): ditto.

  • lib/uri/rfc2396_parser.rb: added.

  • lib/uri/rfc3986_parser.rb: added.
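As a rough illustration of the entry points this changeset touches, the sketch below calls URI.split, URI.parse and URI.join, and also instantiates the relocated RFC 2396 parser directly. The example URLs are made up, and the sketch assumes a Ruby version that ships both lib/uri/rfc2396_parser.rb and lib/uri/rfc3986_parser.rb.

```ruby
require 'uri'

# URI.split returns the nine components in a fixed order:
# scheme, userinfo, host, port, registry, path, opaque, query, fragment.
parts = URI.split('http://example.com/path?q=1#frag')
scheme, _userinfo, host, _port, _registry, path, _opaque, query, fragment = parts

# URI.parse builds a URI object from the same string.
uri = URI.parse('http://example.com/path?q=1#frag')

# URI.join resolves a relative reference against a base URI.
joined = URI.join('http://example.com/a/b', '../c')

# The older RFC 2396 behaviour stays reachable through its own parser class
# (assumption: the class name matches the file added in this changeset).
legacy_parser = URI::RFC2396_Parser.new
legacy_uri = legacy_parser.parse('http://example.com/path?q=1')

puts [scheme, host, path, query, fragment].inspect
puts uri.host
puts joined
puts legacy_uri.query
```

Code that relies on the more permissive RFC 2396 rules could, under these assumptions, call the legacy parser explicitly instead of URI.parse.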

Revision 46680
Added by Yui NARUSE about 1 year ago

  • lib/uri/generic.rb (URI::Generic#query=): remove validation, just
    escape. [Feature #2542]

  • lib/uri/generic.rb (URI::Generic#fragment=): ditto.

  • lib/uri/generic.rb (URI::Generic#check_query): removed.

  • lib/uri/generic.rb (URI::Generic#set_query): ditto.

  • lib/uri/generic.rb (URI::Generic#check_fragment): ditto.

  • lib/uri/generic.rb (URI::Generic#set_fragment): ditto.

History

#1 Updated by Marc-Andre Lafortune over 5 years ago

  • Subject changed from URI lib should be updated to RFC 39886 to URI lib should be updated to RFC 3986

#2 Updated by Yui NARUSE over 5 years ago

FYI, RFC 3986 will be obsoleted.
http://tools.ietf.org/html/draft-duerst-iri-bis-07

#3 Updated by Martin Dürst over 5 years ago

No, RFC 3986 (URI) will NOT be updated. RFC 3987 (IRI), in due time,
will be updated. See also
http://www.ietf.org/ibin/c5i?mid=6&rid=49&gid=0&k1=934&k2=7294&tid=1262671757.

Regards, Martin.


#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

#4 Updated by Yui NARUSE over 5 years ago

Ah, Martin is right: it is RFC 3987 that will be updated, and RFC 3986 will remain in force.

Anyway, Bob Aman has introduced an alternative library named Addressable.
http://addressable.rubyforge.org/
It looks good, but it has some incompatibilities with the current URI library.
I'd suggest bundling Addressable and obsoleting the current URI lib,
but I have to plan the migration path first.

#5 Updated by Yusuke Endoh over 5 years ago

  • Target version changed from 1.9.2 to 2.0.0

Hi,

> I'd suggest bundling Addressable and obsoleting the current URI lib,
> but I have to plan the migration path first.

This ticket seems to need a lot of work.
I don't think we can make the spec-freeze deadline,
so I am changing the target to 1.9.x.
If you want 1.9.2 to include this feature, please discuss it right now.


Yusuke Endoh mame@tsg.ne.jp

#6 Updated by Marc-Andre Lafortune over 5 years ago

  • Target version changed from 2.0.0 to 1.9.2

I feel the spec for 1.9.2 has been quite clear for 5 years ... follow RFC 3986!

Integrating some of the features of the addressable gem can be discussed later.

Do we have to wait for Akira Yamada, the official maintainer of this library?

#7 Updated by Yui NARUSE over 5 years ago

I object to targeting 1.9.2.
Following RFC 3986 introduces some incompatibilities.
It shouldn't be done without careful consideration.

#8 Updated by Martin Dürst over 5 years ago

Hello Yui,

Is there a list of incompatibilities, or can you make one?

Regards, Martin.


#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp

#9 Updated by Yui NARUSE over 5 years ago

2010/3/25 "Martin J. Dürst" duerst@it.aoyama.ac.jp:

Is there a list of incompatibilities, or can you make one?

Some structures of the syntax have changed in RFC 3986.
This breaks URI::REGEXP::PATTERN::TOPLABEL and some other constants.
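To make the kind of syntax change concrete, here is a minimal sketch comparing the RFC 2396 toplabel rule with the RFC 3986 reg-name rule. The two regular expressions are hand-transcribed from the RFC grammars for illustration; they are not the constants defined in Ruby's URI library, and the helper name is hypothetical.

```ruby
# Hand-transcribed from the RFC grammars for illustration only; these are
# NOT the constants defined in Ruby's URI library.

# RFC 2396: toplabel = alpha | alpha *( alphanum | "-" ) alphanum
RFC2396_TOPLABEL = /\A[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?\z/

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
RFC3986_REG_NAME = /\A(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%\h\h)*\z/

# Hypothetical helper: checks only the last label of a host name against
# the RFC 2396 toplabel rule (a single trailing dot is tolerated).
def rfc2396_toplabel_ok?(host)
  top = host.chomp('.').split('.').last.to_s
  RFC2396_TOPLABEL.match?(top)
end

['example.com', '1example'].each do |host|
  puts "#{host}: toplabel ok under RFC 2396? #{rfc2396_toplabel_ok?(host)}, " \
       "reg-name ok under RFC 3986? #{RFC3986_REG_NAME.match?(host)}"
end
```

Under these transcriptions, a host whose last label starts with a digit fails the RFC 2396 toplabel rule but is a valid RFC 3986 reg-name, which is one way constants built around TOPLABEL stop matching the newer grammar.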


NARUSE, Yui
naruse@airemix.jp

#10 Updated by Kazuhiro NISHIYAMA about 5 years ago

  • Target version changed from 1.9.2 to 2.0.0
  • Status changed from Open to Assigned

#11 Updated by Hedge Hog about 5 years ago

Rather than reinventing anything, consider using an FFI interface to uriparser:

http://uriparser.sourceforge.net/

I'm not sure whether there is a port for Windows, or whether an equivalent Windows lib is available.

#12 Updated by Luis Lavena about 5 years ago

On Wed, May 12, 2010 at 8:01 PM, Hedge Hog redmine@ruby-lang.org wrote:

Issue #2542 has been updated by Hedge Hog.

Rather than reinvent anything.  Consider employing an FFI interface to uriparser:

http://uriparser.sourceforge.net/

not sure if there is a port for windows or if an equivalent windows lib is available?

We would not only depend on the uriparser C library, but would also require
libcpptest to be able to configure and compile uriparser.


Luis Lavena
AREA 17


Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

#13 Updated by Philippe Lucas almost 5 years ago

It depends on libcpptest only for its tests, so you can build the package with '--disable-test'.

#14 Updated by Yui NARUSE over 4 years ago

  • Assignee changed from akira yamada to Yui NARUSE

I have come to think the uri lib should move to RFC 3986 even if that breaks some compatibility.
But I don't want the new implementation/spec to also be a white box, as it is now.
So I think:

  • keep the current URI::REGEXP, URI::Parser and so on.
  • at least URI.parse should not use URI::Parser, but a new implementation.

How about this?

#15 Updated by Yui NARUSE almost 4 years ago

  • Target version changed from 2.0.0 to 1.9.4

#16 Updated by Nikos Dimitrakopoulos over 2 years ago

Are there any plans for actually fixing this? Not sure I can help, and no troll appetite - just asking :)

#17 Updated by Yusuke Endoh over 2 years ago

  • Target version changed from 1.9.4 to next minor

Naruse-san, could you please answer Nikos?

I'm setting this to next minor, but if you are willing to do something for 2.0.0, and if the impact is small enough, I may accept it.

Yusuke Endoh mame@tsg.ne.jp

#18 Updated by Yui NARUSE about 2 years ago

Just an experimental implementation:
http://github.com/nurse/url

#19 Updated by Yui NARUSE about 1 year ago

  • % Done changed from 0 to 100
  • Status changed from Assigned to Closed

Applied in changeset r46491.


#20 Updated by Aaron Patterson about 1 year ago

r46491 broke this script:

require 'uri'

thing = URI.parse 'http://example.com'
thing.query = 'location[]=1&location[]=2&age_group[]=2'

Before r46491 it would set the query; after r46491 it raises an exception.

Is this a bug in the new implementation? Or should I be doing something different? (I pulled this from the Rails tests, so I'm not 100% sure what it is actually for)

#21 Updated by Jeremy Kemper about 1 year ago

In RFC 3986, square brackets are no longer allowed in the query part.

Source of the unescaped brackets, in this case: https://github.com/brynary/rack-test/blob/master/lib/rack/test/utils.rb

This may become a common issue since plenty of code uses URI.parse and expects its more permissive RFC 2396 parsing.
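One way a caller could sidestep the unescaped brackets is to build the query with URI.encode_www_form, which percent-encodes them. This is only a sketch of a caller-side workaround, not what rack-test or Rails actually do; the parameter names are taken from the script in #20.

```ruby
require 'uri'

# Build the query from key/value pairs; array values repeat the key.
# encode_www_form percent-encodes "[" and "]" as %5B and %5D.
query = URI.encode_www_form('location[]' => [1, 2], 'age_group[]' => [2])

thing = URI.parse('http://example.com')
thing.query = query

puts query
puts thing
```

Because the brackets are already encoded when the query is assigned, the assignment does not depend on how strictly the parser treats raw square brackets.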

#22 Updated by Zachary Scott about 1 year ago

I think #9990 is related /cc @naruse @JK @tenderlove

#23 Updated by Yui NARUSE about 1 year ago

I'm considering changing the error policy of the URI library, for example:
BEFORE: raise an error if invalid characters exist
AFTER: percent-escape them
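A minimal sketch of what the AFTER policy could look like for a query string follows. The helper name and the character class are assumptions made for illustration (the class is transcribed from the RFC 3986 query grammar, with "%" left alone so existing escapes survive); this is not the code that was committed to lib/uri.

```ruby
# Characters RFC 3986 allows in a query (unreserved, sub-delims, ":", "@",
# "/", "?"), plus "%" so that existing percent-escapes are left untouched.
# Assumption: a stray "%" that is not part of a valid escape is NOT repaired.
QUERY_DISALLOWED = /[^A-Za-z0-9\-._~!$&'()*+,;=:@\/?%]/

# Hypothetical helper: percent-escape every disallowed character instead of
# raising an error. Multi-byte characters are escaped byte by byte.
def escape_query_leniently(query)
  query.gsub(QUERY_DISALLOWED) do |ch|
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end

puts escape_query_leniently('location[]=1&location[]=2&age_group[]=2')
```

With this policy, the script from #20 would end up with an escaped query rather than an exception.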

#24 Updated by Leonard Garvey about 1 year ago

I've implemented something similar to that policy in the following gist: https://gist.github.com/lengarvey/31983eac6664351ed16d

It's a very basic naive implementation but I believe it roughly does what we need URI.parse to do.

#25 Updated by Nobuyoshi Nakada 9 months ago

  • Related to Bug #10402: URI regression in 2.2.0-preview1 (bad URI(is not URI?): URI::InvalidURIError) added

#26 Updated by Nobuyoshi Nakada 9 months ago

  • Related to Bug #9990: URI.parse and URI.encode use different RFCs added
