Bug #11522: URI::decode returns incorrectly encoding strings - Ruby - Ruby Issue Tracking System


closed


Added by charlieda (Charlie Anderson) over 10 years ago. Updated over 10 years ago.

Status: Rejected
Assignee: akira (akira yamada)
Target version:
ruby -v: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN

[ruby-core:<unknown>]

Description

When a string containing Unicode characters is encoded with URI::encode and then decoded with URI::decode, the result is a string with an invalid encoding.

irb(main):026:0* unicode = 'œ´å∑®´ß∂†≈©ƒç˙©√∆˙∫˚∆~¬'
=> "œ´å∑®´ß∂†≈©ƒç˙©√∆˙∫˚∆~¬"
irb(main):027:0> unicode.encoding
=> #<Encoding:UTF-8>
irb(main):028:0> unicode.valid_encoding?
=> true
irb(main):029:0> encoded = URI::encode(unicode)
=> "%C5%93%C2%B4%C3%A5%E2%88%91%C2%AE%C2%B4%C3%9F%E2%88%82%E2%80%A0%E2%89%88%C2%A9%C6%92%C3%A7%CB%99%C2%A9%E2%88%9A%E2%88%86%CB%99%E2%88%AB%CB%9A%E2%88%86~%C2%AC"
irb(main):030:0> encoded.encoding
=> #<Encoding:US-ASCII>
irb(main):031:0> encoded.valid_encoding?
=> true
irb(main):032:0> decoded = URI::decode(encoded)
=> "\xC5\x93\xC2\xB4\xC3\xA5\xE2\x88\x91\xC2\xAE\xC2\xB4\xC3\x9F\xE2\x88\x82\xE2\x80\xA0\xE2\x89\x88\xC2\xA9\xC6\x92\xC3\xA7\xCB\x99\xC2\xA9\xE2\x88\x9A\xE2\x88\x86\xCB\x99\xE2\x88\xAB\xCB\x9A\xE2\x88\x86~\xC2\xAC"
irb(main):033:0> decoded.encoding
=> #<Encoding:US-ASCII>
irb(main):034:0> decoded.valid_encoding?
=> false

I would expect decoded to have a valid encoding, probably UTF-8.

Updated by charlieda (Charlie Anderson) over 10 years ago (#1)

Assignee set to akira (akira yamada)

Updated by nobu (Nobuyoshi Nakada) over 10 years ago (#2)

It has no hints for encoding.

Updated by usa (Usaku NAKAMURA) over 10 years ago (#3)

I agree with you, nobu.
But the result should be ASCII-8BIT, not US-ASCII.
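
To illustrate the distinction, here is a minimal sketch (not taken from the URI source) that labels the same two bytes three ways. The variable names are made up for this example. US-ASCII cannot represent bytes above 0x7F, so the string reports an invalid encoding, while ASCII-8BIT accepts any byte sequence:

```ruby
# Two raw bytes, as they would come out of decoding "%C5%93".
bytes = "\xC5\x93".b                                      # ASCII-8BIT (binary)

as_ascii  = bytes.dup.force_encoding(Encoding::US_ASCII)  # label used in the report
as_binary = bytes                                         # label suggested above
as_utf8   = bytes.dup.force_encoding(Encoding::UTF_8)     # label the reporter expected

puts as_ascii.valid_encoding?   # false: bytes above 0x7F are not US-ASCII
puts as_binary.valid_encoding?  # true: any byte sequence is valid binary
puts as_utf8.valid_encoding?    # true: C5 93 is a well-formed UTF-8 sequence
```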

Updated by nobu (Nobuyoshi Nakada) over 10 years ago (#4)

Status changed from Open to Rejected

Firstly, URI.unescape is obsolete.
CGI.unescape, which sets the encoding to @@accept_charset, may work for you.
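
A short sketch of that suggestion, assuming the default accept charset (UTF-8) has not been changed. The sample input is the first few escapes from the report:

```ruby
require 'cgi'

# First three characters of the reporter's string, percent-encoded.
encoded = "%C5%93%C2%B4%C3%A5"

decoded = CGI.unescape(encoded)
puts decoded.encoding         # UTF-8 when the accept charset is left at its default
puts decoded.valid_encoding?  # true

# The encoding can also be passed explicitly as the second argument.
explicit = CGI.unescape(encoded, Encoding::UTF_8)

# Note that CGI.unescape also turns '+' into a space.
plus = CGI.unescape("a+b")
```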

Updated by duerst (Martin Dürst) over 10 years ago (#5)

Nobuyoshi Nakada wrote:

It has no hints for encoding.

In theory, that's correct. In practice, there are several better possibilities.

1) We can add an additional parameter that indicates the encoding.
2) We can default to UTF-8. That's because most URIs that contain non-ASCII byte values these days are based on UTF-8, and their percentage is increasing steadily.
3) We can check whether using UTF-8 makes sense or not. If the bytes are valid UTF-8, then the chance that they are anything else than UTF-8 is virtually 0.

1) and 2) are already done by CGI.unescape. But 3) isn't. Also, CGI.unescape changes '+' to ' ', which is desirable in some contexts (query parts in http(s) URIs), but not in others (e.g. mailto URIs).
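
The third possibility (label the result as UTF-8 only when the bytes are valid UTF-8) could look roughly like the sketch below. The helper `unescape_guessing_utf8` is hypothetical and is not part of URI or CGI. It assumes a fallback to ASCII-8BIT when the check fails, and it leaves '+' untouched:

```ruby
# Hypothetical helper, not part of the URI or CGI libraries.
def unescape_guessing_utf8(str)
  # Work on raw bytes so that each %XX escape becomes exactly one byte.
  bytes = str.b.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }

  # Relabel as UTF-8 only if the bytes form well-formed UTF-8;
  # otherwise keep the binary (ASCII-8BIT) label. This fallback is an assumption.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : bytes
end

good = unescape_guessing_utf8("%C5%93+%7E")  # valid UTF-8, '+' is preserved
bad  = unescape_guessing_utf8("%FF")         # 0xFF never occurs in UTF-8

puts good.encoding  # UTF-8
puts bad.encoding   # ASCII-8BIT
```

Whether a silent fallback to binary is preferable to raising an error is a design choice the thread does not settle.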
