Bug #7156: Invalid byte sequence in US-ASCII when using URI from std lib - Ruby - Ruby Issue Tracking System

closed

Added by t0d0r (Todor Dragnev) over 12 years ago. Updated over 6 years ago.

Status:

Rejected

Assignee:

naruse (Yui NARUSE)

Target version:

ruby -v:

1.9.3

Backport:

2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN

[ruby-core:47966]

Description

Invalid byte sequence in US-ASCII on ruby 1.9.3

I get this error when trying to open a URL that contains Bulgarian text (UTF-8: "История"). The problem seems to be in uri/common.rb from the Ruby standard library.

Adding str.force_encoding(Encoding::BINARY) to the following method fixes the problem:

class URI::Parser
  def escape(str, unsafe = @regexp[:UNSAFE])
    unless unsafe.kind_of?(Regexp)
      # perhaps unsafe is String object
      unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
    end
    str.force_encoding(Encoding::BINARY) # FIX
    str.gsub(unsafe) do
      us = $&
      tmp = ''
      us.each_byte do |uc|
        tmp << sprintf('%%%02X', uc)
      end
      tmp
    end.force_encoding(Encoding::US_ASCII)
  end
end

One more suggestion: maybe Encoding::US_ASCII should be replaced with Encoding::BINARY too?
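The report does not say how the string ended up tagged as US-ASCII, so the following is only a sketch of one plausible cause: UTF-8 bytes mislabeled as US-ASCII. The `UNSAFE` pattern and the `percent_escape` helper are hypothetical stand-ins for the library's internals, not the actual uri/common.rb code.

```ruby
# Hypothetical stand-in for the library's "unsafe characters" pattern.
UNSAFE = /[^a-zA-Z0-9_.~-]/

# Hypothetical helper: percent-encode every byte of each unsafe match.
def percent_escape(str)
  str.gsub(UNSAFE) { |m| m.each_byte.map { |b| format("%%%02X", b) }.join }
end

# Assumption: the input holds UTF-8 bytes but carries the US-ASCII label.
mislabeled = "История".dup.force_encoding(Encoding::US_ASCII)

# Regexp matching against a string with an invalid encoding raises ArgumentError.
error_message =
  begin
    percent_escape(mislabeled)
    nil
  rescue ArgumentError => e
    e.message
  end

# Workaround in the spirit of the proposed fix, applied to a copy so the
# caller's string is not mutated: treat the input as raw bytes.
binary = mislabeled.dup.force_encoding(Encoding::BINARY)
escaped = percent_escape(binary).force_encoding(Encoding::US_ASCII)

puts error_message
puts escaped
```

Unlike the patch above, this sketch calls `dup` before `force_encoding`, because `force_encoding` changes the receiver in place and would otherwise alter the caller's string.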

Files

bulgarian.rb (61 Bytes), added by mame (Yusuke Endoh), 11/06/2012 08:43 PM

#1 [ruby-core:48011]

Updated by meta (mathew murphy) over 12 years ago

What part of the URL contains the UTF-8 characters?

If it's the domain, you need to encode the UTF-8 as Punycode before passing it to Ruby.

If it's in the path, Ruby ought to handle it for IRI compliance, but probably doesn't right now...

http://www.w3.org/International/articles/idn-and-iri/

#2 [ruby-core:48972]

Updated by mame (Yusuke Endoh) over 12 years ago

File bulgarian.rb added
Status changed from Open to Feedback
Target version set to 2.0.0

I'm not sure what you want. I cannot reproduce this issue by the following code.

$ cat bulgarian.rb
# coding: UTF-8
require "uri"
p URI.escape("История")

$ ruby bulgarian.rb
"%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F"

Could you please give us example code, the expected result, and the actual result?

--
Yusuke Endoh mame@tsg.ne.jp
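
When preparing the example code requested in this comment, it may help to record what Ruby thinks the string's encoding is, since the same bytes behave differently under different labels. A minimal diagnostic sketch, assuming the file is saved as UTF-8 (the `report` hash is a hypothetical name, not part of any library):

```ruby
# coding: UTF-8
text = "История"

# Collect the facts that matter for an encoding bug report.
report = {
  encoding: text.encoding.name,   # label Ruby attached to the string
  valid:    text.valid_encoding?, # do the bytes match that label?
  bytesize: text.bytesize,        # 7 Cyrillic letters, 2 bytes each in UTF-8
  escaped:  text.bytes.map { |b| format("%%%02X", b) }.join
}

report.each { |key, value| puts "#{key}: #{value}" }
```

If `encoding` is not UTF-8 or `valid` is false on the reporter's machine, the string was mislabeled before it reached the escaping code.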

#3 [ruby-core:52322]

Updated by ko1 (Koichi Sasada) over 12 years ago

Target version changed from 2.0.0 to 2.6

No feedback.

#4 [ruby-core:52414]

Updated by ko1 (Koichi Sasada) over 12 years ago

Assignee set to naruse (Yui NARUSE)

Updated by naruse (Yui NARUSE) over 7 years ago

Target version deleted (~~2.6~~)

#6 [ruby-core:89492]

Updated by naruse (Yui NARUSE) over 6 years ago

Status changed from Feedback to Rejected

The argument to URI needs to be escaped.
Ruby may support unescaped URIs once browsers' URL handling is settled.
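
A minimal sketch of escaping before parsing, assuming the non-ASCII text sits in a single path segment. The host `example.com` and the `/wiki/` prefix are made up for illustration. `URI.encode_www_form_component` is used here because it is available in current Ruby versions, but it implements form encoding (a space becomes `+`), so it is only an approximation of path escaping.

```ruby
require "uri"

raw_segment = "История"

# Percent-encode the segment, then build and parse the URI.
escaped_segment = URI.encode_www_form_component(raw_segment)
uri = URI.parse("http://example.com/wiki/#{escaped_segment}")

# Passing the unescaped text straight to URI.parse is rejected.
unescaped_error =
  begin
    URI.parse("http://example.com/wiki/#{raw_segment}")
    nil
  rescue URI::InvalidURIError => e
    e.class
  end

# The original text can be recovered from the escaped form.
round_trip = URI.decode_www_form_component(escaped_segment)

puts uri.path
puts round_trip
```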
