Bug #7156

Invalid byte sequence in US-ASCII when using URI from std lib

Added by Todor Dragnev about 3 years ago. Updated almost 3 years ago.

Assignee: Yui NARUSE
ruby -v: 1.9.3
Backport:


Invalid byte sequence in US-ASCII on ruby 1.9.3

I get this error when trying to open a URL containing Bulgarian text (UTF-8: "История"). The problem seems to be in uri/common.rb in the Ruby standard library.

Adding str.force_encoding(Encoding::BINARY) to the following method fixes the problem:

class URI::Parser
  def escape(str, unsafe = @regexp[:UNSAFE])
    unless unsafe.kind_of?(Regexp)
      # perhaps unsafe is String object
      unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
    end
    str.force_encoding(Encoding::BINARY) # FIX
    str.gsub(unsafe) do
      us = $&
      tmp = ''
      us.each_byte do |uc|
        tmp << sprintf('%%%02X', uc)
      end
      tmp
    end.force_encoding(Encoding::US_ASCII)
  end
end

One more suggestion: perhaps US_ASCII should be replaced with Encoding::BINARY as well?
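One plausible way to hit this error, sketched below, is a string that holds UTF-8 bytes but is labeled US-ASCII: any regexp-based call such as gsub then refuses to scan it. The sketch assumes that scenario (the report does not confirm it), and the unsafe pattern is a hypothetical stand-in, not the real UNSAFE regexp from uri/common.rb.

```ruby
# The UTF-8 bytes of "История", deliberately mislabeled as US-ASCII.
bytes = [0xD0, 0x98, 0xD1, 0x81, 0xD1, 0x82, 0xD0, 0xBE,
         0xD1, 0x80, 0xD0, 0xB8, 0xD1, 0x8F]
mislabeled = bytes.pack("C*").force_encoding(Encoding::US_ASCII)

# Hypothetical stand-in for URI's UNSAFE pattern (assumption, not the real one).
unsafe = /[^A-Za-z0-9_.~-]/

# Scanning the mislabeled string with a regexp raises ArgumentError.
error = nil
begin
  mislabeled.gsub(unsafe) { |m| m }
rescue ArgumentError => e
  error = e
end
puts error.message if error

# Relabeling a copy as BINARY makes every byte a valid character,
# so each unsafe byte can be percent-encoded individually.
binary = mislabeled.dup.force_encoding(Encoding::BINARY)
escaped = binary.gsub(unsafe) { |byte| format("%%%02X", byte.ord) }
puts escaped
```

The sketch relabels a copy (dup) so the caller's string keeps its encoding; the fix proposed above calls force_encoding on str itself, which changes the caller's object.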

bulgarian.rb (61 Bytes) Yusuke Endoh, 11/06/2012 08:43 PM


#1 Updated by mathew murphy about 3 years ago

What part of the URL contains the UTF-8 characters?

If it's the domain, you need to encode the UTF-8 name as Punycode before passing it to Ruby.

If it's in the path, Ruby ought to handle it for IRI compliance, but probably doesn't right now...
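For the path case, a minimal workaround sketch is to percent-encode the non-ASCII segment yourself before building the URI. The host example.com and the /wiki/ prefix are placeholders, not taken from the report. URI.encode_www_form_component is meant for form data (it turns spaces into "+"), so this is only a reasonable fit for segments without spaces.

```ruby
require "uri"

segment = "История"

# Percent-encode the UTF-8 bytes of the segment.
encoded = URI.encode_www_form_component(segment)

# Placeholder host and path prefix (assumptions for illustration).
uri = URI.parse("http://example.com/wiki/#{encoded}")
puts uri.path

# Decoding the segment gives the original text back.
decoded = URI.decode_www_form_component(encoded)
puts decoded
```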


#2 Updated by Yusuke Endoh about 3 years ago

  • File bulgarian.rb added
  • Status changed from Open to Feedback
  • Target version set to 2.0.0

I'm not sure what you want. I cannot reproduce this issue with the following code.

$ cat bulgarian.rb
# coding: UTF-8
require "uri"
p URI.escape("История")

$ ruby bulgarian.rb

Could you please give us example code, the expected result, and the actual result?

Yusuke Endoh mame@tsg.ne.jp

#3 Updated by Koichi Sasada almost 3 years ago

  • Target version changed from 2.0.0 to next minor

No feedback.

#4 Updated by Koichi Sasada almost 3 years ago

  • Assignee set to Yui NARUSE
