Bug #7156
Closed
Invalid byte sequence in US-ASCII when using URI from std lib
Description
Invalid byte sequence in US-ASCII on Ruby 1.9.3.
I receive that error when trying to open a URL containing Bulgarian text (UTF-8: "История"). The problem seems to be in uri/common.rb from the Ruby standard library.
Adding str.force_encoding(Encoding::BINARY) to the following method fixes the problem:
class URI::Parser
  def escape(str, unsafe = @regexp[:UNSAFE])
    unless unsafe.kind_of?(Regexp)
      # perhaps unsafe is String object
      unsafe = Regexp.new("[#{Regexp.quote(unsafe)}]", false)
    end
    str.force_encoding(Encoding::BINARY) # FIX
    str.gsub(unsafe) do
      us = $&
      tmp = ''
      us.each_byte do |uc|
        tmp << sprintf('%%%02X', uc)
      end
      tmp
    end.force_encoding(Encoding::US_ASCII)
  end
end
One more suggestion: maybe Encoding::US_ASCII should be replaced with Encoding::BINARY too?
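For illustration, here is a minimal standalone sketch of the same idea, not the actual URI::Parser code. The helper name escape_bytes and the simplified unsafe pattern are assumptions made for this example. Unlike the patch above, it calls force_encoding on a copy, so the caller's string keeps its original encoding tag.

```ruby
# Hypothetical helper, not part of the URI library: percent-escapes every
# byte of each match, working on a binary-tagged copy of the input.
# The default pattern is a simplified stand-in for @regexp[:UNSAFE].
def escape_bytes(str, unsafe = /[^a-zA-Z0-9\-_.~]/)
  # dup first, so the caller's string is not re-tagged as a side effect
  binary = str.dup.force_encoding(Encoding::BINARY)
  escaped = binary.gsub(unsafe) do |match|
    match.each_byte.map { |byte| format('%%%02X', byte) }.join
  end
  escaped.force_encoding(Encoding::US_ASCII)
end

# "История" written with \u escapes, then tagged as US-ASCII to mimic
# the situation described in the report.
input = "\u0418\u0441\u0442\u043E\u0440\u0438\u044F".force_encoding(Encoding::US_ASCII)
result = escape_bytes(input)

puts result          # %D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F
puts input.encoding  # US-ASCII (unchanged)
```

Because every byte outside the pattern is escaped, the result contains only ASCII characters, so tagging it as US-ASCII is safe in this sketch.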
Updated by meta (mathew murphy) about 12 years ago
What part of the URL contains the UTF-8 characters?
If it's the domain, you need to convert the UTF-8 to Punycode before passing it to Ruby.
If it's in the path, Ruby ought to handle it for IRI compliance, but probably doesn't right now...
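For the path case, a rough sketch of escaping the segment before handing the URL to URI.parse might look like the following. The host example.com is a placeholder, and URI.encode_www_form_component is used here only as a convenient standard-library escaper; it is meant for form data and turns spaces into "+", so it is an assumption that it suits the URL in question. The Punycode conversion for domains is not shown, since it is not covered by the calls used here.

```ruby
require "uri"

# "История" written with \u escapes so the literal is UTF-8 regardless
# of the source file's encoding.
segment = "\u0418\u0441\u0442\u043E\u0440\u0438\u044F"

# Percent-escape the path segment first, then parse.
escaped = URI.encode_www_form_component(segment)
uri = URI.parse("http://example.com/" + escaped)
puts uri.path  # /%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F

# Passing the raw segment is rejected by the parser.
unescaped_error = begin
  URI.parse("http://example.com/" + segment)
  nil
rescue URI::InvalidURIError => e
  e
end
puts unescaped_error.class  # URI::InvalidURIError
```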
Updated by mame (Yusuke Endoh) about 12 years ago
- File bulgarian.rb added
- Status changed from Open to Feedback
- Target version set to 2.0.0
I'm not sure what you want. I cannot reproduce this issue with the following code.
$ cat bulgarian.rb
# coding: UTF-8
require "uri"
p URI.escape("История")
$ ruby bulgarian.rb
"%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F"
Could you please give us example code, the expected result, and the actual result?
--
Yusuke Endoh mame@tsg.ne.jp
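One plausible explanation for the difference, offered as a guess rather than a confirmed diagnosis, is the encoding tag on the string: the script above declares coding: UTF-8, while the reporter's string may have been tagged US-ASCII. The sketch below uses a hypothetical, simplified unsafe pattern and a local lambda instead of URI's own code to show how the same bytes behave under each tag.

```ruby
# Hypothetical, simplified pattern; not the one used by the URI library.
unsafe = /[^a-zA-Z0-9\-_.~]/
percent_escape = lambda do |str|
  str.gsub(unsafe) { |m| m.each_byte.map { |b| format('%%%02X', b) }.join }
end

# "История" as a UTF-8 string, and the same bytes tagged as US-ASCII.
utf8 = "\u0418\u0441\u0442\u043E\u0440\u0438\u044F"
ascii_tagged = utf8.dup.force_encoding(Encoding::US_ASCII)

puts utf8.valid_encoding?          # true
puts ascii_tagged.valid_encoding?  # false

# Valid UTF-8 input is escaped without problems.
utf8_result = percent_escape.call(utf8)
puts utf8_result

# The US-ASCII-tagged copy makes the regexp match raise.
error = begin
  percent_escape.call(ascii_tagged)
  nil
rescue ArgumentError => e
  e
end
puts error.message if error
```

Printing str.encoding and str.valid_encoding? just before the failing call would be one way to gather the information requested above.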
Updated by ko1 (Koichi Sasada) almost 12 years ago
- Target version changed from 2.0.0 to 2.6
No feedback.
Updated by ko1 (Koichi Sasada) almost 12 years ago
- Assignee set to naruse (Yui NARUSE)
Updated by naruse (Yui NARUSE) about 6 years ago
- Status changed from Feedback to Rejected
The argument to URI needs to be escaped.
Ruby may support unescaped URIs once browsers' URL handling becomes settled.