Bug #5684

[[Ruby 1.9:]] Socket doesn't respect default external encoding

Added by vovik (Vladimir Chernis) over 7 years ago. Updated over 7 years ago.

Status: Rejected
Priority: Normal
Target version: -
ruby -v: ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.2.0]
Backport:

[ruby-core:41385]

Description

When receiving data from a TCPSocket (as in client.rb, attached), the default external encoding specified by the -E option to ruby is not respected.

Steps:
(1) In terminal window A, run: ruby server.rb
(2) In terminal window B, run: ruby -E ISO-8859-1 client.rb

Expected result for terminal window B:
bytes: "hell\xF6"
encoding: ISO-8859-1

Actual result for terminal window B:
bytes: "hell\xF6"
encoding: ASCII-8BIT

Workaround:
Use String#force_encoding('ISO-8859-1')
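
The workaround can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the attached client.rb or server.rb: the in-process server thread, the loopback address, and the variable names are assumptions made only so the snippet runs on its own.

```ruby
require 'socket'

# Hypothetical stand-in for server.rb: sends the ISO-8859-1 bytes of "hellö".
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
sender = Thread.new do
  conn = server.accept
  conn.write("hell\xF6".b)
  conn.close
end

client = TCPSocket.new('127.0.0.1', port)
data = client.recv(100) # recv returns a binary string
client.close
sender.join
server.close

raw_encoding = data.encoding       # the label before the workaround
data.force_encoding('ISO-8859-1')  # relabels the same bytes; nothing is transcoded

puts "bytes: #{data.inspect}"
puts "encoding before: #{raw_encoding}"
puts "encoding after:  #{data.encoding}"
```

force_encoding only changes the label attached to the bytes, so it is appropriate only when the sender's encoding is actually known.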


Files

client.rb (226 Bytes): ruby -E ISO-8859-1 client.rb. vovik (Vladimir Chernis), 11/29/2011 11:35 AM
server.rb (440 Bytes): ruby server.rb. vovik (Vladimir Chernis), 11/29/2011 11:35 AM
client.rb (221 Bytes): recv -> read. vovik (Vladimir Chernis), 12/02/2011 10:17 AM
socket_vs_file.rb (188 Bytes): ruby -E ISO-8859-1 socket_vs_file.rb. vovik (Vladimir Chernis), 12/02/2011 10:36 AM

History

Updated by naruse (Yui NARUSE) over 7 years ago

You can set encodings on a Socket object with Socket#set_encoding.
But Socket#recv is a binary API, like IO#read(n).
You can use the textual API IO#read to get an ISO-8859-1 string.
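
A small sketch of the distinction described above, assuming a throwaway local server in place of the attached server.rb (the server thread, payload, and variable names are illustrative, not taken from the attachments). After set_encoding, a read with a length still returns binary data, while a read to EOF is labeled with the socket's encoding.

```ruby
require 'socket'

# Hypothetical local server: sends "hellö" as ISO-8859-1 bytes twice,
# then closes so the client's read-to-EOF can return.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
sender = Thread.new do
  conn = server.accept
  conn.write("hell\xF6hell\xF6".b)
  conn.close
end

client = TCPSocket.new('127.0.0.1', port)
client.set_encoding('ISO-8859-1') # encoding to apply to textual reads

binary_chunk = client.read(5) # read with a length: binary API
text_rest    = client.read    # read to EOF: textual API

client.close
sender.join
server.close

puts "read(5): #{binary_chunk.encoding}"
puts "read:    #{text_rest.encoding}"
```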

Updated by vovik (Vladimir Chernis) over 7 years ago

Yui NARUSE wrote:

You can set encodings on a Socket object with Socket#set_encoding.

I understand, but if I don't call Socket#set_encoding, shouldn't the encoding fall back to the default encoding specified by the -E option to ruby?

But Socket#recv is a binary API, like IO#read(n).
You can use the textual API IO#read to get an ISO-8859-1 string.

Is IO#read the same as Socket#read? Because changing recv to read in client.rb doesn't change anything about the encoding.

I know File#read respects the default encoding. It would be nice if Socket#read did the same thing, especially since Net::HTTP uses Socket.

Updated by vovik (Vladimir Chernis) over 7 years ago

To summarize:
File IO encoding works correctly in that it respects the default external encoding specified in the -E option to ruby. But Socket encoding does not.

I've attached a simple test case to illustrate the problem. When I run it with ruby -E ISO-8859-1 socket_vs_file.rb, I expect the following output:

file encoding: ISO-8859-1
socket encoding: ISO-8859-1

But instead, I get this output:

file encoding: ISO-8859-1
socket encoding: ASCII-8BIT

Am I mistaken to expect this behavior?
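
For readers without the attachment, the comparison can be approximated as below. This is not the attached socket_vs_file.rb: the temp file, the in-process server, and setting Encoding.default_external in code (used here as a rough substitute for passing -E ISO-8859-1) are all assumptions chosen to keep the sketch self-contained.

```ruby
require 'socket'
require 'tempfile'

# Rough substitute for running with `ruby -E ISO-8859-1`.
Encoding.default_external = 'ISO-8859-1'

payload = "hell\xF6".b # ISO-8859-1 bytes of "hellö"

# File side: File.read labels the result with the default external encoding.
file_encoding = nil
Tempfile.create('socket_vs_file') do |f|
  f.binmode
  f.write(payload)
  f.flush
  file_encoding = File.read(f.path).encoding
end

# Socket side: same bytes, read to EOF without calling set_encoding.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
sender = Thread.new do
  conn = server.accept
  conn.write(payload)
  conn.close
end
client = TCPSocket.new('127.0.0.1', port)
socket_encoding = client.read.encoding
client.close
sender.join
server.close

puts "file encoding: #{file_encoding}"
puts "socket encoding: #{socket_encoding}"
```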

Updated by naruse (Yui NARUSE) over 7 years ago

Vladimir Chernis wrote:

Yui NARUSE wrote:

You can set encodings on a Socket object with Socket#set_encoding.

I understand, but if I don't call Socket#set_encoding, shouldn't the encoding fall back to the default encoding specified by the -E option to ruby?

Socket doesn't respect default_external because default_external is set from the locale of the client system,
whereas the encoding of data received from a socket depends on the server software.
Moreover, data from a socket is usually binary.

But Socket#recv is a binary API, like IO#read(n).
You can use the textual API IO#read to get an ISO-8859-1 string.

Is IO#read the same as Socket#read? Because changing recv to read in client.rb doesn't change anything about the encoding.

I know File#read respects the default encoding. It would be nice if Socket#read did the same thing, especially since Net::HTTP uses Socket.

File and Socket are different.
Note that Net::HTTP's policy is independent from Socket.

Am I mistaken to expect this behavior?

The conclusion is: yes, that expectation is mistaken.

Updated by ko1 (Koichi Sasada) over 7 years ago

  • Assignee set to naruse (Yui NARUSE)

Updated by naruse (Yui NARUSE) over 7 years ago

  • Status changed from Open to Rejected