If a file path is passed as a command-line argument under code page CP850 (and possibly others), opening the file with that path fails. The only way I found to make it work is to transcode the argument into its own reported encoding (CP850) from some different source encoding (in my case, both ISO-8859-1 and Windows-1252 worked). It is just like ARGV[0].encoding is lying!
Before, in Ruby 1.8, File.open worked just fine. I have a script that simply stopped working until I found the above workaround. This looks like a bug to me. I would expect Ruby to do its best to convert user input into the encodings required by file APIs and the like. That means I would prefer a fix that does not require any code changes when migrating from 1.8 to 1.9+.
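The workaround mentioned above could be sketched roughly as follows. This is only an illustration, not a proposed fix: the helper name `repair_argument` is hypothetical, the real byte encoding is assumed to be ISO-8859-1, and the argument is simulated with a hand-built string so that the snippet behaves the same on any platform.

```ruby
# Hypothetical helper: transcodes an argument whose bytes are really in
# `actual` (assumed to be ISO-8859-1) into the encoding the string reports.
def repair_argument(arg, actual = "ISO-8859-1")
  # String#encode(dst, src) transcodes from src to dst and ignores the
  # encoding the string is currently tagged with.
  arg.encode(arg.encoding, actual)
end

# Simulated argument: the single byte 0xE3, tagged as CP850.
simulated = [0xE3].pack("C").force_encoding("CP850")
repaired  = repair_argument(simulated)

puts "before: #{simulated.dump} (#{simulated.encoding})"
puts "after:  #{repaired.dump} (#{repaired.encoding})"
```

With a real argument, the call would presumably look like File.open(repair_argument(ARGV[0])), which only helps when the guess about the actual encoding is right.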
If you type "chcp 850" in cmd.exe before calling the script, it should accept the argument. You can use the word "Japonês" (Japanese) as example for the file path.
I would expect that if ARGV[0].encoding is CP850, then the bytes of the string are actually CP850. Instead, the bytes are in another encoding, ISO-8859-1. Run with "ã" as the argument, the reduced test case should output this:
Encoding of argument is reported as CP850 and as valid.
Let us inspect the a-tilde argument: "\xE3"
Let us inspect the a-tilde from UTF-8 source code transcoded into CP850: "\xC6"
Let us inspect the a-tilde from UTF-8 source code transcoded into ISO-8859-1: "\xE3"
RESULT: as you can see, the argument looks like an ISO-8859-1 string, but reports its encoding as CP850.
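One way to double-check that conclusion is to decode the observed byte under each candidate encoding and see which one yields a-tilde (U+00E3). This is a small sketch that assumes the argument consists of the single byte 0xE3, as in the dump above; the variable names are only illustrative.

```ruby
# The byte observed in the argument, according to the dump above.
raw = [0xE3].pack("C")

# Interpret the same byte under each candidate encoding, then convert the
# result to UTF-8 so the two readings can be compared.
as_cp850  = raw.dup.force_encoding("CP850").encode("UTF-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts "Read as CP850:      U+%04X" % as_cp850.ord
puts "Read as ISO-8859-1: U+%04X" % as_latin1.ord
```

Only the ISO-8859-1 reading should come out as U+00E3, which matches the claim that the reported encoding does not describe the bytes.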
# encoding: utf-8
# Reduced test case (t.rb): pass a-tilde as the first argument.
puts "Encoding of argument is reported as #{ARGV[0].encoding} and as #{ARGV[0].valid_encoding? ? "valid" : "invalid"}."
puts "Let us inspect the a-tilde argument: #{ARGV[0].dump}"
puts "Let us inspect the a-tilde from UTF-8 source code transcoded into CP850: #{"ã".encode("CP850").dump}"
puts "Let us inspect the a-tilde from UTF-8 source code transcoded into ISO-8859-1: #{"ã".encode("ISO-8859-1").dump}"
output is:
ruby t.rb ã
Encoding of argument is reported as UTF-8 and as valid.
Let us inspect the a-tilde argument: "\u00E3"
Let us inspect the a-tilde from UTF-8 source code transcoded into CP850: "\xC6"
Let us inspect the a-tilde from UTF-8 source code transcoded into ISO-8859-1: "\xE3"