Bug #178
File.open on sprintf-formatted string fails with encoding conversion error on OS X
| Status: | Closed | Start date: | 06/18/2008 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 100% |
| Category: | - | | |
| Target version: | - | | |
| ruby -v: | | | |
Description
String#% and File.open are interacting strangely on OS X, so files opened with a sprintf formatted string raise an ArgumentError:
$ ruby19 -vwe 'File.new("foo" % [])'
ruby 1.9.0 (2008-06-18 revision 15873) [i686-darwin9.3.0]
-e:1:in `initialize': transcoding not supported (from US-ASCII to UTF8-MAC) (ArgumentError)
from -e:1:in `new'
from -e:1:in `<main>'
Using just "foo" as the filename works fine:
$ ruby19 -we 'File.new("foo")'
As does String#<<:
$ ruby19 -we 'File.new("foo" << "")'
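The three ways of building the file name can be compared by printing the encoding each resulting string carries. This is only a diagnostic sketch: the error message above suggests that the String#% result was labeled US-ASCII in the reported build, but the labels printed here depend entirely on the interpreter that runs it, and the names `candidates` and `report` are illustrative.

```ruby
# Build the same file name three ways, as in the report.
# String#dup is used so that << does not mutate a string literal.
candidates = {
  "literal"   => "foo",
  "String#%"  => "foo" % [],
  "String#<<" => "foo".dup << "",
}

# Collect the encoding label of each result; the bytes are identical,
# only the label attached to the string may differ.
report = candidates.map do |label, name|
  { label: label, encoding: name.encoding.name, ascii_only: name.ascii_only? }
end

report.each do |row|
  puts format("%-10s encoding=%-12s ascii_only=%s",
              row[:label], row[:encoding], row[:ascii_only])
end
```

If the labels differ between the rows, the difference comes from how the string was constructed, not from its content.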
History
Updated by Anonymous over 3 years ago
I'm not sure why UTF8-MAC was introduced. UTF8-MAC indeed
isn't supported currently for transcoding.
I don't even know what UTF8-MAC is. It is defined as a replica
of UTF-8 in enc/utf_8.c. It is not defined at
http://www.iana.org/assignments/character-sets.
It may be that it is an attempt to refer to the fact that UTF-8
usually is used in decomposed form (NFD) on the Mac. But that
would not be relevant for opening a file, because the Mac OS
accepts any kind of normalization, and converts to NFD by itself
(similar to a file system that accepts both upper- and lower-case,
but internally uses only one case).
Also, the issue of normalization is orthogonal to what kind of
encoding form is used for Unicode, and therefore adding it to
an encoding is something that we should consider much more
carefully. Overall, UTF-8 should be UTF-8, it's a bad idea to
create variants.
Regards, Martin.
At 09:42 08/06/18, Eric Hodel wrote:
>Issue #178 has been reported by Eric Hodel.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
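The point that normalization is independent of the encoding form can be illustrated with a short sketch. It assumes a Ruby version that provides `String#unicode_normalize` (2.2 or later), which did not exist when this issue was filed; the variable names are illustrative.

```ruby
# The same character "é" in composed (NFC) and decomposed (NFD) form.
# Both strings are valid UTF-8; only the code point sequence differs.
composed   = "\u00E9"   # single code point U+00E9
decomposed = "e\u0301"  # "e" followed by combining acute accent U+0301

same_encoding   = composed.encoding == decomposed.encoding
equal_as_bytes  = composed == decomposed
equal_after_nfd = composed.unicode_normalize(:nfd) == decomposed
equal_after_nfc = decomposed.unicode_normalize(:nfc) == composed

puts "same encoding:      #{same_encoding}"
puts "byte-wise equal:    #{equal_as_bytes}"
puts "equal after NFD:    #{equal_after_nfd}"
puts "equal after NFC:    #{equal_after_nfc}"
puts "byte lengths:       #{composed.bytesize} vs #{decomposed.bytesize}"
```

Both strings carry the same encoding label, so the label alone cannot tell a reader which normalization form a string is in.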
Updated by Yui NARUSE over 3 years ago
- Status changed from Open to Closed
- Assignee set to Yui NARUSE
- % Done changed from 0 to 100
This problem has the same cause as Bug #179, and it was fixed in r17403.
Thanks,
> It may be that it is an attempt to refer to the fact that UTF-8
> usually is used in decomposed form (NFD) on the Mac. But that
> would not be relevant for opening a file, because the Mac OS
> accepts any kind of normalization, and converts to NFD by itself
> (similar to a file system that accepts both upper- and lower-case,
> but internally uses only one case).
Yeah, that's true when you write to the filesystem,
but when you read from the filesystem you may want to know
whether the names are composed or decomposed.
Updated by Anonymous over 3 years ago
At 15:36 08/06/18, Yui NARUSE wrote:
>Issue #178 has been updated by Yui NARUSE.
>
>Status changed from Open to Closed
>Assigned to set to Yui NARUSE
>% Done changed from 0 to 100
>
>This problem is from the same bug of Bug #179,
>and it was fixed at r17403.
>Thanks,
Great, thanks!
>> It may be that it is an attempt to refer to the fact that UTF-8
>> usually is used in decomposed form (NFD) on the Mac. But that
>> would not be relevant for opening a file, because the Mac OS
>> accepts any kind of normalization, and converts to NFD by itself
>> (similar to a file system that accepts both upper- and lower-case,
>> but internally uses only one case).
>
>Yeah, that's true when you write to filesystem,
>but when you read from filesystem you may want to know
>whether they are composed or decomposed.
That may indeed be the case. But this really only applies to
filenames (and maybe similar names of resources) on the Mac.
For such a small subset of data, I think it's overkill if
as a consequence, processing together with other data is
blocked (as we saw in the bug report).
As far as I understand, it doesn't apply to file contents or
other data on the Mac. Also, as soon as you concatenate two
strings, there is no guarantee that NFD is kept (unless of
course you implement separate string concatenation for this
specific encoding). In general, the best thing to do if you
want to know is to check, and the best thing if you want to
be sure is to check, and then to change if necessary. But we
still have to implement this (maybe for -3?).
[Also, if the meaning of UTF8-MAC is really NFD, it might
be better to actually call it that way.]
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
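The remark that concatenation does not preserve NFD, and the "check, then change if necessary" advice, can be sketched as follows. This assumes a Ruby version with `String#unicode_normalize` (2.2 or later, so not the interpreter discussed in this thread); `nfd?` and `ensure_nfd` are hypothetical helper names, not part of Ruby.

```ruby
# Hypothetical helper: a string is in NFD if normalizing it changes nothing.
def nfd?(str)
  str == str.unicode_normalize(:nfd)
end

# Hypothetical helper: check first, and normalize only if necessary.
def ensure_nfd(str)
  nfd?(str) ? str : str.unicode_normalize(:nfd)
end

left   = "a\u0301"  # "a" + combining acute accent (combining class 230)
right  = "\u0323"   # combining dot below (combining class 220)
joined = left + right

puts "left in NFD:   #{nfd?(left)}"
puts "right in NFD:  #{nfd?(right)}"
puts "joined in NFD: #{nfd?(joined)}"

# Canonical ordering puts the lower combining class first,
# so the two marks swap places when the joined string is normalized.
fixed = ensure_nfd(joined)
puts "fixed code points: #{fixed.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')}"
```

Each operand is in NFD on its own, but the joined string is not, because canonical ordering applies across the concatenation boundary.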
Updated by Yui NARUSE over 3 years ago
Martin Duerst wrote:
>>> It may be that it is an attempt to refer to the fact that UTF-8
>>> usually is used in decomposed form (NFD) on the Mac. But that
>>> would not be relevant for opening a file, because the Mac OS
>>> accepts any kind of normalization, and converts to NFD by itself
>>> (similar to a file system that accepts both upper- and lower-case,
>>> but internally uses only one case).
>> Yeah, that's true when you write to filesystem,
>> but when you read from filesystem you may want to know
>> whether they are composed or decomposed.
>
> That may indeed be the case. But this really only applies to
> filenames (and maybe similar names of resources) on the Mac.
> For such a small subset of data, I think it's overkill if
> as a consequence, processing together with other data is
> blocked (as we saw in the bug report).

This bug stems from a different cause.

> As far as I understand, it doesn't apply to file contents or
> other data on the Mac. Also, as soon as you concatenate two
> strings, there is no guarantee that NFD is kept (unless of
> course you implement separate string concatenation for this
> specific encoding). In general, the best thing to do if you
> want to know is to check, and the best thing if you want to
> be sure is to check, and then to change if necessary. But we
> still have to implement this (maybe for -3?).

Of course, other data on the Mac may be in an encoding other than
UTF8-MAC: it may be composed UTF-8. My intent is that strings labeled
as UTF8-MAC may need to be converted or normalized.
If you don't care about that, you can use force_encoding.

> [Also, if the meaning of UTF8-MAC is really NFD, it might
> be better to actually call it that way.]

It is not real NFD but Apple's NFD, as I commented in enc/utf_8.c.

--
NARUSE, Yui <naruse@airemix.jp>
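The force_encoding suggestion can be sketched as below. The sample name is made up, and the sketch assumes the running interpreter registers an encoding under the name "UTF8-MAC"; it shows only relabeling, which leaves the bytes (and therefore the normalization form) untouched.

```ruby
# Assumption: the interpreter knows an encoding named "UTF8-MAC".
mac_encoding = Encoding.find("UTF8-MAC")

# A made-up decomposed name, labeled as if it had come from the Mac filesystem.
name = "e\u0301".dup.force_encoding(mac_encoding)

# Relabel a copy as plain UTF-8 without converting or normalizing anything.
relabeled = name.dup.force_encoding(Encoding::UTF_8)

puts "original label:  #{name.encoding}"
puts "relabeled as:    #{relabeled.encoding}"
puts "bytes unchanged: #{name.bytes == relabeled.bytes}"
puts "still valid:     #{relabeled.valid_encoding?}"
```

Because force_encoding changes only the label, the relabeled string is still decomposed; a caller who needs composed text has to normalize it separately.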