Bug #956
Encoding: nl_langinfo(CODESET) on cygwin 1.5 always returns US-ASCII
| Status: | Closed | Start date: | 12/31/2008 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 100% |
| Category: | - | | |
| Target version: | 1.9.1 RC2 | | |
| ruby -v: | | | |
Description
It seems you cannot rely on nl_langinfo(CODESET) to return the proper charset on cygwin: it appears to always return US-ASCII, whatever the locale settings are. IMHO the configure script should check not only that langinfo is available but also that it actually works, since it currently seems to be a dummy function under cygwin. Please see also http://groups.google.com/group/comp.lang.ruby/msg/42d92ae740d12a5f?hl=en
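To see what Ruby itself derives from the locale on a given system, a small report like the following may help. This is only a sketch: `locale_report` is a name made up for this illustration, and the values printed depend entirely on the platform and environment.

```ruby
# Hypothetical helper (not part of Ruby): collect the locale-related values
# that Ruby exposes for the current process.
def locale_report
  {
    # Locale-related environment variables as seen by this process.
    env: %w[LC_ALL LC_CTYPE LANG].map { |k| [k, ENV[k]] }.to_h,
    # Charmap name Ruby obtained for the current locale (platform dependent).
    charmap: Encoding.locale_charmap,
    # Encoding object that Ruby maps the locale to.
    locale: Encoding.find("locale"),
    # Default encoding for data read from outside the process.
    default_external: Encoding.default_external
  }
end

locale_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

On an affected cygwin build one would expect the charmap to stay at US-ASCII regardless of LANG, as described in this report.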
Associated revisions
* encoding.c (rb_filesystem_encoding, rb_locale_charmap): uses
codepage on cygwin. [ruby-core:20994]
History
Updated by Martin Dürst about 3 years ago
I can confirm that this problem happens. Adding a #elif defined(__CYGWIN__) option as the second choice in rb_locale_charmap in encoding.c should be a good start.

For the actual functionality, I think the best choice is http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c There is also http://www.haible.de/bruno/packages-libcharset.html, but that's GNU, so it would create a copyright problem.

I guess the next steps would be to add the above langinfo.c to the missing directory, probably changing the function name to avoid conflicts with the existing (but useless) nl_langinfo. I could easily do that, but I'd need some advice or help re. makefiles. Nobu, Yui, anybody?

Regards, Martin.
Updated by Yuki Sonoda about 3 years ago
- Target version set to 1.9.1 RC2
Updated by Nobuyoshi Nakada about 3 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Applied in changeset r21311.
Updated by Martin Dürst about 3 years ago
- Status changed from Closed to Open
The patch committed by Nobu uses the Windows 'locale' for cygwin, which is a good idea as a fallback. However, I personally often use cygwin with LANG=en-US.UTF-8 or so. Using PuTTY (or another UTF-8 capable terminal emulator such as TeraTerm) with cygwin is often the only way to do UTF-8 work on Windows. I'm not sure what Tom Link meant by "proper charset", but for me, it would be UTF-8 if I have set LANG=en-US.UTF-8.

Regards, Martin.
Updated by Tom Link about 3 years ago
> proper charset

I'm fine with any solution that makes something 8-bit clean the default charset. People using cygwin's X server, though, can run cygwin's UTF-8-capable version of rxvt. In such a case, it could cause problems if ruby relied on the Windows locale. A proper solution should IMHO check for LANG first and use the Windows locale only if LANG isn't defined -- as proposed by Martin.

Anyway, I haven't tried it yet, but I guess the current solution is OK for me since I personally use the non-UTF-8 Windows rxvt terminal. Thanks.
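The "LANG first, Windows locale as fallback" order proposed above can be sketched roughly as follows. This is an illustration only, not the code in encoding.c: `charset_from_env` is a hypothetical helper, the fallback value stands in for whatever the Windows codepage lookup would return, and treating a locale without a codeset part (such as `C`) as "not defined" is an assumption.

```ruby
# Hypothetical helper: pick a charset name from the POSIX locale variables,
# falling back to a caller-supplied value (e.g. the Windows codepage name).
def charset_from_env(env, fallback)
  # POSIX precedence for the character-type category: LC_ALL, LC_CTYPE, LANG.
  locale = %w[LC_ALL LC_CTYPE LANG].map { |k| env[k] }.find { |v| v && !v.empty? }
  return fallback unless locale

  # Locale names look like language[_territory][.codeset][@modifier].
  codeset = locale[/\.([^@]+)/, 1]
  codeset || fallback
end

puts charset_from_env({ "LANG" => "en_US.UTF-8" }, "CP850") # prints UTF-8
puts charset_from_env({}, "CP850")                          # prints CP850
```

Passing the environment in as a plain hash keeps the helper easy to try with different settings without touching the real `ENV`.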
Updated by Nobuyoshi Nakada about 3 years ago
Hi,

At Sat, 10 Jan 2009 02:35:50 +0900,
Tom Link wrote in [ruby-core:21239]:
> A proper solution should IMHO check for LANG first and use
> the windows locale only if LANG isn't defined -- as proposed
> by Martin.

It's working so.

--
Nobu Nakada
Updated by Martin Dürst about 3 years ago
At 03:11 09/01/13, you wrote:
>Hi,
>
>At Sat, 10 Jan 2009 02:35:50 +0900,
>Tom Link wrote in [ruby-core:21239]:
>> A proper solution should IMHO check for LANG first and use
>> the windows locale only if LANG isn't defined -- as proposed
>> by Martin.
>
>It's working so.

That's not true. Currently, Encoding.default_external defaults to US-ASCII if LANG is not set on cygwin, not to the windows locale encoding. We can leave it at that, or we can fix it.

Regards, Martin.

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
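One way to check a claim like this on a given installation is to start a fresh Ruby with the locale variables removed and ask it for its default. The sketch below assumes that spawning a child process is possible; `default_external_with` is a hypothetical helper, and no particular output is implied, since the result depends on the platform and the Ruby build.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: start a fresh Ruby with the given environment changes
# and return the name of its Encoding.default_external.
def default_external_with(env_changes)
  out, status = Open3.capture2(env_changes, RbConfig.ruby, "-e",
                               "print Encoding.default_external.name")
  raise "child ruby failed" unless status.success?
  out
end

# A nil value removes the variable from the child's environment.
no_locale = { "LC_ALL" => nil, "LC_CTYPE" => nil, "LANG" => nil }
puts default_external_with(no_locale)
puts default_external_with(no_locale.merge("LANG" => "en_US.UTF-8"))
```

Comparing the two lines shows whether setting LANG changes the default at all, and what is used when no locale variable is set.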
Updated by Nobuyoshi Nakada about 3 years ago
Hi,

At Wed, 14 Jan 2009 18:11:36 +0900,
Martin Duerst wrote in [ruby-core:21341]:
> >At Sat, 10 Jan 2009 02:35:50 +0900,
> >Tom Link wrote in [ruby-core:21239]:
> >> A proper solution should IMHO check for LANG first and use
> >> the windows locale only if LANG isn't defined -- as proposed
> >> by Martin.
> >
> >It's working so.
>
> That's not true. Currently, Encoding.default_external defaults
> to US-ASCII if LANG is not set on cygwin, not to the windows
> locale encoding.

Sorry, I'd missed to commit it.

--
Nobu Nakada
Updated by Martin Dürst about 3 years ago
Hello Nobu,

Many thanks for fixing it. I'm going to add some text from missing/langinfo.c to LICENSE (anybody, please tell me if that was wrong), and inform the author about the changes we made, and close the bug.

Regards, Martin.

At 11:03 09/01/15, Nobuyoshi Nakada wrote:
>Hi,
>
>At Wed, 14 Jan 2009 18:11:36 +0900,
>Martin Duerst wrote in [ruby-core:21341]:
>> >At Sat, 10 Jan 2009 02:35:50 +0900,
>> >Tom Link wrote in [ruby-core:21239]:
>> >> A proper solution should IMHO check for LANG first and use
>> >> the windows locale only if LANG isn't defined -- as proposed
>> >> by Martin.
>> >
>> >It's working so.
>>
>> That's not true. Currently, Encoding.default_external defaults
>> to US-ASCII if LANG is not set on cygwin, not to the windows
>> locale encoding.
>
>Sorry, I'd missed to commit it.
>
>--
>Nobu Nakada
Updated by Martin Dürst about 3 years ago
- Status changed from Open to Closed
Updated by Tom Link about 3 years ago
It seems that the locale recognition doesn't work 100%, or maybe I'm just doing it wrong. On cygwin, the default external encoding is CP850. If I set LANG=de_DE.UTF-8, then ruby -e "Encoding.default_external" => UTF-8 gives the correct value. But if I set it to LANG=de_DE.ISO-8859-1, then ruby -e "Encoding.default_external" => CP850 returns the Windows default locale. Since CP850 and ISO-8859-1 are incompatible encodings in the Ruby mind-set, this is an unpleasant discovery.
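Why a wrong guess between these two encodings matters can be shown with a short sketch. It assumes a Ruby build with transcoding support for both encodings; `bytes_in` is a hypothetical helper introduced only for this illustration.

```ruby
# Hypothetical helper: bytes of a character after transcoding to an encoding.
def bytes_in(char, encoding_name)
  char.encode(encoding_name).bytes
end

a_umlaut = "\u00E4" # the letter a with diaeresis, as a UTF-8 string

# The same character is stored as different bytes in the two encodings.
p bytes_in(a_umlaut, "ISO-8859-1") # => [228]
p bytes_in(a_umlaut, "CP850")      # => [132]

latin1 = a_umlaut.encode("ISO-8859-1")
cp850  = a_umlaut.encode("CP850")

# Ruby finds no common encoding for two non-ASCII strings like these...
p Encoding.compatible?(latin1, cp850) # => nil

# ...so combining them raises instead of silently mixing bytes.
begin
  latin1 + cp850
rescue Encoding::CompatibilityError => e
  puts "cannot combine: #{e.message}"
end
```

So if ISO-8859-1 data is read while the default external encoding is CP850, non-ASCII bytes are labelled with the wrong encoding and get interpreted as different characters.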