Bug #956

Encoding: nl_langinfo(CODESET) on cygwin 1.5 always returns US-ASCII

Added by Tom Link about 3 years ago. Updated 10 months ago.

[ruby-core:20994]
Status: Closed
Start date: 12/31/2008
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: -
Target version: 1.9.1 RC2
ruby -v:

Description

It seems you cannot rely on nl_langinfo(CODESET) to return the proper charset on cygwin as it appears to always return
US-ASCII no matter what.

IMHO the configure script should not only check for the availability of langinfo but also for its functionality as it
seems to currently be a dummy function under cygwin.
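
To see what Ruby itself reports on a given machine, a small diagnostic along the following lines may help. It is only a sketch: `locale_report` is a helper name made up for this illustration, and it merely prints values that Ruby 1.9 and later expose through `Encoding.locale_charmap` and `Encoding.default_external`. It does not call nl_langinfo directly.

```ruby
# Collect what Ruby believes about the current locale.
# `env` defaults to the real environment; pass a Hash to inspect other values.
def locale_report(env = ENV)
  {
    lang: env["LANG"],                                  # raw LANG setting, may be nil
    locale_charmap: Encoding.locale_charmap,            # charmap name Ruby derived from the locale
    default_external: Encoding.default_external.name    # encoding Ruby uses for external data
  }
end

locale_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

If the charmap line stays at US-ASCII (ANSI_X3.4-1968 on some systems) whatever LANG says, that would match the behaviour described here.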

Please see also http://groups.google.com/group/comp.lang.ruby/msg/42d92ae740d12a5f?hl=en

Associated revisions

Revision 21311
Added by Nobuyoshi Nakada about 3 years ago

* encoding.c (rb_filesystem_encoding, rb_locale_charmap): uses codepage on cygwin. [ruby-core:20994]

History

Updated by Martin Dürst about 3 years ago

I can confirm that this problem happens. Adding a

#elif defined(__CYGWIN__)

option as the second choice in rb_locale_charmap in encoding.c should be a good start.
For the actual functionality, I think the best choice is
http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c
There is also http://www.haible.de/bruno/packages-libcharset.html,
but that's GNU, so it would create a copyright problem.

I guess the next steps would be to add the above langinfo.c to
the missing directory, probably changing the function name to
avoid conflicts with the existing (but useless) nl_langinfo.

I could easily do that, but I'd need some advice or help re.
makefiles. Nobu, Yui, anybody?

Regards,    Martin.

Updated by Yuki Sonoda about 3 years ago

  • Target version set to 1.9.1 RC2

Updated by Nobuyoshi Nakada about 3 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
Applied in changeset r21311.

Updated by Martin Dürst about 3 years ago

  • Status changed from Closed to Open
The patch committed by Nobu uses the Windows 'locale' for cygwin,
which is a good idea as a fallback. However, I personally often
use cygwin with LANG=en_US.UTF-8 or so. Using putty (or another
UTF-8 capable terminal emulator such as TeraTerm, ...) with cygwin
is often the only way to do UTF-8 work on Windows.

I'm not sure what Tom Link meant by "proper charset", but
for me, it would be UTF-8 if I have set LANG=en_US.UTF-8.

Regards,   Martin.

Updated by Tom Link about 3 years ago

> proper charset

I'm fine with any solution that makes something 8-bit clean the default charset.

People using cygwin's X server, though, can run cygwin's UTF-8-capable version of rxvt. In that case, it could cause problems if ruby relied on the Windows locale.

A proper solution should IMHO check for LANG first and use the windows locale only if LANG isn't defined -- as proposed by Martin.

Anyway, I haven't tried it yet but I guess the current solution is ok for me since I personally use the non-utf-8 windows rxvt terminal. Thanks.
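
The order proposed here (LANG first, Windows locale only as a fallback) can be sketched in a few lines of Ruby. This is an illustration of the decision order only, not the code in encoding.c: `codeset_from_locale` and `pick_charset` are hypothetical helper names, and `windows_codepage` stands in for whatever number the platform layer would report (850 is just an example value).

```ruby
# Extract the codeset part of a POSIX locale name such as "de_DE.UTF-8@euro".
# Returns nil when the name carries no codeset (for example "C" or "de_DE").
def codeset_from_locale(name)
  return nil if name.nil? || name.empty?
  name[/\.([^@]+)/, 1]   # text after the dot, up to an optional @modifier
end

# Prefer the codeset named in LANG; otherwise fall back to the Windows
# code page. `windows_codepage` is an assumed input, not a real API call.
def pick_charset(env, windows_codepage)
  codeset_from_locale(env["LANG"]) || "CP#{windows_codepage}"
end

puts pick_charset({ "LANG" => "en_US.UTF-8" }, 850)   # prints UTF-8
puts pick_charset({}, 850)                            # prints CP850
```

A fuller version would probably also consult LC_ALL and LC_CTYPE before LANG, as POSIX locale lookup does; that is left out here because the discussion only mentions LANG.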

Updated by Nobuyoshi Nakada about 3 years ago

Hi,

At Sat, 10 Jan 2009 02:35:50 +0900,
Tom Link wrote in [ruby-core:21239]:
> A proper solution should IMHO check for LANG first and use
> the windows locale only if LANG isn't defined -- as proposed
> by Martin.

It's working so.

-- 
Nobu Nakada

Updated by Martin Dürst about 3 years ago

At 03:11 09/01/13, you wrote:
>Hi,
>
>At Sat, 10 Jan 2009 02:35:50 +0900,
>Tom Link wrote in [ruby-core:21239]:
>> A proper solution should IMHO check for LANG first and use
>> the windows locale only if LANG isn't defined -- as proposed
>> by Martin.
>
>It's working so.

That's not true. Currently, Encoding.default_external defaults
to US-ASCII if LANG is not set on cygwin, not to the windows
locale encoding.

We can leave it at that, or we can fix it.
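
One way to check this claim on any installation is to start a child interpreter with the locale variables removed and ask it for its default. The following is a rough sketch under a few assumptions: `default_external_without_locale` is a name invented for this example, and clearing LC_ALL and LC_CTYPE in addition to LANG is an extra precaution not mentioned in the thread.

```ruby
require "rbconfig"

# Start a child interpreter with the locale variables unset and return the
# name of its default external encoding as a String.
def default_external_without_locale(ruby = RbConfig.ruby)
  env = { "LANG" => nil, "LC_ALL" => nil, "LC_CTYPE" => nil }  # nil removes the variable
  IO.popen([env, ruby, "-e", "print Encoding.default_external.name"], &:read)
end

puts default_external_without_locale
```

The printed name depends on the platform and the Ruby version, so no expected output is given here.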

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Updated by Nobuyoshi Nakada about 3 years ago

Hi,

At Wed, 14 Jan 2009 18:11:36 +0900,
Martin Duerst wrote in [ruby-core:21341]:
> >At Sat, 10 Jan 2009 02:35:50 +0900,
> >Tom Link wrote in [ruby-core:21239]:
> >> A proper solution should IMHO check for LANG first and use
> >> the windows locale only if LANG isn't defined -- as proposed
> >> by Martin.
> >
> >It's working so.
> 
> That's not true. Currently, Encoding.default_external defaults
> to US-ASCII if LANG is not set on cygwin, not to the windows
> locale encoding.

Sorry, I'd missed to commit it.

-- 
Nobu Nakada

Updated by Martin Dürst about 3 years ago

Hello Nobu,

Many thanks for fixing it. I'm going to add some text from
missing/langinfo.c to LICENSE (anybody, please tell me if
that was wrong), and inform the author about the changes we
made, and close the bug.

Regards,   Martin.

At 11:03 09/01/15, Nobuyoshi Nakada wrote:
>Hi,
>
>At Wed, 14 Jan 2009 18:11:36 +0900,
>Martin Duerst wrote in [ruby-core:21341]:
>> >At Sat, 10 Jan 2009 02:35:50 +0900,
>> >Tom Link wrote in [ruby-core:21239]:
>> >> A proper solution should IMHO check for LANG first and use
>> >> the windows locale only if LANG isn't defined -- as proposed
>> >> by Martin.
>> >
>> >It's working so.
>> 
>> That's not true. Currently, Encoding.default_external defaults
>> to US-ASCII if LANG is not set on cygwin, not to the windows
>> locale encoding.
>
>Sorry, I'd missed to commit it.
>
>-- 
>Nobu Nakada


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Updated by Martin Dürst about 3 years ago

  • Status changed from Open to Closed

Updated by Tom Link about 3 years ago

It seems that the locale recognition doesn't work 100% or maybe I'm just doing it wrong.

On cygwin, the default external encoding is CP850. If I set LANG=de_DE.UTF-8, then

ruby -e "puts Encoding.default_external"
=> UTF-8

gives the correct value. But if I set LANG=de_DE.ISO-8859-1, then

ruby -e "puts Encoding.default_external"
=> CP850

returns the Windows default locale. Since CP850 and ISO-8859-1 are incompatible encodings in the Ruby mind-set, this is an unpleasant discovery.
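
Why the two encodings count as incompatible can be shown with a single character. The snippet below is a small illustration, assuming a Ruby build with the usual single-byte transcoders; `u_umlaut_in` is a helper name made up for this example.

```ruby
# Encode "ü" (U+00FC) in the named encoding.
def u_umlaut_in(encoding_name)
  "\u00FC".encode(encoding_name)
end

cp850  = u_umlaut_in("CP850")
latin1 = u_umlaut_in("ISO-8859-1")

p cp850.bytes    # => [129]  (0x81)
p latin1.bytes   # => [252]  (0xFC)

# Both strings contain a non-ASCII byte in different encodings,
# so Ruby refuses to combine them.
p Encoding.compatible?(cp850, latin1)   # => nil
begin
  cp850 + latin1
rescue Encoding::CompatibilityError => e
  puts e.message
end
```

The same character is a different byte in each encoding, so text written under one assumption and read under the other comes out garbled or raises an error when mixed.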
