Feature #13770: Can't create valid Cyrillic-named class/module - Ruby - Ruby Issue Tracking System

Feature #13770: Can't create valid Cyrillic-named class/module

Added by sb (Sergey Borodanov) over 8 years ago. Updated almost 8 years ago.

Status: Closed
Assignee: matz (Yukihiro Matsumoto)
Target version:
[ruby-core:82175]

Description

Can't create a valid Cyrillic-named class. If I have a file мир.rb:

# Content of мир.rb:

class Мир

  def приветствовать
    "Привет, Мир!"
  end

end

and run it in bash:

ruby мир.rb

I get the error (SyntaxError):

мир.rb:1: class/module name must be CONSTANT

The same error occurs when creating a module, and irb behaves the same way (please see the attachment). At the same time, Cyrillic-named constants and methods work fine.

Creating Cyrillic-named classes and modules is expected to work without error.

Files

Screenshot from 2017-07-26 19-08-14.png (64.2 KB)

Cyrillic-named class/module bug in irb

sb (Sergey Borodanov), 07/26/2017 12:18 PM

Related issues 2 (0 open — 2 closed)

Related to Ruby - Bug #11859: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work. (Rejected)
Has duplicate Ruby - Bug #15524: Unicode not Supported in Class Names (Rejected)


History

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#1 [ruby-core:82176]

Constant names must start with an ASCII upper-case letter.

Updated by matz (Yukihiro Matsumoto) over 8 years ago
#2 [ruby-core:82178]

And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.

Matz.

Updated by shevegen (Robert A. Heiler) over 8 years ago
#3 [ruby-core:82181]

Martin Dürst could then create classes for all Emojis in Unicode. :D

Updated by shyouhei (Shyouhei Urabe) over 8 years ago
#4 [ruby-core:82186]

matz (Yukihiro Matsumoto) wrote:

And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.

What do you think of Titlecase? Are they allowed?

http://unicode.org/faq/casemap_charprop.html#4

Updated by phluid61 (Matthew Kerwin) over 8 years ago
#5 [ruby-core:82191]

shyouhei (Shyouhei Urabe) wrote:

matz (Yukihiro Matsumoto) wrote:

And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.

What do you think of Titlecase? Are they allowed?

http://unicode.org/faq/casemap_charprop.html#4

Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.

Updated by shyouhei (Shyouhei Urabe) over 8 years ago
#6 [ruby-core:82192]

phluid61 (Matthew Kerwin) wrote:

shyouhei (Shyouhei Urabe) wrote:

matz (Yukihiro Matsumoto) wrote:

And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.

What do you think of Titlecase? Are they allowed?

http://unicode.org/faq/casemap_charprop.html#4

Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.

Can I ask you whether U+01C8 is a valid Constant name or not in your opinion? and why?

Updated by phluid61 (Matthew Kerwin) over 8 years ago
#7 [ruby-core:82193]

shyouhei (Shyouhei Urabe) wrote:

phluid61 (Matthew Kerwin) wrote:

Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.

Can I ask you whether U+01C8 is a valid Constant name or not in your opinion? and why?

Oh, you're right, I had misread the documentation.

I think that if Ruby accepts all Lu characters as constants, it could also accept all Lt. In the case of U+01C8 I'm not overly concerned because it's not common any more (but I think ǈudevit is just as valid as Ljudevit); however for U+01F2 it could be reasonable for someone in Macedonia to name a constant ǲe, for example.
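
For readers who want to check which general category a character falls into, here is a minimal Ruby sketch using the regexp properties \p{Lu}, \p{Lt} and \p{Ll}. The constant CATEGORIES and the helper general_category are names made up for this illustration; they are not part of Ruby or of any patch in this thread.

```ruby
# Hypothetical lookup table: the three cased-letter categories discussed here.
CATEGORIES = {
  "Lu" => /\p{Lu}/, # Uppercase_Letter
  "Lt" => /\p{Lt}/, # Titlecase_Letter
  "Ll" => /\p{Ll}/  # Lowercase_Letter
}.freeze

# Hypothetical helper: returns "Lu", "Lt", "Ll", or "other" for one character.
def general_category(char)
  found = CATEGORIES.find { |_name, pattern| char.match?(pattern) }
  found ? found.first : "other"
end

# U+01C7..U+01C9 are the three case forms of the LJ digraph;
# U+01F2 is the titlecase DZ digraph mentioned above.
["\u01C7", "\u01C8", "\u01C9", "\u01F2"].each do |char|
  puts format("U+%04X %s", char.ord, general_category(char))
end
```

The point of the sketch is only that Lt is a category of its own, separate from Lu, so a rule based on "Lu only" would exclude U+01C8 and U+01F2.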

Updated by shyouhei (Shyouhei Urabe) over 8 years ago
#8 [ruby-core:82194]

OK, I see. Thank you.

Updated by duerst (Martin Dürst) over 8 years ago
#9 [ruby-core:82205]

shevegen (Robert A. Heiler) wrote:

Martin Dürst could then create classes for all Emojis in Unicode. :D

Well, it's unclear whether emoji (note the Japanese plural!) are upper-case or lower-case.
I thought maybe we could make a distinction between children (lower-case) and adults (upper-case), but there are not many children, tons of adults, and tons of other stuff (not to say gunk).

Updated by duerst (Martin Dürst) over 8 years ago
#10 [ruby-core:82206]

matz (Yukihiro Matsumoto) wrote:

And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.

I agree. Here are some pointers for implementation:

The distinction between constants (tCONSTANT) and identifiers (tIDENTIFIER) is made at parse.c:7830 using macro ISUPPER. Some other uses of ISUPPER (but not all of them) seem to be related to this distinction, e.g. the one at symbol.c:281.

ISUPPER is defined using rb_isupper in include/ruby/ruby.h, the latter being defined inline in the same file as 'A' <= c && c <= 'Z'. This would have to be replaced with a call to ONIGENC_IS_CODE_CTYPE or so, which would work for legacy encodings. For Unicode-based encodings, where we want to take titlecase into account (thanks, Shyouhei!), it may be slightly more complicated.
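
To make the difference concrete, the following Ruby sketch contrasts an ASCII-only check on the first byte (mirroring the 'A' <= c && c <= 'Z' definition quoted above) with a Unicode-aware check on the first character. Both method names are invented for this illustration, and the Unicode-aware rule (Lu or Lt) is an assumption about one possible replacement, not the code that was eventually committed.

```ruby
# ASCII-only test on the first byte of the token, in the spirit of
# ISUPPER(tok()[0]): any non-ASCII lead byte fails the range check.
def ascii_upper_start?(token)
  byte = token.getbyte(0)
  !byte.nil? && 'A'.ord <= byte && byte <= 'Z'.ord
end

# Assumed alternative: accept an upper-case (Lu) or title-case (Lt)
# first character, whatever its script.
def unicode_upper_start?(token)
  token.match?(/\A[\p{Lu}\p{Lt}]/)
end

%w[World world Мир мир].each do |token|
  puts "#{token}: ascii=#{ascii_upper_start?(token)} unicode=#{unicode_upper_start?(token)}"
end
```

With UTF-8 input, the first byte of "Мир" lies outside 'A'..'Z', which is why the original check classifies it as an ordinary identifier.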

A question we might want to check for is if there's any code out there that currently uses non-ASCII upper-case variable names.

Another question is whether we might want to have some convention for Japanese, e.g. Katakana for class names. Just thinking out loud (and ducking).

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#11 [ruby-core:82208]

Distinguishing non-ASCII upper and lower case would raise the same question for non-ASCII punctuation.
ASCII punctuation cannot be part of an identifier; will the non-ASCII versions be treated the same way?

BTW, I think Japanese has little or no concept of plural, apart from some words that imply "many" and some suffixes.

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#12 [ruby-core:82216]

I'm uncertain about the usage of mbc_case_fold.

diff --git i/parse.y w/parse.y
index 02d9412a2c..96f25d893e 100644
--- i/parse.y
+++ w/parse.y
@@ -7790,6 +7790,8 @@ parse_atmark(struct parser_params *parser, const enum lex_state_e last_state)
     return result;
 }
 
+int rb_enc_const_id_char_p(const char *name, const char *end, rb_encoding *enc);
+
 static enum yytokentype
 parse_ident(struct parser_params *parser, int c, int cmd_state)
 {
@@ -7827,7 +7829,9 @@ parse_ident(struct parser_params *parser, int c, int cmd_state)
 		pushback(c);
 	    }
 	}
-	if (result == 0 && ISUPPER(tok()[0])) {
+	if (result == 0 &&
+	    (ISUPPER(tok()[0]) ||
+	     rb_enc_const_id_char_p(tok(), tok()+toklen(), current_enc))) {
 	    result = tCONSTANT;
 	}
 	else {
diff --git i/symbol.c w/symbol.c
index f4516ebbe4..490cae0127 100644
--- i/symbol.c
+++ w/symbol.c
@@ -198,6 +198,28 @@ rb_enc_symname_p(const char *name, rb_encoding *enc)
     return rb_enc_symname2_p(name, strlen(name), enc);
 }
 
+int
+rb_enc_const_id_char_p(const char *name, const char *end, rb_encoding *enc)
+{
+    int c, len;
+
+    if (end <= name) return FALSE;
+    if (ISASCII(*name)) return ISUPPER(*name);
+    c = rb_enc_codepoint_len(name, end, &len, enc);
+    if (c < 0) return FALSE;
+    if (rb_enc_isupper(c, enc)) return TRUE;
+    {
+	OnigUChar fold[ONIGENC_GET_CASE_FOLD_CODES_MAX_NUM];
+	const OnigUChar *beg = (const OnigUChar *)name;
+	int r = enc->mbc_case_fold(ONIGENC_CASE_FOLD,
+				   &beg, (const OnigUChar *)end,
+				   fold, enc);
+	if (r > 0 && (r != len || memcmp(fold, name, r)))
+	    return TRUE;
+    }
+    return FALSE;
+}
+
 #define IDSET_ATTRSET_FOR_SYNTAX ((1U<<ID_LOCAL)|(1U<<ID_CONST))
 #define IDSET_ATTRSET_FOR_INTERN (~(~0U<<(1<<ID_SCOPE_SHIFT)) & ~(1U<<ID_ATTRSET))
 
@@ -278,7 +300,7 @@ rb_enc_symname_type(const char *name, long len, rb_encoding *enc, unsigned int a
 	break;
 
       default:
-	type = ISUPPER(*m) ? ID_CONST : ID_LOCAL;
+	type = rb_enc_const_id_char_p(m, e, enc) ? ID_CONST : ID_LOCAL;
       id:
 	if (m >= e || (*m != '_' && !ISALPHA(*m) && ISASCII(*m))) {
 	    if (len > 1 && *(e-1) == '=') {
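
The decision order in rb_enc_const_id_char_p from the patch above can be modelled roughly in Ruby as follows. This is a sketch only: const_id_start? is a made-up name, \p{Lu} stands in for rb_enc_isupper, and String#downcase(:fold) stands in for the mbc_case_fold comparison. The invalid-codepoint branch (c < 0) is not modelled.

```ruby
# Rough model of the patch's three steps for the first character of a name.
def const_id_start?(name)
  return false if name.empty?                # end <= name

  first = name[0]
  # ASCII first character: plain ISUPPER decides.
  return first.match?(/[A-Z]/) if first.ascii_only?
  # Non-ASCII upper-case letter (stand-in for rb_enc_isupper).
  return true if first.match?(/\p{Lu}/)

  # Otherwise treat the character as a constant start if case folding
  # changes it, which is how title-case letters are caught.
  first.downcase(:fold) != first
end

["World", "world", "Мир", "мир", "\u01C8udevit", "あ"].each do |name|
  puts "#{name}: #{const_id_start?(name)}"
end
```

The last step is the part nobu says he is uncertain about: it relies on case folding mapping a title-case letter to something different from itself.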

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#13

ruby -v deleted (~~2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]~~)
Backport deleted (~~2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN~~)
Tracker changed from Bug to Feature

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#14 [ruby-core:82421]

Status changed from Open to Assigned
Assignee set to matz (Yukihiro Matsumoto)

Updated by duerst (Martin Dürst) over 8 years ago
#15 [ruby-core:82577]

In the patch, I suggest adding something like

if (rb_enc_islower(c, enc)) return FALSE;

immediately before or after

if (rb_enc_isupper(c, enc)) return TRUE;

Updated by duerst (Martin Dürst) over 8 years ago
#16 [ruby-core:82671]

I have checked for upper-case letters without corresponding lower-case letters, with the following short script:

 ruby -n -e 'l=$_.split(/;/); if l[2]=="Lu" && l[13]=="" then puts l[1];end' <UnicodeData.txt

Somewhat contrary to my expectations, this turned up quite a number of characters (471 of them). Most are MATHEMATICAL symbols in the range U+1D400 to U+1D7FF. My understanding is that they don't have mappings because mathematicians use upper-case and lower-case symbols with different meanings.

There are some other upper-case characters without defined lower-case equivalents, but most of them correspond to empty slots in the MATHEMATICAL symbols charts.

The above patch would treat all identifiers starting with upper-case, even MATHEMATICAL symbols, as class names. Unless we want to forbid such characters in identifiers, I think that's the right thing to do.
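
The one-liner above needs a local copy of UnicodeData.txt. As a self-contained illustration of the same filter, here is a sketch that runs it over three sample rows written out by hand in the UnicodeData.txt field layout (field 1 is the name, field 2 the general category, field 13 the simple lower-case mapping). SAMPLE_ROWS and uppercase_without_lowercase are names invented for this example.

```ruby
# Hand-copied sample rows in the semicolon-separated UnicodeData.txt layout.
SAMPLE_ROWS = [
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8",
  "1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;"
].freeze

# Returns the names of Lu characters whose lower-case mapping field is empty.
def uppercase_without_lowercase(rows)
  rows
    # limit -1 keeps trailing empty fields; the one-liner gets the same
    # effect because $_ still ends in a newline when it is split.
    .map { |row| row.chomp.split(";", -1) }
    .select { |fields| fields[2] == "Lu" && fields[13] == "" }
    .map { |fields| fields[1] }
end

puts uppercase_without_lowercase(SAMPLE_ROWS)
```

Of the three rows, only the MATHEMATICAL one is reported: LATIN CAPITAL LETTER A has a lower-case mapping, and U+01C8 is Lt rather than Lu.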

What's more important for the above patch is that there are no title-case characters without lower-case mappings, so

+    if (r > 0 && (r != len || memcmp(fold, name, r)))
+        return TRUE;
+    }

in the patch will work correctly.

What is more important for the above patch is whether

Updated by duerst (Martin Dürst) over 8 years ago
#17 [ruby-core:82672]

sb (Sergey Borodanov) wrote:

The same error occurs when creating a module, and irb behaves the same way (please see the attachment). At the same time, Cyrillic-named constants and methods work fine.

Methods indeed should work fine, because currently all non-ASCII characters are lumped together as lower-case. But I don't think constants work fine; it may only look so.

Please try e.g.

ruby -e 'Мир = 55; Мир = 77'

You should get a warning saying that the constant was already initialized. I don't get such a warning, which means that Мир here is treated as a variable, not as a constant.
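
Another way to see how a given Ruby classifies such a name, without relying on warnings, is to assign to it inside a throwaway module and then ask whether a constant or a local variable was created. The sketch below does that; assignment_kind is a hypothetical helper written for this illustration, and the result for Мир depends on whether the interpreter includes the change discussed in this issue.

```ruby
# Hypothetical helper: evaluates "<name> = 55" inside a fresh module and
# reports whether the name ended up as a constant or as a local variable.
def assignment_kind(name)
  sandbox = Module.new
  # String module_eval defines constants on the sandbox module; the
  # trailing local_variables call lists locals visible in that eval.
  locals = sandbox.module_eval("#{name} = 55; local_variables")
  if sandbox.constants.include?(name.to_sym)
    :constant
  elsif locals.include?(name.to_sym)
    :local_variable
  else
    :unknown
  end
end

%w[World world Мир мир].each do |name|
  puts "#{name}: #{assignment_kind(name)}"
end
```

On an interpreter that behaves as described in this comment, Мир would be reported as a local variable; with non-ASCII constant support it should be reported as a constant.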

Updated by duerst (Martin Dürst) over 8 years ago
#18

Related to Bug #11859: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work. added

Updated by duerst (Martin Dürst) over 8 years ago
#19 [ruby-core:82673]

As mentioned at the last committers' meeting, I think the patch will not work e.g. for upper-case characters in three-byte EUC-JP (characters from JIS X 0212) because the necessary data isn't there (see #11859).

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#20 [ruby-core:82702]

duerst (Martin Dürst) wrote:

In the patch, I suggest adding something like
if (rb_enc_islower(c, enc)) return FALSE;
immediately before or after
if (rb_enc_isupper(c, enc)) return TRUE;

I changed this code as follows:

    if (rb_enc_isalpha(c, enc)) {
        /* non-lower case alphabet should be upper/title case */
        if (!rb_enc_islower(c, enc)) return TRUE;
    }

Updated by nobu (Nobuyoshi Nakada) over 8 years ago
#21 [ruby-core:82703]

The whole patch is https://github.com/nobu/ruby/tree/feature/13770-nonascii-const-name

Updated by nobu (Nobuyoshi Nakada) almost 8 years ago
#22

Status changed from Assigned to Closed

Applied in changeset trunk|r63130.

symbol.c: non-ASCII constant names

symbol.c (rb_sym_constant_char_p): support for non-ASCII
constant names. [Feature #13770]
object.c (rb_mod_const_get, rb_mod_const_defined): support for
non-ASCII constant names.
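
As a quick check of the behaviour this changeset describes, the example from the issue description can be combined with the reflection methods named in the commit message. This sketch assumes an interpreter that includes the changeset (Ruby 2.6 or later, as far as I know); on older versions the class line is a SyntaxError.

```ruby
# The class from the original report, unchanged.
class Мир
  def приветствовать
    "Привет, Мир!"
  end
end

greeting = Мир.new.приветствовать
puts greeting

# Lookup by name goes through const_defined? and const_get, the two
# Module methods the commit message says were updated in object.c.
puts Object.const_defined?(:Мир)
puts Object.const_get(:Мир).equal?(Мир)
```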

Updated by nobu (Nobuyoshi Nakada) about 7 years ago
#23

Has duplicate Bug #15524: Unicode not Supported in Class Names added
