https://bugs.ruby-lang.org/https://bugs.ruby-lang.org/favicon.ico?17113305112017-07-26T12:46:37ZRuby Issue Tracking SystemRuby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659352017-07-26T12:46:37Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul></ul><p>Constant names must start with an upper-case letter in <strong>ASCII</strong>.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659372017-07-26T14:25:43Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p>And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.</p>
<p>Matz.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659452017-07-26T17:32:15Zshevegen (Robert A. Heiler)shevegen@gmail.com
<ul></ul><p>Martin Dürst could then create classes for all Emojis in Unicode. :D</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659492017-07-27T00:40:56Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p>matz (Yukihiro Matsumoto) wrote:</p>
<blockquote>
<p>And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.</p>
</blockquote>
<p>What do you think of Titlecase? Are they allowed?</p>
<p><a href="http://unicode.org/faq/casemap_charprop.html#4" class="external">http://unicode.org/faq/casemap_charprop.html#4</a></p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659592017-07-27T06:19:17Zphluid61 (Matthew Kerwin)matthew@kerwin.net.au
<ul></ul><p>shyouhei (Shyouhei Urabe) wrote:</p>
<blockquote>
<p>matz (Yukihiro Matsumoto) wrote:</p>
<blockquote>
<p>And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.</p>
</blockquote>
<p>What do you think of Titlecase? Are they allowed?</p>
<p><a href="http://unicode.org/faq/casemap_charprop.html#4" class="external">http://unicode.org/faq/casemap_charprop.html#4</a></p>
</blockquote>
<p>Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659602017-07-27T06:27:45Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p>phluid61 (Matthew Kerwin) wrote:</p>
<blockquote>
<p>shyouhei (Shyouhei Urabe) wrote:</p>
<blockquote>
<p>matz (Yukihiro Matsumoto) wrote:</p>
<blockquote>
<p>And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.</p>
</blockquote>
<p>What do you think of Titlecase? Are they allowed?</p>
<p><a href="http://unicode.org/faq/casemap_charprop.html#4" class="external">http://unicode.org/faq/casemap_charprop.html#4</a></p>
</blockquote>
<p>Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.</p>
</blockquote>
<p>Can I ask you whether U+01C8 is a valid Constant name or not in your opinion? and why?</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659612017-07-27T06:54:41Zphluid61 (Matthew Kerwin)matthew@kerwin.net.au
<ul></ul><p>shyouhei (Shyouhei Urabe) wrote:</p>
<blockquote>
<p>phluid61 (Matthew Kerwin) wrote:</p>
<blockquote>
<p>Isn't titlecase a mapping property, rather than an attribute? That is, how a character would be converted to titlecase is orthogonal to whether it's uppercase.</p>
</blockquote>
<p>Can I ask you whether U+01C8 is a valid Constant name or not in your opinion? and why?</p>
</blockquote>
<p>Oh, you're right, I had misread the documentation.</p>
<p>I think that if Ruby accepts all <em>Lu</em> characters as constants, it could also accept all <em>Lt</em>. In the case of U+01C8 I'm not overly concerned because it's not common any more (but I think <code>ǈudevit</code> is just as valid as <code>Ljudevit</code>); however for U+01F2 it could be reasonable for someone in Macedonia to name a constant <code>ǲe</code>, for example.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659622017-07-27T07:38:43Zshyouhei (Shyouhei Urabe)shyouhei@ruby-lang.org
<ul></ul><p>OK, I see. Thank you.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659692017-07-28T10:22:46Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>shevegen (Robert A. Heiler) wrote:</p>
<blockquote>
<p>Martin Dürst could then create classes for all Emojis in Unicode. :D</p>
</blockquote>
<p>Well, it's unclear whether emoji (note the Japanese plural!) are upper-case or lower-case.<br>
I thought maybe we could make a distinction between children (lower-case) and adults (upper-case), but there are not many children, tons of adults, and tons of other stuff (not to say gunk).</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659702017-07-28T10:49:49Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>matz (Yukihiro Matsumoto) wrote:</p>
<blockquote>
<p>And maybe it's time to relax the limitation for Non-ASCII capital letters to start constant names.</p>
</blockquote>
<p>I agree. Here are some pointers for implementation:</p>
<p>The distinction between constants (<code>tCONSTANT</code>) and identifiers (<code>tIDENTIFIER</code>) is made at <code>parse.c:7830</code> using the <code>ISUPPER</code> macro. Some other uses of <code>ISUPPER</code> (but not all of them) seem to be related to this distinction, e.g. the one at <code>symbol.c:281</code>.</p>
<p><code>ISUPPER</code> is defined using <code>rb_isupper</code> in <code>include/ruby/ruby.h</code>, the latter being defined inline in the same file as <code>'A' <= c && c <= 'Z'</code>. This would have to be replaced with a call to <code>ONIGENC_IS_CODE_CTYPE</code> or similar, which would work for legacy encodings. For Unicode-based encodings, where we want to take titlecase into account (thanks, Shyouhei!), it may be slightly more complicated.</p>
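<p>As a rough Ruby-level illustration of the difference (not the C implementation discussed here), the sketch below contrasts an ASCII-only check in the spirit of <code>rb_isupper</code> with a Unicode-aware one that also accepts titlecase. The method names <code>ascii_upper?</code> and <code>unicode_const_start?</code> are hypothetical, and using the <code>Lu</code>/<code>Lt</code> general categories is an assumption about what "upper case" should mean here.</p>

```ruby
# Hypothetical helper: ASCII-only test, mirroring 'A' <= c && c <= 'Z'.
# Expects a single-character string.
def ascii_upper?(ch)
  ch.ord.between?(0x41, 0x5A)
end

# Hypothetical helper: accepts a Unicode uppercase (Lu) or titlecase (Lt)
# letter at the start of the string. Assumes a UTF-8 encoded string.
def unicode_const_start?(ch)
  ch.match?(/\A[\p{Lu}\p{Lt}]/)
end
```

<p>Under these assumptions, Cyrillic <code>М</code> (U+041C) fails the first check and passes the second, as does the titlecase character U+01C8.</p>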
<p>One thing we might want to check is whether there is any code out there that currently uses non-ASCII upper-case variable names.</p>
<p>Another question is whether we might want to have some convention for Japanese, e.g. Katakana for class names. Just thinking out loud (and ducking).</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659722017-07-29T00:17:23Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul></ul><p>Distinguishing non-ASCII upper and lower case would raise the question of non-ASCII punctuation too.<br>
ASCII punctuation cannot be part of identifiers; will the non-ASCII versions be treated the same?</p>
<p>BTW, I think Japanese has little or no concept of plural, apart from some words that imply "many" and some suffixes.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659832017-07-31T11:18:10Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul></ul><p>I'm uncertain about the usage of <code>mbc_case_fold</code>.</p>
<pre><code class="diff syntaxhl" data-language="diff"><span class="gh">diff --git i/parse.y w/parse.y
index 02d9412a2c..96f25d893e 100644
</span><span class="gd">--- i/parse.y
</span><span class="gi">+++ w/parse.y
</span><span class="p">@@ -7790,6 +7790,8 @@</span> parse_atmark(struct parser_params *parser, const enum lex_state_e last_state)
return result;
}
<span class="gi">+int rb_enc_const_id_char_p(const char *name, const char *end, rb_encoding *enc);
+
</span> static enum yytokentype
parse_ident(struct parser_params *parser, int c, int cmd_state)
{
<span class="p">@@ -7827,7 +7829,9 @@</span> parse_ident(struct parser_params *parser, int c, int cmd_state)
pushback(c);
}
}
<span class="gd">- if (result == 0 && ISUPPER(tok()[0])) {
</span><span class="gi">+ if (result == 0 &&
+ (ISUPPER(tok()[0]) ||
+ rb_enc_const_id_char_p(tok(), tok()+toklen(), current_enc))) {
</span> result = tCONSTANT;
}
else {
<span class="gh">diff --git i/symbol.c w/symbol.c
index f4516ebbe4..490cae0127 100644
</span><span class="gd">--- i/symbol.c
</span><span class="gi">+++ w/symbol.c
</span><span class="p">@@ -198,6 +198,28 @@</span> rb_enc_symname_p(const char *name, rb_encoding *enc)
return rb_enc_symname2_p(name, strlen(name), enc);
}
<span class="gi">+int
+rb_enc_const_id_char_p(const char *name, const char *end, rb_encoding *enc)
+{
+ int c, len;
+
+ if (end <= name) return FALSE;
+ if (ISASCII(*name)) return ISUPPER(*name);
+ c = rb_enc_codepoint_len(name, end, &len, enc);
+ if (c < 0) return FALSE;
+ if (rb_enc_isupper(c, enc)) return TRUE;
+ {
+ OnigUChar fold[ONIGENC_GET_CASE_FOLD_CODES_MAX_NUM];
+ const OnigUChar *beg = (const OnigUChar *)name;
+ int r = enc->mbc_case_fold(ONIGENC_CASE_FOLD,
+ &beg, (const OnigUChar *)end,
+ fold, enc);
+ if (r > 0 && (r != len || memcmp(fold, name, r)))
+ return TRUE;
+ }
+ return FALSE;
+}
+
</span> #define IDSET_ATTRSET_FOR_SYNTAX ((1U<<ID_LOCAL)|(1U<<ID_CONST))
#define IDSET_ATTRSET_FOR_INTERN (~(~0U<<(1<<ID_SCOPE_SHIFT)) & ~(1U<<ID_ATTRSET))
<span class="p">@@ -278,7 +300,7 @@</span> rb_enc_symname_type(const char *name, long len, rb_encoding *enc, unsigned int a
break;
default:
<span class="gd">- type = ISUPPER(*m) ? ID_CONST : ID_LOCAL;
</span><span class="gi">+ type = rb_enc_const_id_char_p(m, e, enc) ? ID_CONST : ID_LOCAL;
</span> id:
if (m >= e || (*m != '_' && !ISALPHA(*m) && ISASCII(*m))) {
if (len > 1 && *(e-1) == '=') {
</code></pre> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=659972017-08-01T11:21:39Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul><li><strong>ruby -v</strong> deleted (<del><i>2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]</i></del>)</li><li><strong>Backport</strong> deleted (<del><i>2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN</i></del>)</li><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Feature</i></li></ul> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=662292017-08-18T12:41:24Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>matz (Yukihiro Matsumoto)</i></li></ul> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=663982017-08-31T09:06:18Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>In the patch, I suggest adding something like</p>
<pre><code class="c syntaxhl" data-language="c"><span class="k">if</span> <span class="p">(</span><span class="n">rb_enc_islower</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="k">return</span> <span class="n">FALSE</span><span class="p">;</span>
</code></pre>
<p>immediately before or after</p>
<pre><code class="c syntaxhl" data-language="c"><span class="k">if</span> <span class="p">(</span><span class="n">rb_enc_isupper</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="k">return</span> <span class="n">TRUE</span><span class="p">;</span>
</code></pre> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665012017-09-06T09:45:53Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>I have checked for upper-case letters without corresponding lower-case letters, with the following short script:</p>
<pre><code> ruby -n -e 'l=$_.split(/;/); if l[2]=="Lu" && l[13]=="" then puts l[1];end' <UnicodeData.txt
</code></pre>
<p>Somewhat contrary to my expectations, this turned up quite a number of characters (471 of them). Most are MATHEMATICAL symbols in the range U+1D400 to U+1D7FF. My understanding is that they don't have mappings because mathematicians use upper-case and lower-case symbols with different meanings.</p>
<p>There are some other upper-case characters without defined lower-case equivalents, but most of them correspond to empty slots in the MATHEMATICAL symbols charts.</p>
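<p>For readers who want to reproduce the check without the one-liner, here is an expanded sketch of the same filter. It assumes the usual <code>UnicodeData.txt</code> layout (field 2 is the general category, field 13 the simple lowercase mapping); the helper name <code>upper_without_lower</code> is hypothetical, and the three sample records are only illustrative, so the real file should be used for actual counts.</p>

```ruby
# A few sample records in the UnicodeData.txt layout
# (semicolon-separated, 15 fields per line). Illustrative only.
SAMPLE = <<~DATA
  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
  1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
  01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8
DATA

# Hypothetical helper: names of Lu characters whose lowercase mapping is empty.
def upper_without_lower(text)
  text.each_line
      # limit -1 keeps trailing empty fields, which split drops by default
      .map { |line| line.chomp.split(";", -1) }
      .select { |fields| fields[2] == "Lu" && fields[13] == "" }
      .map { |fields| fields[1] }
end

puts upper_without_lower(SAMPLE)
```

<p>The one-liner gets away with a plain <code>split(/;/)</code> because <code>ruby -n</code> leaves the trailing newline in <code>$_</code>, so the last field is never empty; the sketch strips the newline and passes <code>-1</code> as the limit instead.</p>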
<p>The above patch would treat all identifiers starting with upper-case, even MATHEMATICAL symbols, as class names. Unless we want to forbid such characters in identifiers, I think that's the right thing to do.</p>
<p>What's more important for the above patch is that there are no title-case characters without lower-case mappings, so</p>
<pre><code class="diff syntaxhl" data-language="diff"><span class="gi">+ if (r > 0 && (r != len || memcmp(fold, name, r)))
+ return TRUE;
+ }
</span></code></pre>
<p>in the patch will work correctly.</p>
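<p>To make the role of the case-fold comparison concrete, here is a rough Ruby-level analogue of the check, including the suggested lower-case guard. It is only a sketch: <code>const_start_sketch?</code> is a hypothetical name, <code>String#downcase(:fold)</code> stands in for <code>mbc_case_fold</code>, and <code>\p{Upper}</code> and <code>\p{Lower}</code> stand in for <code>rb_enc_isupper</code> and <code>rb_enc_islower</code>; the C code may differ in detail.</p>

```ruby
# Hypothetical analogue of the patch's decision: does this name start
# like a constant? Assumes a UTF-8 encoded string.
def const_start_sketch?(name)
  ch = name[0]
  return false if ch.nil?                        # empty name
  return ch.match?(/[A-Z]/) if ch.ascii_only?    # ASCII keeps the old rule
  return true if ch.match?(/\p{Upper}/)          # stand-in for rb_enc_isupper
  return false if ch.match?(/\p{Lower}/)         # the suggested lower-case guard
  # Neither upper nor lower: treat it as title case if folding changes it.
  ch.downcase(:fold) != ch
end
```

<p>With these stand-ins, a titlecase letter such as U+01C8 is not upper case but folds to a different character, so it is still classified as a constant start. In this sketch the lower-case guard matters for a letter like <code>ß</code>, whose full case folding is <code>ss</code> and which would otherwise reach the fold comparison.</p>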
<p>What is more important for the above patch is whether</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665022017-09-06T09:54:07Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>sb (Sergey Borodanov) wrote:</p>
<blockquote>
<p>Same error with module creating and same behavior in <strong>irb</strong> (please, see attachment). At the same time Cyrillic-named constants and methods work fine.</p>
</blockquote>
<p>Methods indeed should work fine, because currently all non-ASCII characters are lumped together as lower-case. But I don't think constants work fine; it may only look so.</p>
<p>Please try e.g.</p>
<pre><code>ruby -e 'Мир = 55; Мир = 77'
</code></pre>
<p>You should get a warning saying that the constant was already initialized. I don't get such a warning, which means that <code>Мир</code> here is treated as a variable, not as a constant.</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665032017-09-06T10:02:02Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-6 priority-4 priority-default closed" href="/issues/11859">Bug #11859</a>: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work.</i> added</li></ul> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665052017-09-06T10:04:44Zduerst (Martin Dürst)duerst@it.aoyama.ac.jp
<ul></ul><p>As mentioned at the last committers' meeting, I think the patch will not work e.g. for upper-case characters in three-byte EUC-JP (characters from JIS X 0212) because the necessary data isn't there (see <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: Regexp matching with \p{Upper} and \p{Lower} for EUC-JP doesn’t work. (Rejected)" href="https://bugs.ruby-lang.org/issues/11859">#11859</a>).</p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665372017-09-07T12:12:10Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul></ul><p>duerst (Martin Dürst) wrote:</p>
<blockquote>
<p>In the patch, I suggest adding something like</p>
<pre><code class="c syntaxhl" data-language="c"><span class="k">if</span> <span class="p">(</span><span class="n">rb_enc_islower</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="k">return</span> <span class="n">FALSE</span><span class="p">;</span>
</code></pre>
<p>immediately before or after</p>
<pre><code class="c syntaxhl" data-language="c"><span class="k">if</span> <span class="p">(</span><span class="n">rb_enc_isupper</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="k">return</span> <span class="n">TRUE</span><span class="p">;</span>
</code></pre>
</blockquote>
<p>I changed this code as follows:</p>
<pre><code class="c syntaxhl" data-language="c"> <span class="k">if</span> <span class="p">(</span><span class="n">rb_enc_isalpha</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="p">{</span>
<span class="cm">/* non-lower case alphabet should be upper/title case */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">rb_enc_islower</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">enc</span><span class="p">))</span> <span class="k">return</span> <span class="n">TRUE</span><span class="p">;</span>
<span class="p">}</span>
</code></pre> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=665382017-09-07T12:13:48Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul></ul><p>The whole patch is <a href="https://github.com/nobu/ruby/tree/feature/13770-nonascii-const-name" class="external">https://github.com/nobu/ruby/tree/feature/13770-nonascii-const-name</a></p> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=714342018-04-10T00:41:54Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li></ul><p>Applied in changeset trunk|r63130.</p>
<hr>
<p>symbol.c: non-ASCII constant names</p>
<ul>
<li>
<p>symbol.c (rb_sym_constant_char_p): support for non-ASCII<br>
constant names. [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Can't create valid Cyrillic-named class/module (Closed)" href="https://bugs.ruby-lang.org/issues/13770">#13770</a>]</p>
</li>
<li>
<p>object.c (rb_mod_const_get, rb_mod_const_defined): support for<br>
non-ASCII constant names.</p>
</li>
</ul> Ruby master - Feature #13770: Can't create valid Cyrillic-named class/modulehttps://bugs.ruby-lang.org/issues/13770?journal_id=762392019-01-11T08:53:36Znobu (Nobuyoshi Nakada)nobu@ruby-lang.org
<ul><li><strong>Has duplicate</strong> <i><a class="issue tracker-1 status-6 priority-4 priority-default closed" href="/issues/15524">Bug #15524</a>: Unicode not Supported in Class Names</i> added</li></ul>