Feature #12272

Accepting HTML entity name in string literal

Added by sawa (Tsuyoshi Sawada) over 3 years ago. Updated over 3 years ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:74891]

Description

String literals allow the escape sequence \u to describe a character by its Unicode code point, like this:

"\u201c" # left double quote
"\u2191" # up arrow

This is useful for typing characters that are not easy to input from the keyboard. However, most people do not know Unicode code points by heart.
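To make the existing \u escape concrete, here is a small sketch showing that the four hex digits name a Unicode code point, which Ruby then stores as UTF-8 bytes in a UTF-8 source file. The variable names are only for illustration.

```ruby
# \u takes a Unicode code point (hex), not the UTF-8 byte sequence.
left_quote = "\u201c"
up_arrow   = "\u2191"

# The code point is what was typed after \u.
puts left_quote.ord.to_s(16)                             # prints 201c

# The UTF-8 encoding of that code point is a different, three-byte sequence.
puts left_quote.bytes.map { |b| b.to_s(16) }.join(" ")   # prints e2 80 9c
puts up_arrow.bytes.map { |b| b.to_s(16) }.join(" ")     # prints e2 86 91
```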

HTML entity names seem like a reasonable compromise, I think, although they do not cover all of Unicode. I would like the string literal to be extended to accept HTML entity names and interpret them as the corresponding Unicode characters. I do not have a definite idea for the syntax, but one candidate is an escape sequence of the form \& ... ;, so that we can type:

"\&ldquo;" # left double quote
"\&uarr;"  # up arrow

Currently, "\&" is interpreted as "&", so this would be a compatibility-breaking change. If that is not desirable, a different syntax could be considered.
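As a rough illustration of the idea without changing the language, the sketch below expands entity names at run time. It assumes a hypothetical helper `expand_entities` and a hand-written table `ENTITY_CODEPOINTS` holding only four names; neither is part of Ruby, and a real implementation would need the full HTML entity list.

```ruby
# Hypothetical lookup table (assumption): a tiny subset of HTML entity names.
ENTITY_CODEPOINTS = {
  "ldquo" => 0x201C, # left double quote
  "rdquo" => 0x201D, # right double quote
  "uarr"  => 0x2191, # up arrow
  "darr"  => 0x2193  # down arrow
}.freeze

# Hypothetical helper (assumption): replaces each &name; found in the table
# with the corresponding character and leaves unknown names untouched.
def expand_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    codepoint = ENTITY_CODEPOINTS[$1]
    codepoint ? [codepoint].pack("U") : match
  end
end

puts expand_entities("&ldquo;up&rdquo; &uarr;") == "\u201cup\u201d \u2191"  # prints true
puts expand_entities("&unknown;")                                           # prints &unknown;

# Today an unrecognized escape such as \& simply yields the escaped character,
# which is why reusing \& in literals would change existing behavior.
puts "\&" == "&"                                                            # prints true
```

A helper like this avoids the compatibility concern, at the cost of a method call instead of literal syntax.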

History

Updated by shevegen (Robert A. Heiler) over 3 years ago

I don't know. I am not really against it, also not really for it but to be honest, I can't remember either of these variants anyway. :)

Would this actually be used/usable?

Updated by sawa (Tsuyoshi Sawada) over 3 years ago

Robert A. Heiler wrote:

I don't know. I am not really against it, also not really for it but to be honest, I can't remember either of these variants anyway. :)

Would this actually be used/usable?

Ideally, I would prefer LaTeX math symbol commands, which I believe many people have a better handle on. But the mapping between LaTeX commands and Unicode characters would not be trivial, and it would require intensive discussion on what to include and what maps to what, which gives me little hope that such a proposal would ever converge and be accepted by Matz. In contrast, Unicode escapes are already accepted in Ruby string literals, and since the HTML entity names are clearly defined, I think the feature details would be straightforward once it is decided.

Until now, people may have been getting by with straight quotes instead of smart quotes, or with ASCII art for arrows, tables, and other characters or constructs. But things are evolving. We can expect things to become better and more pleasing. We don't need to stick to ASCII characters forever.

Updated by duerst (Martin Dürst) over 3 years ago

Tsuyoshi Sawada wrote:

Ideally, I would prefer LaTeX math symbol commands, which I believe many people have a better handle on.

I think there are strong individual differences. Therefore, it doesn't make that much sense to implement such syntax in Ruby itself. For most characters, it's much easier to read if the character itself is used directly.

What I do for some characters is to register them in the IME (input method editor) for Japanese input. For example, I can switch to Japanese input, type 'ue', and select ü or Ü in addition to the already registered things such as 上, 植え, 飢え, and so on. For those who don't use Japanese, Chinese, or another language that requires an IME, most advanced editors have macro facilities or similar that can be used.
