Feature #16006

String count and alignment that consider multibyte characters

Added by sawa (Tsuyoshi Sawada) 2 months ago. Updated 2 months ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
[ruby-core:<unknown>]

Description

In a non-proportional (monospaced) font, multibyte characters are twice as wide as ASCII characters. Since String#length, String#ljust, String#rjust, and String#center do not take this into account, applying these methods does not give the desired output.

array = ["aaあああ", "bいいいいいいいい", "cc"]

col_width = array.map(&:length).max
array.each{|w| puts w.ljust(col_width, "*")}

# >> aaあああ****
# >> bいいいいいいいい
# >> cc*******

To justify strings that contain multibyte characters, we have to do something much more complicated, such as the following:

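# Map each string to its estimated display width:
# ASCII characters count as 1 column, all other characters as 2.
col_widths =
  array.to_h{|w| [
    w,
    w
    .chars
    .partition(&:ascii_only?)
    .then{|ascii, non| ascii.length + (non.length * 2)}
  ]}
col_width = col_widths.values.max
# Pad each string on the right up to the widest estimated width.
array.each{|w| puts w + "*" * (col_width - col_widths[w])}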

#  Note that the following gives the desired alignment in non-proportional font, but may not appear so in this issue tracker.
# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************

This issue seems to be common, as several webpages can be found that attempt to do something similar.

I propose to give the relevant methods an option to take multibyte characters into consideration. Perhaps something like the proportional keyword in the following may work:

"aaあああ".length(proportional: true) # => 8
"aaあああ".ljust(17, "*", proportional: true) # => "aaあああ*********"

Then, the desired output would be given by this code:

col_width = array.map{|w| w.length(proportional: true)}.max
array.each{|w| puts w.ljust(col_width, "*", proportional: true)}

# >> aaあああ*********
# >> bいいいいいいいい
# >> cc***************
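
For experimentation, the proposed behavior can be approximated today with standalone helpers. The following is only a sketch: naive_width and naive_ljust are hypothetical names, not existing or proposed Ruby methods, and the width rule is the same ASCII vs. non-ASCII heuristic used in the workaround above.

```ruby
# Hypothetical helpers (not part of Ruby): approximate the proposed
# `proportional: true` behavior using the heuristic from the description,
# i.e. ASCII characters count as 1 column and everything else as 2.
def naive_width(str)
  str.each_char.sum { |c| c.ascii_only? ? 1 : 2 }
end

def naive_ljust(str, width, pad = "*")
  # Assumes `pad` is a single character that occupies one column.
  str + pad * [width - naive_width(str), 0].max
end

array = ["aaあああ", "bいいいいいいいい", "cc"]
col_width = array.map { |w| naive_width(w) }.max
array.each { |w| puts naive_ljust(w, col_width) }
```

Keeping the width estimate in one method makes it easy to swap in a different rule later without touching the padding logic.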

Related issues

Is duplicate of Ruby master - Feature #14618: Add display width method to String for CLI

History

#1

Updated by sawa (Tsuyoshi Sawada) 2 months ago

  • Description updated (diff)
#2

Updated by sawa (Tsuyoshi Sawada) 2 months ago

  • Description updated (diff)

Updated by shyouhei (Shyouhei Urabe) 2 months ago

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.
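
To make the objection concrete, here is a small sketch that applies the ASCII vs. non-ASCII rule to accented Latin text. The method name ascii_rule_width is hypothetical and simply restates the heuristic from the description.

```ruby
# Hypothetical helper restating the heuristic under discussion:
# ASCII characters are 1 column, all other characters are 2 columns.
def ascii_rule_width(str)
  str.each_char.sum { |c| c.ascii_only? ? 1 : 2 }
end

word = "caf\u00E9"        # "café", with é as the single code point U+00E9
p word.length             # => 4
p ascii_rule_width(word)  # => 5
```

A monospaced terminal typically draws "café" in 4 columns, so the rule overestimates the width by one column for every accented letter.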

#4

Updated by shyouhei (Shyouhei Urabe) 2 months ago

  • Is duplicate of Feature #14618: Add display width method to String for CLI added

Updated by sawa (Tsuyoshi Sawada) 2 months ago

shyouhei (Shyouhei Urabe) wrote:

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.

Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as shyouhei (Shyouhei Urabe) has pointed out.

I updated my proposal to use the keyword proportional. I expect all widths to be handled automatically, including those of non-wide non-ASCII letters.

Updated by shyouhei (Shyouhei Urabe) 2 months ago

sawa (Tsuyoshi Sawada) wrote:

shyouhei (Shyouhei Urabe) wrote:

This particular proposal is NG. ASCII vs. non-ASCII is too Asian-centric. There are other non-wide non-ASCII encodings, such as those in Europe.

Yeah, the keyword name non_ascii in my original proposal was not good. It would make things complicated, and was too specific, as shyouhei (Shyouhei Urabe) has pointed out.

I updated my proposal to use the keyword proportional. I expect all widths to be handled automatically, including those of non-wide non-ASCII letters.

Still not appropriate. There are characters whose "wide"-ness is not fixed until they are actually rendered. See also: https://unicode.org/reports/tr11/ especially the section named "Modern Rendering Practice".
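
As a rough illustration of why a single answer is hard to give, the sketch below estimates width from Unicode script properties and makes the treatment of ambiguous characters an explicit parameter. Everything in it is an assumption for illustration: estimated_width and ambiguous_wide: are hypothetical names, the script list is a crude stand-in for the East Asian Width data described in TR11, and basic Greek letters serve as an example of characters that TR11 classifies as ambiguous.

```ruby
# Hypothetical sketch, not a real API. Wide characters are approximated by
# a few scripts; a faithful version would consult EastAsianWidth.txt.
WIDE = /\p{Han}|\p{Hiragana}|\p{Katakana}|\p{Hangul}/
# Crude stand-in for the "ambiguous" class: Greek letters only.
AMBIGUOUS = /\p{Greek}/

def estimated_width(str, ambiguous_wide: false)
  str.each_char.sum do |c|
    if c.match?(WIDE) then 2
    elsif c.match?(AMBIGUOUS) then ambiguous_wide ? 2 : 1
    else 1
    end
  end
end

p estimated_width("αβγ")                        # => 3
p estimated_width("αβγ", ambiguous_wide: true)  # => 6
p estimated_width("aaあああ")                    # => 8
```

The same string gets two different answers depending on a setting that only the caller, or ultimately the renderer, can know.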

Updated by matz (Yukihiro Matsumoto) 2 months ago

  • Status changed from Open to Rejected

The display width of a string cannot be calculated without rendering information, which Ruby usually does not have.
Considering emojis or grapheme clusters, it is nearly impossible. It's the responsibility of the rendering engine.

Matz.
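
The gap between what Ruby can count and what a terminal draws shows up even without any width table. A small illustration using only existing String methods (how many columns a renderer uses for these strings depends on the font and terminal):

```ruby
# One user-perceived character can consist of several code points.
accented = "e\u0301"          # "e" followed by COMBINING ACUTE ACCENT
flag     = "\u{1F1EF 1F1F5}"  # regional indicator symbols J and P

p accented.length                    # => 2
p accented.grapheme_clusters.length  # => 1
p flag.length                        # => 2
p flag.grapheme_clusters.length      # => 1
```

Counting grapheme clusters gets closer to the number of visible units, but it still says nothing about how many columns each unit occupies.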
