Bug #16842

open

`inspect` prints the UTF-8 character U+0085 (NEXT LINE) verbatim even though it is not printable

Added by sawa (Tsuyoshi Sawada) over 1 year ago. Updated 8 months ago.

Status: Assigned
Priority: Normal
Target version: -
ruby -v: ruby 2.8.0dev (2020-05-09T13:24:57Z master 889b0fe46f) [x86_64-linux]
[ruby-core:98231]

Description

The UTF-8 character U+0085 (NEXT LINE) is not printable, but `inspect` prints the character verbatim (within double quotation marks):

0x85.chr(Encoding::UTF_8).match?(/\p{print}/) # => false
0x85.chr(Encoding::UTF_8).inspect
#=> "\"
\""

My understanding is that non-printable characters are not printed verbatim with inspect:

"\n".match?(/\p{print}/) # => false
"\n".inspect #=> "\"\\n\""

while printable characters are:

"a".match?(/\p{print}/) # => true
"a".inspect # => "\"a\""

I ran the following script, and found that U+0085 is the only character within the range U+0000 to U+FFFF that behaves like this.

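# True when `inspect` emits the character as-is, i.e. the output does not
# begin with a backslash escape (such as \n or \uXXXX) right after the opening quote.
def verbatim?(char)
  !char.inspect.start_with?(%r{\"\\[a-z]})
end

# True when the character matches the `print` character property.
def printable?(char)
  char.match?(/\p{print}/)
end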

(0x0000..0xffff).each do |i|
  begin
    char = i.chr(Encoding::UTF_8)
  rescue RangeError
    next
  end
  puts '%#x' % i unless verbatim?(char) == printable?(char)
end
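
The scan above can also be sketched as a reusable method that returns the mismatching code points instead of printing them, which makes it easier to compare ranges or Ruby versions. The method name `inspect_mismatches` is hypothetical and not part of the original report; the two predicates are the ones from the script. The result depends on the Ruby build and on the default external encoding, so no expected output is given.

```ruby
# Predicates as in the original script.
def verbatim?(char)
  !char.inspect.start_with?(/"\\[a-z]/)
end

def printable?(char)
  char.match?(/\p{print}/)
end

# Hypothetical helper: returns the code points in `range` for which
# "printed verbatim by inspect" and "matches \p{print}" disagree.
def inspect_mismatches(range)
  range.filter_map do |i|
    begin
      char = i.chr(Encoding::UTF_8)
    rescue RangeError
      next # code points such as surrogates cannot be encoded in UTF-8
    end
    i unless verbatim?(char) == printable?(char)
  end
end

puts inspect_mismatches(0x0000..0xffff).map { |i| format('U+%04X', i) }
```

`filter_map` requires Ruby 2.7 or later; on older versions `map { ... }.compact` would do the same job.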

Updated by jeremyevans0 (Jeremy Evans) 8 months ago

  • Assignee set to duerst (Martin Dürst)
  • Status changed from Open to Assigned

Behavior here seems to depend on the locale encoding:

$ LC_ALL=C ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
"\"\\u0085\""

$ LC_ALL=en_US.UTF-8 ruby -e "p 0x85.chr(Encoding::UTF_8).inspect.b"
"\"\xC2\x85\""

I've submitted a pull request to fix the behavior, though the implementation is rather crude: https://github.com/ruby/ruby/pull/4229

duerst (Martin Dürst), is there a better fix that handles the Unicode properties differently?
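
The locale dependence shown in the two shell commands above can also be reproduced inside a single process, on the assumption that `String#inspect` consults `Encoding.default_external` (and `Encoding.default_internal`, when set) to decide which characters it may emit unescaped. The helper `inspect_with_external` is hypothetical and exists only for this sketch; it is not part of Ruby or of the pull request.

```ruby
# Hypothetical helper: runs String#inspect under a temporary default external
# encoding and returns the result as a binary string, like `.b` in the comment above.
def inspect_with_external(str, encoding)
  saved_external = Encoding.default_external
  saved_internal = Encoding.default_internal
  saved_verbose = $VERBOSE
  $VERBOSE = nil # silence the warning Ruby may emit when the defaults are reassigned
  Encoding.default_external = encoding
  Encoding.default_internal = nil
  str.inspect.b
ensure
  # Restore the process-wide settings before returning.
  Encoding.default_external = saved_external
  Encoding.default_internal = saved_internal
  $VERBOSE = saved_verbose
end

nel = 0x85.chr(Encoding::UTF_8)
p inspect_with_external(nel, Encoding::US_ASCII) # escaped form, as with LC_ALL=C
p inspect_with_external(nel, Encoding::UTF_8)    # the case this ticket is about
```

Changing the default encodings at run time affects the whole process, so this is only suitable for experiments like this one.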

Updated by naruse (Yui NARUSE) 8 months ago

U+0085 is categorized as Print in Ruby for historical reasons: Oniguruma treats it that way.
https://moriyoshi.hatenablog.com/entry/20090307/1236410006

I'm neutral about the change, but I would like it to include a detailed comment or a link to this ticket.
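
To see how the regexp engine classifies U+0085 on a given Ruby build, a small sketch like the following can help. The constant `PROPERTIES` and the method `classify` are hypothetical names used only here. The one result that follows from the Unicode standard is that U+0085 has General_Category Cc (control); the other three may differ between Ruby versions, so they are printed rather than asserted.

```ruby
# Hypothetical lookup table: label => regexp testing one character class.
PROPERTIES = {
  general_category_cc: /\p{Cc}/,
  print_property: /\p{print}/,
  posix_print: /[[:print:]]/,
  cntrl_property: /\p{cntrl}/
}.freeze

# Returns a hash mapping each label to whether `char` matches that class.
def classify(char)
  PROPERTIES.transform_values { |re| char.match?(re) }
end

classify("\u0085").each do |label, matched|
  puts format('%-20s %s', label, matched)
end
```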
