Unless MRI has some non-standard definition of the term "codepoint", your second statement is incorrect. In US-ASCII, the codepoint 0x80 does not exist.
IMO, any operation that attempts to produce a US-ASCII string containing 0x80 should either fail (like "\u{80}".encode("US-ASCII")) or promote to ASCII-8BIT (like "".encode("US-ASCII") << 128.chr). So I believe the middle two examples are incorrect.
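For reference, the two behaviors I consider acceptable can be observed with standard API calls (a sketch; both snippets use only documented `String#encode`, `Integer#chr`, and `String#<<` behavior):

```ruby
# Transcoding a non-ASCII character to US-ASCII fails outright:
err = begin
  "\u{80}".encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  e
end
puts err.class  # Encoding::UndefinedConversionError

# Appending the raw byte 0x80 (an ASCII-8BIT string) instead promotes
# the receiver's encoding to ASCII-8BIT:
s = "".encode("US-ASCII") << 128.chr
puts s.encoding  # ASCII-8BIT
```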
I would prefer not to, as then String#<< with an Integer ((|i|)) could not be defined as (({self << i.chr(self.encoding)})).
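For illustration, that definition can be sketched in Ruby itself; (({append_codepoint})) is a hypothetical helper name, not an actual method:

```ruby
# Hypothetical helper mirroring the proposed definition of String#<<
# with an Integer i: self << i.chr(self.encoding)
def append_codepoint(str, i)
  str << i.chr(str.encoding)
end

puts append_codepoint(+"abc", 0x64)  # "abcd"  (0x64 is "d" in UTF-8)
puts append_codepoint(+"", 0x3042)   # "あ"    (a multibyte UTF-8 codepoint)
```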
I think it would make much more sense for (({"".encode("US-ASCII") << 128})) and (({128.chr("US-ASCII")})) both to raise RangeError. The current behavior is just weird:
a = "".encode("US-ASCII") << 128
b = 128.chr("US-ASCII")
a == b #=> true
a.valid_encoding? #=> true
b.valid_encoding? #=> false
This issue was solved with changeset r34236.
John, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
* numeric.c (rb_enc_uint_chr): raise RangeError when the added
  codepoint is invalid. [Feature #5855] [Bug #5863] [Bug #5864]

* string.c (rb_str_concat): ditto.

* string.c (rb_str_concat): set the encoding to ASCII-8BIT when the
  string is US-ASCII and the argument is an integer greater than 127.