InfraRuby (InfraRuby Vision) wrote:
Please update the documentation for String#codepoints too.
That says "This is a shorthand for str.each_codepoint.to_a".
String#codepoints does return (Unicode) codepoints for US-ASCII and ISO-8859-1 as those encodings are the basis of Unicode.
Well, yes, and for almost all encodings, the returned values are Unicode code points for the ASCII characters, and for some other encodings, there is a bit more overlap. I don't think we need to go into too much detail.
Maybe add Encoding#unicode_codepoints? which returns true for these encodings: US-ASCII, ISO-8859-1, UTF-8, UTF-16(BE|LE), UTF-32(BE|LE).
There are quite a few other cases where the behavior of String methods changes depending on the string's Encoding. I think it would be good to have access to this information, but methods with more general names may be needed.
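For illustration only, here is a rough sketch of what such a predicate could look like in user code. The helper name unicode_codepoints? and its standalone (non-Encoding) form are assumptions made for this example; nothing like it exists in Ruby today, and the list of encodings is simply the one from the proposal above.

```ruby
# Hypothetical helper, NOT part of Ruby: returns true when String#codepoints
# yields Unicode code points for every character in the given encoding.
# The list of names is taken from the proposal above (an assumption).
def unicode_codepoints?(encoding)
  names = %w[US-ASCII ISO-8859-1 UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE]
  # Encoding.find accepts either a name or an Encoding object.
  names.include?(Encoding.find(encoding).name)
end

unicode_codepoints?("UTF-8")               # true
unicode_codepoints?(Encoding::ISO_8859_1)  # true
unicode_codepoints?("Shift_JIS")           # false
```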
Anyway, to get Unicode codepoints out of an arbitrary string, string.encode('UTF-8').codepoints will always do the job.
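A small sketch of the difference, using two non-Unicode encodings as examples. The wrapper name unicode_codepoints is made up for this illustration; it is just the encode('UTF-8').codepoints idiom from above.

```ruby
# Illustrative wrapper (hypothetical name) around the idiom above.
def unicode_codepoints(str)
  str.encode("UTF-8").codepoints
end

# Byte 0x80 is the euro sign in Windows-1252.
euro = [0x80].pack("C").force_encoding("Windows-1252")
euro.codepoints            # => [128]   (the raw Windows-1252 value)
unicode_codepoints(euro)   # => [8364]  (U+20AC)

# HIRAGANA LETTER A, transcoded to Shift_JIS (bytes 0x82 0xA0).
a = "\u3042".encode("Shift_JIS")
a.codepoints               # => [33440] (0x82A0, not a Unicode code point)
unicode_codepoints(a)      # => [12354] (U+3042)
```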
(Also, there's an unrelated change in that revision.)
Yes, thanks for noticing, fixed.