For guessing the possible encodings of a byte stream, there are gems specialized for that purpose, like charlock_holmes, ucharset and rcharset. Most of them are either a wrapper around ICU4C or a port of Mozilla's encoding detector.
I have given up on this idea for now, because I thought the use cases would not be as wide as expected, and adding valid_encoding?(enc) alone would not be enough if you got serious about encoding detection. (Sorry usa-san!)
However, since this issue has been raised, let me share one good use case for future readers.
Suppose you have a list of byte strings whose encoding you don't know, as when you want to guess the encoding of the file names stored in a zip file.
If you had String#valid_encoding?(enc), you could do it like this, without modifying, copying or concatenating strings:
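String#valid_encoding?(enc) is only a proposal here; the existing String#valid_encoding? takes no argument and checks the string's own encoding. The sketch below emulates the proposal with a hypothetical helper, `valid_in?`, which checks a copy of the string (so, unlike the proposed method, it does copy). The candidate list is an assumption for illustration, with ASCII_8BIT last as a sentinel so the result is never nil.

```ruby
# Hypothetical stand-in for the proposed String#valid_encoding?(enc):
# true if the bytes of +str+ form a valid sequence in +enc+.
# Unlike the proposal, this works on a copy of the string.
def valid_in?(str, enc)
  str.dup.force_encoding(enc).valid_encoding?
end

# Candidate encodings, in order of preference (an assumption).
# ASCII_8BIT accepts any bytes, so it acts as a sentinel and the
# search below never returns nil.
CANDIDATES = [
  Encoding::UTF_8,
  Encoding::Windows_31J,
  Encoding::EUC_JP,
  Encoding::ASCII_8BIT,
].freeze

# Returns the first candidate in which every name is valid.
def guess_encoding(names)
  CANDIDATES.find { |enc| names.all? { |name| valid_in?(name, enc) } }
end

# Example: raw file-name bytes as they might come out of a zip archive.
names = ["readme.txt".b, "\xE6\x97\xA5\xE6\x9C\xAC.txt".b]
enc = guess_encoding(names)
names.each { |name| name.force_encoding(enc) }
puts enc  # prints "UTF-8"
```

All names are checked against one encoding at a time, on the assumption that an archive was created with a single encoding for all of its entries.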
Such specialized gems should also be faster on long strings, and they may use byte/character frequency and other heuristics. It is also clear to the user that this is magic that may fail.
Encoding::ASCII_8BIT as the last candidate will pick up garbage. Encoding::US_ASCII would be much better.
Maybe not. You could choose to apply CAP encoding when the encoding was unknown (ASCII_8BIT), or just use the binary garbage as-is if the storage could save binary file names (like ZFS).
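A rough sketch of that fallback, assuming a CAP-style scheme (as in, e.g., Samba's vfs_cap module) in which each byte 0x80 or above becomes ":" plus two hex digits. The helper names, the lowercase hex digits, and the conversion of identified names to UTF-8 are all assumptions for illustration, not a reference implementation.

```ruby
# Hypothetical CAP-style escaping: every byte >= 0x80 becomes ":" plus two
# lowercase hex digits; ASCII bytes pass through unchanged (an assumption;
# check the target system's exact rules).
def cap_encode(name)
  name.b.each_byte.map { |byte|
    byte < 0x80 ? byte.chr : format(":%02x", byte)
  }.join
end

# Use the CAP form only when no real encoding was identified; otherwise
# transcode to UTF-8 (assumes the destination wants UTF-8 names).
def storable_name(name, enc)
  if enc == Encoding::ASCII_8BIT
    cap_encode(name)
  else
    name.dup.force_encoding(enc).encode(Encoding::UTF_8)
  end
end

puts cap_encode("\x93\xFA.txt".b)  # prints ":93:fa.txt"
```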
Every byte sequence is valid in Encoding::ISO_8859_1, so ASCII_8BIT (or US_ASCII) after it would never get used.
Ah, so true. My bad. Anyway, I put ASCII_8BIT last as a sentinel so that the result would never be nil; that is why US_ASCII was not an option.
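The ISO-8859-1 point is easy to check: every byte value from 0x00 to 0xFF is a character in it, so any candidate listed after it can never be chosen. A small sketch (the variable names are illustrative):

```ruby
# All 256 possible byte values in one binary string.
all_bytes = (0..255).to_a.pack("C*")

# ISO-8859-1 maps every byte to a character, so any byte string is valid.
latin1_ok = all_bytes.dup.force_encoding(Encoding::ISO_8859_1).valid_encoding?

# US-ASCII only accepts bytes 0x00-0x7F, so it rejects this string.
ascii_ok = all_bytes.dup.force_encoding(Encoding::US_ASCII).valid_encoding?

puts latin1_ok  # prints "true"
puts ascii_ok   # prints "false"

# In a list like this, the search always stops at ISO_8859_1 and the
# trailing ASCII_8BIT sentinel is never reached.
candidates = [Encoding::UTF_8, Encoding::ISO_8859_1, Encoding::ASCII_8BIT]
chosen = candidates.find { |enc| all_bytes.dup.force_encoding(enc).valid_encoding? }
puts chosen     # prints "ISO-8859-1"
```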
There are many more encodings, but distinguishing them is difficult or impossible with this method.
I know, but in most cases you have some idea of what the possible encodings are, and trying just a few of them is sufficient. This example was meant to be one of those cases.
If you need more, a BOM-based encoding detector could be another use case for valid_encoding?(enc), though I'm not sure.
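One way such a BOM-based detector might look, as a sketch: check the leading bytes for a known byte order mark, then confirm that the rest is valid in the implied encoding (checked on a copy, since valid_encoding?(enc) is only a proposal). The method name `detect_by_bom` and the BOM table are illustrative. UTF-32 is omitted for brevity; if it were added, the UTF-32LE mark (FF FE 00 00) would have to be tested before the UTF-16LE one, which is its prefix.

```ruby
# Byte order marks and the encodings they imply (UTF-32 omitted).
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFE\xFF".b     => Encoding::UTF_16BE,
  "\xFF\xFE".b     => Encoding::UTF_16LE,
}.freeze

# Hypothetical detector: returns the BOM-implied encoding if the bytes
# after the BOM are valid in it, otherwise nil.
def detect_by_bom(bytes)
  bytes = bytes.b  # binary copy; the caller's string is left untouched
  BOMS.each do |bom, enc|
    next unless bytes.start_with?(bom)
    body = bytes.byteslice(bom.bytesize, bytes.bytesize)
    return enc if body.force_encoding(enc).valid_encoding?
  end
  nil
end

puts detect_by_bom("\xEF\xBB\xBFcaf\xC3\xA9".b)  # prints "UTF-8"
```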
I already named a few gems for serious use, so please don't be so strict about these casual use cases.