Olivier Lacan wrote:
It's common for OAuth authentication flows to store a destination URI to return to when the handshake process is completed. This URI can be stored without first being processed by a web server that will encode it the way Rails does for submitted forms, since it's not meant to be processed until it comes back to the origin server.
You keep using the word "URI" to refer to these data objects, and by specification a URI data object cannot contain non-ASCII characters (and even some ASCII characters are forbidden). If we agree that "haha\nlol" is a String that cannot be parsed as a URI, we should agree the same for "http://example.org/\u{2713}".force_encoding('UTF-8').
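To make the comparison concrete, here is a minimal sketch that feeds both strings to URI.parse, plus a percent-encoded variant for contrast. The helper name parses_as_uri? is hypothetical and exists only for this illustration; the behaviour assumed is that of the RFC 3986 parser that ships with current MRI.

```ruby
require "uri"

# Hypothetical helper (not part of the uri library): reports whether
# URI.parse accepts the given String without raising.
def parses_as_uri?(string)
  URI.parse(string)
  true
rescue URI::InvalidURIError
  false
end

newline_string = "haha\nlol"                     # contains a raw newline
unicode_string = "http://example.org/\u{2713}"   # contains U+2713 unescaped
escaped_string = "http://example.org/%E2%9C%93"  # same character, percent-encoded

puts parses_as_uri?(newline_string)  # expected: false
puts parses_as_uri?(unicode_string)  # expected: false
puts parses_as_uri?(escaped_string)  # expected: true
```

Only the third string consists entirely of characters that RFC 3986 permits, which is why it is the only one expected to parse.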
I opened this due to an issue I encountered in an OAuth provider handshake procedure. You could argue that I should be expected to URI.encode any URI set as a destination query parameter to prevent this issue from occurring, surely.
That's what I'm saying, but from the opposite direction. If you're storing a String, and want to ensure that it encodes a valid URI, you should URI.encode the parts before storing them in the String.
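A rough sketch of encoding the parts before storing might look like the following. It assumes a made-up provider URL and a made-up return_to parameter name, and it uses URI.encode_www_form_component and URI.encode_www_form rather than URI.encode, since URI.encode is no longer available on Ruby 3.0 and later.

```ruby
require "uri"

# Encode the non-ASCII part first, so the stored String is a valid URI.
# Caveat: encode_www_form_component uses form encoding (a space becomes "+"),
# which is good enough for this character but is not a general path encoder.
raw_segment = "\u{2713}"
destination = "http://example.org/" + URI.encode_www_form_component(raw_segment)

# Hypothetical OAuth-style URL carrying the destination as a query parameter.
# The host and the "return_to" name are assumptions for this example.
authorize_url = "https://provider.example/oauth/authorize?" +
                URI.encode_www_form({ "return_to" => destination })

# Both strings now parse, and the destination survives the round trip.
parsed = URI.parse(authorize_url)
round_tripped = URI.decode_www_form(parsed.query).to_h["return_to"]

puts destination
puts round_tripped == destination
```

The point of the sketch is the ordering: the String is made into a valid URI before it is stored, so nothing downstream has to guess whether it still needs escaping.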
By analogy, if we replace "URI" with "JSON" in this discussion the same holds true: "{\"foo\":\"\n\"}" holds a String that looks a lot like JSON, but isn't valid [RFC 7159], and JSON.parse correctly raises an exception on it.
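The JSON side of the analogy can be checked with a short sketch. The helper name parses_as_json? is hypothetical; the second string is added here only for contrast and uses the escaped form of the newline that the JSON grammar permits.

```ruby
require "json"

# Hypothetical helper (not part of the json library): reports whether
# JSON.parse accepts the given String without raising.
def parses_as_json?(string)
  JSON.parse(string)
  true
rescue JSON::ParserError
  false
end

raw_newline     = "{\"foo\":\"\n\"}"   # literal newline inside a JSON string
escaped_newline = "{\"foo\":\"\\n\"}"  # backslash followed by n, a valid escape

puts parses_as_json?(raw_newline)      # expected: false
puts parses_as_json?(escaped_newline)  # expected: true
```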
If a network peer is sending you a message that includes bytes that look like a URI but with UTF-8-encoded Unicode characters and not ASCII-compatible percent-encoded octets (i.e. it's sending an IRI), then one of two things is happening:
- the protocol you're using is built on IRIs, not URIs, and you are responsible for any transformations to/from URIs (including URI.encode); or
- the peer is in violation of a spec, and you should throw an error back at it. (In this case the specs are usually quite clear on exactly what error to throw, too.)
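The two cases above can be sketched as two small helpers. Both names are hypothetical, and both are simplifications: iri_to_uri_string assumes a UTF-8 String and only percent-encodes non-ASCII characters (it does not convert internationalized host names, which need separate handling), while parse_strict_uri raises a local ArgumentError as a stand-in for whatever error the protocol in question actually prescribes.

```ruby
require "uri"

# Case 1 (IRI-based protocol): hypothetical helper that percent-encodes the
# UTF-8 bytes of every non-ASCII character and leaves ASCII untouched.
def iri_to_uri_string(iri)
  iri.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

# Case 2 (URI-only protocol): hypothetical helper that refuses anything
# containing non-ASCII characters instead of silently repairing it.
def parse_strict_uri(string)
  unless string.ascii_only?
    raise ArgumentError, "not a URI: contains non-ASCII characters"
  end
  URI.parse(string)
end

iri = "http://example.org/\u{2713}"

converted = iri_to_uri_string(iri)
puts converted
puts URI.parse(converted).path

begin
  parse_strict_uri(iri)
rescue ArgumentError => error
  puts "rejected: #{error.message}"
end
```

Which helper applies depends entirely on what the protocol says the peer is allowed to send; the code cannot decide that on its own.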
Do you not agree that URI.parse should accept unicode entities in URIs? It wasn't clear from your response.
I think URI.parse correctly raises an exception when it encounters characters that are forbidden by RFC 3986.
I'm not aware of any IRI-compatible API in MRI that could allow me to directly parse URIs containing non-ASCII characters with Ruby, whether they match the strict definition of a URI or not.
Your thinking here seems confused. If a String contains non-ASCII characters then it's not a URI. If it is a URI then it strictly matches the definition of a URI. If a String contains a valid IRI, then yeah, you're not going to get much help from Ruby; but IRIs are not commonly used in the real world anyway.