Bug #19196: The string saved to Tempfile from URI.open escapes "&" character - Ruby - Ruby Issue Tracking System

Bug #19196

closed

The string saved to Tempfile from URI.open escapes "&" character

Added by westoque (William Estoque) over 2 years ago. Updated over 2 years ago.

Status: Rejected
Assignee:
Target version:
ruby -v:
Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

[ruby-core:111263]

Description

When I read the string returned by URI.open, it does not match the response body I see elsewhere.

How to reproduce:

require "open-uri"

url = "https://www.podcastone.com/podcast?categoryID2=1237"

handle = URI.open(url)
# => #<Tempfile:/path/to/tempfile>

puts handle.read
# .... https://dts.podtrac.com/redirect.mp3/pdst.fm/e/chrt.fm/track/E2G895/aw.noxsolutions.com/launchpod/adswizz/1237/762-FeedbackFriday-249-V2_mzwq_b1dc1677.mp3?awCollectionId=1237&#38;awEpisodeId=ee01b21a-878d-4be4-974c-e504b1dc1677&#38;adwNewID3=true&#38;awNetwork=309...

In the browser, the same string reads:

https://dts.podtrac.com/redirect.mp3/pdst.fm/e/chrt.fm/track/E2G895/aw.noxsolutions.com/launchpod/adswizz/1237/762-FeedbackFriday-249-V2_mzwq_b1dc1677.mp3?awCollectionId=1237&awEpisodeId=ee01b21a-878d-4be4-974c-e504b1dc1677&adwNewID3=true&awNetwork=309

Notice the &#38; sequences where the browser shows a plain &.

My initial research suggests this happens because the Tempfile that gets created is ASCII-8BIT, and in ASCII-8BIT the ampersand is "38".

I propose adding a way to force the Tempfile's encoding to UTF-8 so that this character is not escaped and the string's encoding is preserved.
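The encoding hypothesis can be checked offline. The sketch below assumes the raw body arrives as an ASCII-8BIT string (simulated here with String#b on a shortened excerpt of the URL) and relabels the same bytes as UTF-8, which is roughly what forcing the Tempfile's encoding would amount to. The variable names are illustrative only.

```ruby
# Simulate a raw download: String#b returns a copy tagged ASCII-8BIT (binary).
body = "awCollectionId=1237&#38;awEpisodeId=ee01b21a".b

# Relabel the same bytes as UTF-8 (hypothetical stand-in for "forcing UTF-8").
utf8_body = body.dup.force_encoding(Encoding::UTF_8)

puts body.encoding       # ASCII-8BIT
puts utf8_body.encoding  # UTF-8

# force_encoding changes only the label; the bytes are identical,
# so the "&#38;" sequence is still present after the switch.
puts body.bytes == utf8_body.bytes  # true
puts utf8_body.include?("&#38;")    # true

# 38 is the code point of "&" in ASCII, and UTF-8 stores ASCII characters
# as the same single byte, so the number is not specific to ASCII-8BIT.
puts "&".ord  # 38
```

Since the bytes never change, an encoding switch cannot turn "&#38;" into "&"; the sequence must already be in the data being read.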

Updated by westoque (William Estoque) over 2 years ago

Subject changed from The string saved to Tempfile from URI.open escapes "&" characters to The string saved to Tempfile from URI.open escapes "&" character

Updated by westoque (William Estoque) over 2 years ago

Description updated (diff)

Updated by westoque (William Estoque) over 2 years ago

Description updated (diff)

#4 [ruby-core:111265]

Updated by ufuk (Ufuk Kayserilioglu) over 2 years ago

The content you are reading is XML, and the &#38; sequences are there because of XML escaping (&#38; is the numeric character reference for &). They are not related to any kind of file encoding, ASCII-8BIT or UTF-8.

Moreover, they are already present in the server's response, as you can see from the output of curl for the same resource:

$ curl -s "https://www.podcastone.com/podcast?categoryID2=1237" | grep "aw.noxsolutions.com/launchpod/adswizz/1237/762-"
...
<enclosure length="74614442" type="audio/mpeg" url="https://dts.podtrac.com/redirect.mp3/pdst.fm/e/chrt.fm/track/E2G895/aw.noxsolutions.com/launchpod/adswizz/1237/762-FeedbackFriday-249-V2_mzwq_b1dc1677.mp3?awCollectionId=1237&#38;awEpisodeId=ee01b21a-878d-4be4-974c-e504b1dc1677&#38;adwNewID3=true&#38;awNetwork=309"></enclosure>
...
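Because the body is XML, an XML parser expands these character references when it reads attribute values. A minimal sketch, assuming the REXML gem is available (it has been a bundled gem since Ruby 3.0), applied to the enclosure element exactly as curl printed it:

```ruby
require "rexml/document"

# The <enclosure> element from the curl output, split across literals for readability.
xml = '<enclosure length="74614442" type="audio/mpeg" ' \
      'url="https://dts.podtrac.com/redirect.mp3/pdst.fm/e/chrt.fm/track/E2G895/' \
      'aw.noxsolutions.com/launchpod/adswizz/1237/762-FeedbackFriday-249-V2_mzwq_b1dc1677.mp3' \
      '?awCollectionId=1237&#38;awEpisodeId=ee01b21a-878d-4be4-974c-e504b1dc1677' \
      '&#38;adwNewID3=true&#38;awNetwork=309"></enclosure>'

doc = REXML::Document.new(xml)
enclosure = doc.root

# Attributes#[] returns the attribute value with character references expanded,
# so each "&#38;" comes back as a plain "&".
url = enclosure.attributes["url"]
puts url
```

Reading the feed with an XML parser, rather than treating the body as a flat string, yields the same URL the browser displays.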

So, this is not a Ruby problem at all. On the contrary, Ruby can help you unescape these characters:

require "cgi"
CGI.unescapeHTML("foo&#38;bar") # => "foo&bar"

Updated by Eregon (Benoit Daloze) over 2 years ago

Status changed from Open to Rejected

#6 [ruby-core:111276]

Updated by westoque (William Estoque) over 2 years ago

@ufuk (Ufuk Kayserilioglu) thank you for the explanation. I may have jumped to conclusions when comparing the response in the browser (Chrome), which unescaped the characters, with the output of curl.
