Bug #9304

Invalid XML entity references

Added by ploeh (Mark Seemann) over 5 years ago. Updated over 5 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
[ruby-core:59327]

Description

=begin
It seems that the escapeHTML method occasionally generates the invalid XML entity reference (({&tt;})) when XML escaping.

See this Stack Overflow question for more details: http://stackoverflow.com/q/20563078/126014

Although I see this issue manifest itself in Jekyll, the Jekyll maintainers closed the issue I reported there because the Jekyll filter delegates to CGI.escapeHTML.

Could it be a bug in escapeHTML?

I'm seeing this on my own system, which is currently reporting:

$ ruby --version

ruby 2.0.0p247 (2013-06-27) [i386-mingw32]

but I'm also seeing this on GitHub pages, and I don't know the exact version of Ruby running in that environment.
=end

History

Updated by mame (Yusuke Endoh) over 5 years ago

Interesting, but I cannot reproduce:

$ ruby -v -rcgi -e 'puts CGI.escapeHTML("</p>\n<p>")'
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-linux]
&lt;/p&gt;
&lt;p&gt;

Could you give us a reproducible project?
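
The same kind of check can be run over more inputs. The following is a minimal sketch, not part of the original report: the sample strings and the helper name only_valid_entities? are made up for illustration, and the list of allowed entities assumes the five replacements that CGI.escapeHTML performs in Ruby 2.0 and later.

```ruby
require "cgi"

# Entity references that CGI.escapeHTML is assumed to emit (Ruby 2.0+).
ALLOWED_ENTITIES = /&(?:amp|lt|gt|quot|#39);/

# Hypothetical sample inputs, including one that already contains "&tt;".
SAMPLES = [
  "</p>\n<p>",
  "a < b && c > d",
  %q(say "hi" & 'bye'),
  "&tt;"
].freeze

# True when every "&" in the escaped text starts one of the allowed entities:
# strip the allowed entities and make sure no ampersand is left over.
def only_valid_entities?(escaped)
  !escaped.gsub(ALLOWED_ENTITIES, "").include?("&")
end

SAMPLES.each do |sample|
  escaped = CGI.escapeHTML(sample)
  puts "#{escaped.inspect} valid=#{only_valid_entities?(escaped)}"
end
```

If the escaping itself were at fault, at least one of these lines would be expected to report valid=false.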

--
Yusuke Endoh mame@tsg.ne.jp

Updated by mame (Yusuke Endoh) over 5 years ago

Never mind, I found the repo of your blog: https://github.com/ploeh/ploeh.github.com

I could reproduce the issue successfully ... only when I open rss.xml with Chrome!
Firefox, Konqueror, wget, and curl all download it without "&tt;".

So I now guess this is a bug in Chrome's XML viewer.
Are you also using Chrome? Can you reproduce the issue without Chrome?

--
Yusuke Endoh mame@tsg.ne.jp

Updated by duerst (Martin Dürst) over 5 years ago

mame (Yusuke Endoh) wrote:

Never mind, I found the repo of your blog: https://github.com/ploeh/ploeh.github.com

I could reproduce the issue successfully ... only when I open rss.xml with Chrome!

Same here. Actually, just opening the blog in Chrome shows a "structured" XML view, where XML tags appear in purple and are foldable, the escaped HTML markup appears correctly unescaped, and not a single "&tt;" can be found. It's only when I look at the file with "view source" that I see "&tt;" in a few places (11 in my case).

Firefox, Konqueror, wget, and curl all download it without "&tt;".

I tried with Firefox, Opera12, and Safari. No "&tt;" anywhere. I saved the file from Firefox, Chrome, and Opera12. The files from Firefox and Chrome are exactly the same bit-by-bit. The Opera12 file is the same except for line-ending conventions. Safari doesn't let me save the raw XML, only an HTML file that from the look of it was produced by a Safari-internal transform used to show RSS files.
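
A comparison like the one described here (identical files, apart from line-ending conventions) can be sketched in Ruby. This is an illustrative sketch only: the helper names and the sample strings are assumptions, and the file names in the closing comment are hypothetical.

```ruby
# Convert CRLF and lone CR line endings to LF so that only
# differences in actual content remain.
def normalize_newlines(data)
  data.gsub("\r\n", "\n").tr("\r", "\n")
end

# True when two downloads differ at most in line-ending convention.
def same_content?(a, b)
  normalize_newlines(a) == normalize_newlines(b)
end

unix_copy = "<item>&lt;p&gt;</item>\n<item>&lt;/p&gt;</item>\n"
dos_copy  = unix_copy.gsub("\n", "\r\n")
corrupted = unix_copy.sub("&lt;", "&tt;")

puts unix_copy == dos_copy               # byte-for-byte comparison
puts same_content?(unix_copy, dos_copy)  # comparison ignoring line endings
puts same_content?(unix_copy, corrupted) # a real content difference

# With real downloads, read the raw bytes (file names are hypothetical):
#   same_content?(File.binread("rss-firefox.xml"), File.binread("rss-opera.xml"))
```

Reading with File.binread avoids any newline or encoding conversion, so the comparison sees exactly what each browser saved.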

So I now guess this is a bug in Chrome's XML viewer.

Actually, not the XML viewer, but the source viewer, in my case.

Are you also using Chrome? Can you reproduce the issue without Chrome?

The Stack Overflow entry says so: "Furthermore, when I run

jekyll serve -w

on my local machine, I still see the same type of error, but not in the same places."

Regards, Martin.

P.S.: Tongue in cheek: 1) I always disliked the way some blog formats escape HTML to put it into XML. There would be fewer escaping problems if it were unescaped XHTML fragments in XML. That's what XML was designed for. 2) The blog is impressively big (Firefox took what seemed like a full minute to do "View Source"). At least some of the blog formats have a facility to split up entries into several pages and connect them. Maybe it's time to look at a way to do that here. A less overworked browser might show fewer errors :-).

Updated by mame (Yusuke Endoh) over 5 years ago

  • Status changed from Open to Rejected

duerst (Martin Dürst) wrote:

So I now guess this is a bug in Chrome's XML viewer.

Actually, not the XML viewer, but the source viewer, in my case.

Ah, you are right.

Easy way to reproduce:

$ ruby -e '
puts %(<?xml version="1.0"?>)
puts %()
10000.times { puts %(<x></x>) }
puts %()
' > t.xml

$ google-chrome t.xml

and view page source. "&tt;" appears twice on my machine.
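
To confirm that such a generated file is clean on disk, and that the stray entity only shows up in the browser's source view, the bytes can be scanned directly. The sketch below is an assumption-laden variant of the reproduction, not the original script: the root element name, the escaped payload inside each x element, and the helper name build_xml are all chosen for illustration.

```ruby
require "tempfile"

# Build a large XML document whose elements contain escaped markup.
# The <root> wrapper and the "&lt;x&gt;" payload are assumptions.
def build_xml(count)
  lines = [%(<?xml version="1.0"?>), "<root>"]
  count.times { lines << "<x>&lt;x&gt;</x>" }
  lines << "</root>"
  lines.join("\n") + "\n"
end

Tempfile.create(["t", ".xml"]) do |file|
  file.write(build_xml(10_000))
  file.flush

  # Read the raw bytes back and count the entity references.
  on_disk = File.binread(file.path)
  puts "&lt; count: #{on_disk.scan("&lt;").length}"
  puts "&tt; count: #{on_disk.scan("&tt;").length}"
end
```

A zero count for "&tt;" here, combined with a non-zero count in the browser's source view of the same file, would point at the viewer rather than at the generator.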

I'm closing this ticket because this is apparently a bug in Chrome, not in Ruby or Jekyll.
Please reopen if you can reproduce the issue in a browser other than Chrome.

--
Yusuke Endoh mame@tsg.ne.jp
