RSS escaped HTML
My understanding of RSS's "escaped HTML" is that something like this:
HTML:
1 < 3
becomes (RSS):
1 &lt; 3
So, then, should this:
<img src="http://somehost/someimage?a=foo&b=bar" />
Become:
<img src="http://somehost/someimage?a=foo&amp;b=bar" />
(Note the &amp;.)
If yes, is this then invalid RSS?
<description>
...
<img src="http://d.yimg.com/a/p/ap/20110309/capt.f6...02-0.jpg?x=91&y=130&q=85&sig=6oI7fIgN0izc9olfgY56vw--" />
</description>
(Additionally, is the fact that the closing > isn't escaped bad?)
The problem I'm having with the above <description> is that once you decode the first layer of entities (XML) to arrive at the contents of the <description> tag, you get one long run of character data, which should be HTML. But the <img> in it has just a bare &, which is an invalid entity. For the massive chunk above, I get something like <img src="....?x=1&y=2" />, which isn't valid HTML.
Am I just looking at crappy HTML that got shoved into RSS, or am I missing something here?
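To make the two layers concrete, here is a minimal Python sketch using only the standard library. The somehost URL is the placeholder from the question, and the variable names are made up for illustration. It assumes the strict reading where a bare & in an HTML attribute is an error (HTML 4 / XHTML validators flag it, though browsers generally tolerate it).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# HTML layer: inside an attribute value, "&" is written as "&amp;".
html_fragment = '<img src="http://somehost/someimage?a=foo&amp;b=bar" />'

# XML layer: escape the whole fragment once more to embed it in <description>.
rss_text = escape(html_fragment)
item = "<description>" + rss_text + "</description>"

# An XML parser undoes only the outer layer and returns the HTML as written.
decoded = ET.fromstring(item).text

# A feed escaped only once decodes to HTML with a bare "&" in the URL.
single = ('<description>&lt;img src="http://somehost/someimage?a=foo&amp;b=bar"'
          ' /&gt;</description>')
single_decoded = ET.fromstring(single).text
```

So a fully escaped feed would carry &amp;amp; in the URL, and a feed that carries only &amp; (or a bare &) leaves the reader with HTML that has a bare & after the XML layer is removed.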
You need to use CDATA sections:
<description><![CDATA[ <img src="http://somehost/someimage?a=foo&b=bar" /> ]]>
</description>
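A minimal Python sketch of the CDATA approach, assuming the feed is assembled by hand as a string. The helper name cdata_description is hypothetical, not part of any library. Note that CDATA only removes the XML layer of escaping: the HTML inside is passed through untouched, so it should still use &amp; in URLs if you want it to validate.

```python
import xml.etree.ElementTree as ET

def cdata_description(html):
    # "]]>" would end the CDATA section early, so split it across two sections.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return "<description><![CDATA[" + safe + "]]></description>"

# The HTML is written exactly as it would appear in a page.
html_fragment = '<img src="http://somehost/someimage?a=foo&amp;b=bar" />'
item = cdata_description(html_fragment)

# The parser returns the section's contents verbatim, with no entity decoding.
decoded = ET.fromstring(item).text
```

Here decoded is the same string as html_fragment, so the reader gets the HTML exactly as the publisher wrote it.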