Nokogiri Truncating XML Input

I am having an issue where Nokogiri truncates XML on a colleague's machine while parsing a Media RSS feed. The feed is a standard Media RSS feed, and the XML is not malformed. Nokogiri appears to simply stop at a certain point in the XML and close any tags that were still open at that point in the document. (Unfortunately I do not have the XML available right now, but I will update this question with the actual XML when I have it.)

My confusion comes from it working fine on my machine (OSX 10.6, Nokogiri 1.4.4) while it parses incorrectly on his machine, which uses the same setup but is a few years older. I imagine that there is a difference somewhere, but unfortunately I don't know what to look for.

Any thoughts or direction would be greatly appreciated.

XML Sample

This is the item where Nokogiri truncates the XML feed.

Note: I replaced some values in the feed so that they are non-identifying.

<item>
<title>Naruto Season 7 - Episode 167 - When Egrets Flap Their Wings</title>
<link>http://www.test.com/redirect?url=%2Fnaruto-original%2Fepisode-167-when-egrets-flap-their-wings-526666&aff=0000000</link>
<guid isPermalink="true">http://www.test.com/media-526666</guid>
<description><img src="http://img1.lln.test.com/i/spire3-tmb/9730631d41af0f46cb556642ca1f32231240438469_thumb.jpg"  /><br />At Moso's mansion, a battle takes place between the Wandering Ninja and a Leaf Ninja. With Chishima's help, Naruto is freed from Moso's genjutsu. Moso then reveals his true form as the leader of the Wandering ninja, Hoki!</description>
<enclosure url="http://img1.lln.test.com/i/spire3-tmb/9730631d41af0f46cb556642ca1f32231240438469_thumb.jpg" type="image/jpeg" length="6592"/>
<category>Anime</category>
<media:category scheme="http://gdata.youtube.com/schemas/2007/categories.cat" label="Anime">Movies_Anime_animation</media:category>
<pubDate>Wed, 22 Apr 2009 21:39:34 GMT</pubDate>
<test:freePubDate>Tue, 19 Jan 2038 00:27:28 GMT</test:freePubDate>
<test:premiumPubDate>Wed, 22 Apr 2009 21:39:34 GMT</test:premiumPubDate>
<test:episodeNumber>167</test:episodeNumber>
<test:duration>1414</test:duration>
<test:publisher>TV TOKYO</test:publisher>
<media:content url="https://www.test.com/syndication/video?id=1444659&affiliate_code=0000000" type="video/mp4" medium="video" duration="1414"/>
<media:restriction relationship="allow" type="country">us ca as um pr gu vi</media:restriction>
<media:credit role="distribution company">Test Inc.</media:credit>
<media:rating scheme="urn:simple">nonadult</media:rating>
<media:thumbnail url="http://img1.lln.test.com/i/spire3-tmb/9730631d41af0f46cb556642ca1f32231240438469_full.jpg"/>
<media:keywords>action, adventure, comedy, supernatural, martial, arts, ninja, shounen, super, power, drama, fantasy</media:keywords>
</item>


My guess, based on the machine difference: Nokogiri relies on libxml2 for most of its work and speed. I suspect the failing machine has an older, buggy version of libxml2 that Nokogiri was built against. Try removing Nokogiri, upgrading libxml2, and then re-installing Nokogiri (so that it builds against the newer libxml2).

See either What to do if libxml2 is being a jerk or (if, like me, you prefer to build from source instead of using fink or macports) Use libxml from source.
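
Before reinstalling anything, it may help to confirm which libxml2 each machine is actually using. The following is a minimal sketch, assuming the C (MRI) build of Nokogiri, where `Nokogiri::VERSION_INFO` carries a `"libxml"` entry with `"compiled"` and `"loaded"` versions. The helper names `libxml_mismatch?` and `report_libxml_versions` are made up for illustration.

```ruby
# Sketch: report the libxml2 version Nokogiri was compiled against and the
# one it loads at runtime. Run it on both machines and compare the output.

# Returns true when the compiled and loaded libxml2 versions differ.
# `info` is expected to look like Nokogiri::VERSION_INFO["libxml"].
def libxml_mismatch?(info)
  info["compiled"] != info["loaded"]
end

# Prints the Nokogiri and libxml2 versions for the current machine.
# Assumes the C extension build; other builds may lack the "libxml" key.
def report_libxml_versions
  require "rubygems" # only needed on Ruby 1.8
  require "nokogiri"

  libxml = Nokogiri::VERSION_INFO["libxml"]
  puts "Nokogiri:         #{Nokogiri::VERSION}"
  puts "libxml2 compiled: #{libxml['compiled']}"
  puts "libxml2 loaded:   #{libxml['loaded']}"
  warn "compiled and loaded libxml2 versions differ" if libxml_mismatch?(libxml)
end
```

Calling `report_libxml_versions` from irb on each machine should show whether the older machine is running a different libxml2. If the two machines report different versions, that would support the guess above, though it does not prove libxml2 is the cause.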
