Parsing a blogspot XML file with Nokogiri

I have a Blogspot-exported XML file, and it looks something like this:

<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>

How do I parse it with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.read("blogspot.xml"))

doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end

But it's not printing anything. :/

Any suggestions?


Your code works for me, though certain versions of Nokogiri had problems.

I get:

 Content
 Content

I'm using nokogiri (1.4.1 x86-mswin32)


Turns out I had to delete the attributes of the feed element, which in the real export looks like this:

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>


I just stumbled on this question. The issue appears to be XML namespaces:

"turns out that i had to delete the attributes for feed"

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

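XML namespaces complicate accessing nodes because they provide a way to separate tags that share the same name. Here the xmlns='http://www.w3.org/2005/Atom' declaration puts feed, entry, title and content into the Atom namespace, but an un-prefixed name in an XPath expression such as //content only matches elements that are in no namespace, so the query returns nothing. Deleting the attributes took the elements out of that namespace, which is why the code then worked. Read the "Namespaces" section of Nokogiri's "Searching an HTML / XML Document" tutorial.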

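Nokogiri also has the remove_namespaces! method, which is sometimes a useful way of dealing with the problem but has downsides too: it strips every namespace from the document, so elements from different vocabularies that share a local name can no longer be told apart.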
