Parsing a blogspot XML file with Nokogiri
I have an exported Blogspot XML file that looks something like this:
<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>
How do I parse it with Nokogiri and XPath?
Here is what I have:
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML(File.open("blogspot.xml"))
doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end
But it doesn't output anything.
Any suggestions?
Your code works for me. There were some problems with certain versions of Nokogiri.
I get:
Content
Content
I'm using Nokogiri (1.4.1 x86-mswin32).
It turns out that I had to delete the attributes from the feed element:
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>
I just stumbled on this question. The issue appears to be XML namespaces:
"turns out that i had to delete the attributes for feed"
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>
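XML namespaces complicate accessing nodes because they provide a way to separate similarly named tags. An XPath such as //content names no namespace, so it does not match a content element that lives in the Atom default namespace declared on feed. Read the "Namespaces" section of "Searching an HTML / XML Document" in the Nokogiri tutorials.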
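Nokogiri also has the remove_namespaces! method, which is a sometimes-useful way of dealing with the problem but has some downsides too: once the namespaces are stripped, elements from different namespaces that share a name can no longer be told apart.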