Parsing a blogspot XML file with Nokogiri

Posted 2023-01-08 04:40

I have an XML file exported from Blogspot, and it looks something like this:

<feed>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
<entry>
<title> title </title>
<content type="html"> Content </content>
</entry>
</feed>

How do I parse it with Nokogiri and XPath?

Here is what I have:

#!/usr/bin/env ruby

require 'rubygems'
require 'nokogiri'


doc = Nokogiri::XML(File.open("blogspot.xml"))

doc.xpath('//content[@type="html"]').each do |node|
  puts node.text
end

But it's not printing anything. :/

Any suggestions?

Your code works for me. There were some problems with certain versions of Nokogiri.

I get:

 Content
 Content

I'm using nokogiri (1.4.1 x86-mswin32)

Turns out that I had to delete the attributes on the feed element:

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

I just stumbled on this question. The issue appears to be XML namespaces:

"turns out that i had to delete the attributes for feed"

<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'>

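XML namespaces complicate accessing nodes because they exist to keep identically named tags from different vocabularies apart. The feed element above declares a default namespace (xmlns='http://www.w3.org/2005/Atom'), so every unprefixed element inside it, including content, belongs to that namespace. An XPath name such as //content only matches elements that are in no namespace, which is why the query returns nothing until the xmlns attributes are deleted. Read the "Namespaces" section of the Nokogiri tutorial Searching an HTML / XML Document.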

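Nokogiri also has the remove_namespaces! method, which is sometimes a useful way of dealing with the problem, but it has downsides too: once the namespaces are stripped, elements that share a name but came from different namespaces can no longer be told apart.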
