Problem getting XML node attribute value with nokogiri
I'm trying to parse an XML file from iTunes using Nokogiri and Rails 3.
Here is my code:
itunes_top_300 = Nokogiri.HTML(open("http://itunes.apple.com/us/rss/toppodcasts/limit=300/xml"))
itunes_top_300.search('//entry').each do |podcast|
url = podcast.xpath("//*[@href]").text
return podcast.url
end
When I load up the view that calls this method, I get:
undefined method `url'
Here is the XML I'm trying to parse:
http://itunes.apple.com/us/rss/toppodcasts/limit=300/xml
Thanks in advance,
Harris
Although you have stated that your code is working again, let me point out some flaws in your code:
You are asking Nokogiri to parse the XML RSS feed as HTML. You should instead use Nokogiri::XML( ... ); not a big deal, and not the cause of this problem.
You are using a return inside your each. In the code you have shown, that would normally cause a LocalJumpError: unexpected return. Clearly you are using this code inside a method (that you have not shown us). Using return inside a block does not exit the block, but rather causes the enclosing method to return. As for what you probably want instead, read on.
You are creating a local url variable, but you are not using it.
I am guessing that what you were trying to do is find just the url from each entry. However, by using the XPath //*[@href] what you were really doing is finding every element in the document that has an href="..." attribute. You are re-finding this full set of elements for each entry in the document. (Except, due to the return statement, you were exiting early.) And then, by asking for the text of the elements, you would have been getting nothing.
As for the actual error you were getting, you were attempting to access podcast.url, but Nokogiri elements do not have a url method.
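To illustrate the point about return inside a block, here is a minimal sketch in plain Ruby. The method names first_even and doubled are hypothetical and exist only for this illustration; they are not part of your code or of Nokogiri.

```ruby
# Hypothetical helpers, for illustration only.

def first_even(numbers)
  numbers.each do |n|
    # `return` inside the block exits first_even itself,
    # so the loop stops at the first even number.
    return n if n.even?
  end
  nil # reached only when no even number was found
end

def doubled(numbers)
  numbers.map do |n|
    # `next` ends only the current iteration and supplies
    # the block's value for this element.
    next 0 if n.negative?
    n * 2
  end
end
```

If the goal is to collect one value per entry, the second shape (letting the block produce a value for each element) is the one to reach for; return is only appropriate when you really do want to leave the whole method early.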
Given the schema of the feeds from the URL you have supplied, here are different ways to get an array of the href="..." attribute of every entry/link in the document, in increasing order of simplicity and preference:
Near-Direct Translation
urls = []
itunes_top_300.search('//entry').each do |podcast|
  # Find the first element below the current one that has an href attribute
  # and then get the value of that attribute
  url = podcast.at_xpath(".//*[@href]")['href']
  # Add this url to the array
  urls << url
end
# As the last statement in your method, return urls (without word 'return')
urls
Getting rid of the local variable
urls = []
itunes_top_300.search('//entry').each do |podcast|
  # It's pretty clear what we're doing, so no need to name the value
  # before we add it to the array
  urls << podcast.at_xpath(".//*[@href]")['href']
end
urls
Cleaning it up with Map
# Run through the array and convert each element to the return value
# of the block
itunes_top_300.search('//entry').map do |podcast|
  podcast.at_xpath(".//*[@href]")['href']
end
# If the above is the last statement of the method, the method will return the
# result of the map as the return value of the method
Asking for just the attribute directly
itunes_top_300.search('//entry').map do |podcast|
  # Instead of getting the element, get the attribute itself
  # Use `to_s` or `value` to get the text of the attribute node.
  podcast.at_xpath(".//*[@href]/@href").to_s
end
Using only XPath to get what we wanted in the first place
# Take an array of attribute nodes and get their values
itunes_top_300.xpath('//entry/link/@href').map{ |attr| attr.to_s }
Using Ruby 1.9 syntax to shorten the map call
# Map the result of the XPath by calling `to_s` on each
itunes_top_300.xpath('//entry/link/@href').map( &:to_s )