开发者

XML to hash table in Ruby: Parsing list of historical inventions

I'd like to slurp the following data about historical inventions into a convenient Ruby data structure:

http://yootles.com/outbox/inventions.xml

Note that all the data is in the XML attributes.

It seems like there should be a quick solution with a couple lines of code. With Rails there'd be Hash.from_xml though I'm not sure that would handle the attributes properly. In any case, I need this as a standalone Ruby script. Nokogiri seems overly complicated for this simple task based on this code that someone pos开发者_开发问答ted for a similar problem: http://gist.github.com/335286. I found a purportedly simple solution using hpricot but it doesn't seem to handle the XML attributes. Maybe that's a simple extension? Finally there's ROXML but that looks even more heavyweight than nokogiri.

To make the question concrete (and with obvious ulterior motives), let's say that an answer should be a complete Ruby script that slurps the XML from the above URL and spits out CSV like this:

id, invention, year, inventor, country
RslCn, "aerosol can", 1926, "Erik Rotheim", "Norway"
RCndtnng, "air conditioning", 1902, "Willis Haviland Carrier", "US"
RbgTmtv, "airbag, automotive", 1952, "John Hetrick", "US"
RplnNgnpwrd, "airplane, engine-powered", 1903, "Wilbur and Orville Wright", "US"

I'll work on my own answer and post it too unless someone beats me to the punch with something clearly superior. Thanks!


Using REXML and open-uri:

require "rexml/document"
require "open-uri"

doc = REXML::Document.new open( "http://yootles.com/outbox/inventions.xml" ).read

puts [ 'id', 'invention', 'year', 'inventor', 'country' ].join ','
doc.root.elements.each do |invention|
  inventor = invention.elements.first
  data = []
  data << invention.attributes['id']
  data << '"' + invention.attributes['name'] + '"'
  data << invention.attributes['year']
  data << '"' + inventor.attributes['name'] + '"'
  data << '"' + inventor.attributes['country'] + '"'
  puts data.join ','
end


It turned out to be simpler than I thought with Nokogiri:

require 'rubygems'
require 'nokogiri' # needs sudo port install libxslt and stuff; see nokogiri.org
require 'open-uri'

@url = 'http://yootles.com/outbox/inventions.xml'

doc = Nokogiri::XML(open(@url))
puts("id, invention, year, inventor, country")
doc.xpath("//invention").each{ |i| 
  inventor = i.xpath("inventor").first
  print i['id'], ", \"", i['name'], "\", ", i['year'], ", \"", 
  inventor['name'], "\", \"", inventor['country'], "\"\n"
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜