XML to hash table in Ruby: Parsing list of historical inventions
I'd like to slurp the following data about historical inventions into a convenient Ruby data structure:
http://yootles.com/outbox/inventions.xml
Note that all the data is in the XML attributes.
It seems like there should be a quick solution with a couple lines of code. With Rails there'd be Hash.from_xml though I'm not sure that would handle the attributes properly. In any case, I need this as a standalone Ruby script. Nokogiri seems overly complicated for this simple task based on this code that someone posted for a similar problem: http://gist.github.com/335286. I found a purportedly simple solution using Hpricot but it doesn't seem to handle the XML attributes. Maybe that's a simple extension? Finally there's ROXML but that looks even more heavyweight than Nokogiri.
To make the question concrete (and with obvious ulterior motives), let's say that an answer should be a complete Ruby script that slurps the XML from the above URL and spits out CSV like this:
id, invention, year, inventor, country
RslCn, "aerosol can", 1926, "Erik Rotheim", "Norway"
RCndtnng, "air conditioning", 1902, "Willis Haviland Carrier", "US"
RbgTmtv, "airbag, automotive", 1952, "John Hetrick", "US"
RplnNgnpwrd, "airplane, engine-powered", 1903, "Wilbur and Orville Wright", "US"
I'll work on my own answer and post it too unless someone beats me to the punch with something clearly superior. Thanks!
Using REXML and open-uri:
require "rexml/document"
require "open-uri"
# On Ruby 3.0+, Kernel#open no longer opens URLs; use URI.open from open-uri.
doc = REXML::Document.new(URI.open("http://yootles.com/outbox/inventions.xml").read)
# Header row, separated by ", " to match the format requested in the question.
puts ['id', 'invention', 'year', 'inventor', 'country'].join(', ')
doc.root.elements.each do |invention|
  # The first child element of each invention is assumed to be its inventor.
  inventor = invention.elements.first
  data = []
  data << invention.attributes['id']
  data << '"' + invention.attributes['name'] + '"'
  data << invention.attributes['year']
  data << '"' + inventor.attributes['name'] + '"'
  data << '"' + inventor.attributes['country'] + '"'
  puts data.join(', ')
end
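Since the question title asks for a hash table rather than CSV, here is a rough sketch of collecting the same attributes into a Ruby hash keyed by invention id. It parses a small inline sample instead of fetching the URL. The root element name `inventions`, the constant `SAMPLE_XML`, and the method name `inventions_to_hash` are my own assumptions for illustration; the element and attribute names are the ones the scripts in this thread rely on.

```ruby
require "rexml/document"

# Hypothetical sample mirroring the layout assumed in this thread:
# <invention> elements with id/name/year attributes, each containing
# one <inventor> element with name/country attributes.
SAMPLE_XML = <<~XML
  <inventions>
    <invention id="RslCn" name="aerosol can" year="1926">
      <inventor name="Erik Rotheim" country="Norway"/>
    </invention>
    <invention id="RCndtnng" name="air conditioning" year="1902">
      <inventor name="Willis Haviland Carrier" country="US"/>
    </invention>
  </inventions>
XML

# Returns a hash keyed by invention id; each value is a hash of attributes.
def inventions_to_hash(xml)
  doc = REXML::Document.new(xml)
  result = {}
  doc.root.elements.each("invention") do |invention|
    inventor = invention.elements["inventor"]
    result[invention.attributes["id"]] = {
      name:     invention.attributes["name"],
      year:     invention.attributes["year"].to_i,
      inventor: inventor.attributes["name"],
      country:  inventor.attributes["country"]
    }
  end
  result
end

inventions = inventions_to_hash(SAMPLE_XML)
puts inventions["RslCn"][:inventor]
```

To run it against the real file, the string passed to `inventions_to_hash` could be the body read from the URL, provided the live XML really has this shape.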
It turned out to be simpler than I thought with Nokogiri:
require 'rubygems'
require 'nokogiri' # needs sudo port install libxslt and stuff; see nokogiri.org
require 'open-uri'
url = 'http://yootles.com/outbox/inventions.xml'
# On Ruby 3.0+, Kernel#open no longer opens URLs; use URI.open from open-uri.
doc = Nokogiri::XML(URI.open(url))
puts "id, invention, year, inventor, country"
doc.xpath("//invention").each do |i|
  # Each invention is assumed to have at least one inventor child element.
  inventor = i.xpath("inventor").first
  print i['id'], ", \"", i['name'], "\", ", i['year'], ", \"",
        inventor['name'], "\", \"", inventor['country'], "\"\n"
end
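One caveat with wrapping values in quotes by hand, as both scripts do: a name that itself contains a double quote would produce a broken line. A minimal sketch of a workaround, assuming the usual CSV convention of doubling embedded quotes, is below. The helper names `quote_field` and `csv_line` are hypothetical and not part of either library.

```ruby
# Hypothetical helper: wraps a value in double quotes and doubles any
# embedded double quotes, following the common CSV escaping convention.
def quote_field(value)
  '"' + value.to_s.gsub('"', '""') + '"'
end

# Builds one output line in the layout requested in the question:
# unquoted id and year, quoted text fields, separated by ", ".
def csv_line(id, name, year, inventor, country)
  [id, quote_field(name), year, quote_field(inventor), quote_field(country)].join(", ")
end

puts csv_line("RbgTmtv", "airbag, automotive", 1952, "John Hetrick", "US")
```

Either script could call `csv_line` with the attribute values in place of concatenating the quotes itself.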