Using Nokogiri with XML files in Ruby
I have this XML:
<ExperimentCollection>
  <Experiment>
    <mzData version="1.05" accessionNumber="1635">
      <description>
        <admin>
          <sampleName>Fas-induced and control Jurkat T-lymphocytes</sampleName>
          <sampleDescription>
            <cvParam cvLabel="MeSH" accession="D017209" name="apoptosis" />
            <cvParam cvLabel="UNITY" accession="D2135" name="Jurkat cells" />
            <cvParam cvLabel="MeSH" accession="D019014" name="Antigens, CD95" />
          </sampleDescription>
        </admin>
      </description>
    </mzData>
  </Experiment>
</ExperimentCollection>
I also have the following code:
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::XML(File.open("my.xml"))
sampleName = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text
sampleDescription = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" ).text
puts sampleName + " " + sampleDescription
foo = sampleName + " " + sampleDescription
f = File.new("my.txt","w")
f.write(foo)
f.close()
The code grabs the sampleName just fine, but not the accession values. I only want to grab the accession values of the cvParam elements whose cvLabel is MeSH (D017209 and D019014). What do I have to change in the doc.xpath command to make this work?
doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" )
returns nothing because there is no MeSH element; MeSH is the value of the cvLabel attribute. You need to replace MeSH with cvParam[@cvLabel=\"MeSH\"] (read: a cvParam element that has a cvLabel attribute with the value MeSH).
Once you fix that, xpath will return a collection of Nokogiri::XML::Attr objects. Calling text on that collection joins the string values of all the attributes into a single string (here "D017209D019014"), which is not what you want. To get each value separately, use map(&:text) (or map {|n| n.text} in Ruby 1.8.6), which returns an array containing the string value of each accession attribute (i.e. ["D017209", "D019014"] for the example XML file).
Since you seem to be confused, here's a clarification:
@Bobby: When I said "xpath will return a collection of Nokogiri::XML::Attr objects", I meant just that. You call xpath, and xpath creates and returns a collection of Attr objects. In no way did I mean that you should create any Attr objects yourself.
And when I said you should use map, I just meant you should call map on the collection returned by xpath (though instead of using map you can simply call puts with the collection as an argument).
So what you need to do is:
1. Fix your XPath as described above.
2. Call xpath with the fixed expression to get a collection.
3. Use puts to print it.
In other words:
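require 'rubygems'   # only needed on Ruby 1.8
require 'nokogiri'
# File.read returns the contents and closes the file, unlike a bare File.open
doc = Nokogiri::XML(File.read("my.xml"))
common_prefix = "/ExperimentCollection/Experiment/mzData/description/admin"
sample_name = doc.xpath( common_prefix+"/sampleName" ).text
# Select the accession attribute of every cvParam whose cvLabel is "MeSH"
accessions = doc.xpath( common_prefix+
  "/sampleDescription/cvParam[@cvLabel=\"MeSH\"]/@accession" )
puts sample_name
# puts prints each accession value in the collection on its own line
puts accessions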
Here is a simple way to do it, although it is probably too clever, since you will likely want to do other things with the data as well:
File.open("my.txt","w") do |f|
doc.xpath('//cvParam[@cvLabel="MeSH"]').each {|n| f << "#{n['name']} #{n['accession']}\n"}
end
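Because //cvParam matches cvParam elements anywhere in the document, you may need a more selective XPath if other parts of the file also contain cvParam elements labelled MeSH.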