Using Nokogiri with XML files in Ruby

I have this XML:

<ExperimentCollection>
<Experiment>
<mzData version="1.05" accessionNumber="1635">
<description>
<admin>
<sampleName>Fas-induced and control Jurkat T-lymphocytes</sampleName> 
<sampleDescription>
<cvParam cvLabel="MeSH" accession="D017209" name="apoptosis" /> 
<cvParam cvLabel="UNITY" accession="D2135" name="Jurkat cells" /> 
<cvParam cvLabel="MeSH" accession="D019014" name="Antigens, CD95" /> 
</sampleDescription>
</admin>
</description>
</mzData>
</Experiment>
</ExperimentCollection>

I also have the following code:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("my.xml"))

sampleName = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleName" ).text
sampleDescription = doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" ).text
puts sampleName + " " + sampleDescription

foo = sampleName + " " + sampleDescription 
f = File.new("my.txt","w")
f.write(foo) 
f.close()

The code grabs the sampleName just fine, but not the accession values. I only want the accession values of the cvParam elements whose cvLabel is MeSH (D017209 and D019014). What do I have to change in the doc.xpath call to make this work?


doc.xpath( "/ExperimentCollection/Experiment/mzData/description/admin/sampleDescription/MeSH/@accession" )

This returns nothing because there is no MeSH element; MeSH is the value of the cvLabel attribute. You need to replace MeSH in the path with cvParam[@cvLabel="MeSH"] (read: a cvParam element that has a cvLabel attribute with the value MeSH).

Once you have fixed that, xpath will return a collection of Nokogiri::XML::Attr objects. Calling text on that collection gives you a single string with the values of all its elements joined together (i.e. "D017209D019014" for the example XML file). Since you want the values separately, you should instead use map(&:text) (or map {|n| n.text} in Ruby 1.8.6), which returns an array containing the string value of each accession attribute (i.e. ["D017209", "D019014"] for the example XML file).

Since you seem to be confused, here's a clarification:

@Bobby: When I said "xpath will return a collection of Nokogiri::XML::Attr objects", I meant just that. You call xpath and then xpath creates and returns a collection of Attr objects. In no way did I mean that you should manually create any Attr objects yourself.

And when I said you should use map, I just meant you should call map on the collection returned by xpath (though instead of using map you can just call puts with the collection as an argument).

So what you need to do is:

  1. fix your xpath as I described
  2. call xpath with the fixed expression to get a collection
  3. use puts to print it

In other words:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::XML(File.open("my.xml"))

common_prefix = "/ExperimentCollection/Experiment/mzData/description/admin"
sample_name = doc.xpath( common_prefix+"/sampleName" ).text
accessions = doc.xpath( common_prefix+
               "/sampleDescription/cvParam[@cvLabel=\"MeSH\"]/@accession" )

puts sample_name
puts accessions


Here is a simple way to do it, although it is probably too compact if you also want to do other things with the nodes:

File.open("my.txt","w") do |f|
  doc.xpath('//cvParam[@cvLabel="MeSH"]').each {|n| f << "#{n['name']} #{n['accession']}\n"}
end

Note that //cvParam matches cvParam elements anywhere in the document, so you may need a more selective xpath statement.
