Selecting specific XML nodes in R?
I am using the XML package in R to parse an XML file that has the following structure.
<document id="Something" origId="Text">
<sentence id="Something" origId="thisorig" text="Blah Blah.">
<special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="Something" origId="thisorig" text="Blah Blah.">
</sentence>
</document>
I want to store the nodes that contain a <special> tag in one variable and the nodes without a <special> tag in another variable.
Is it possible to do this in R? Any pointers or answers would be very helpful.
I added a few more cases to test for exceptions:
<document id="Something" origId="Text">
<sentence id="Something" origId="thisorig" text="Blah Blah.">
<special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="Else" origId="thatorig" text="Blu Blu.">
<special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="Something" origId="thisorig" text="Blah Blah.">
<notso id = "hallo" />
</sentence>
<sentence id="Something no sentence" origId="thisOther" text="Blah Blah.">
</sentence>
</document>
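library(XML)
doc = xmlInternalTreeParse("sentence.xml")
# <sentence> nodes that have a <special> child
hasSpecial = xpathApply(doc, "//sentence/special/..")
# <sentence> nodes without a <special> child
noSpecial = xpathApply(doc, "/document/sentence[not(child::special)]")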
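To check the two XPath expressions without a file on disk, here is a minimal sketch that parses the test document above from a string and prints the origId of each matched sentence. The variable names (xml_text, with_special, without_special) are only illustrative, and it assumes the XML package is installed.

```r
library(XML)

# The test document from above, inlined so the example is self-contained
xml_text <- '<document id="Something" origId="Text">
<sentence id="Something" origId="thisorig" text="Blah Blah.">
<special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="Else" origId="thatorig" text="Blu Blu.">
<special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="Something" origId="thisorig" text="Blah Blah.">
<notso id = "hallo" />
</sentence>
<sentence id="Something no sentence" origId="thisOther" text="Blah Blah.">
</sentence>
</document>'

# asText = TRUE tells xmlParse that the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Sentences that are the parent of at least one <special> element
with_special <- xpathApply(doc, "//sentence/special/..")

# Sentences that have no <special> child (the <notso> case and the empty case)
without_special <- xpathApply(doc, "/document/sentence[not(child::special)]")

# Pull one attribute out of each matched node to see which sentences were selected
with_ids <- sapply(with_special, xmlGetAttr, "origId")
without_ids <- sapply(without_special, xmlGetAttr, "origId")

print(with_ids)     # "thisorig" "thatorig"
print(without_ids)  # "thisorig" "thisOther"
```

Both results are lists of internal nodes, so functions such as xmlGetAttr or xmlAttrs can be applied to each element afterwards.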
Parse the XML tree, then use XPath to specify the location of the nodes.
library(XML)
doc <- xmlTreeParse("test.xml", useInternalNodes = TRUE)
# Selects the <special> elements themselves, anywhere below <document>
special_nodes <- getNodeSet(doc, "/document//special")
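Note that this query returns the <special> elements rather than the sentences around them. As a rough sketch of how to get from there to the two groups asked for, xmlParent can step up to the enclosing sentence, and an XPath predicate can pick the sentences without a <special> child. The inline document and the ids "s1" and "s2" are made up for the example.

```r
library(XML)

# Small made-up document; the ids "s1" and "s2" are hypothetical
xml_text <- '<document id="Something" origId="Text">
<sentence id="s1" origId="thisorig" text="Blah Blah.">
<special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
</sentence>
<sentence id="s2" origId="thatorig" text="Blu Blu.">
</sentence>
</document>'

doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# The <special> elements themselves
special_nodes <- getNodeSet(doc, "/document//special")

# Step from each <special> up to its enclosing <sentence>.
# A sentence with several <special> children would appear once per child here.
parents <- lapply(special_nodes, xmlParent)
parent_ids <- sapply(parents, xmlGetAttr, "id")

# Sentences that have no <special> child at all
no_special <- getNodeSet(doc, "/document/sentence[not(special)]")
no_special_ids <- sapply(no_special, xmlGetAttr, "id")

print(parent_ids)
print(no_special_ids)
```

If duplicates matter, querying "/document/sentence[special]" directly avoids them, since an XPath node set contains each node only once.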