Parsing XML to a list in R: how to consistently access nodes when the XML structure varies?
Background
I have an XML settings file that can look like this:
<level1>
<level2>
<level3>
<level4name>bob</level4name>
</level3>
</level2>
</level1>
But there can be multiple instances of level3:
<level1>
<level2>
<level3>
<level4name>bob</level4name>
</level3>
<level3>
<level4name>jack</level4name>
</level3>
<level3>
<level4name>jill</level4name>
</level3>
</level2>
</level1>
There can also be multiple types of level4 nodes for each level3:
<level3>
<level4name>bob</level4name>
<level4dir>/home/bob/ </level4dir>
<level4logical>TRUE</level4logical>
</level3>
In R, I load this file using:
library(XML)
settings.xml <- xmlTreeParse(settings.file)
settings <- xmlToList(settings.xml)
I want to write a script that collects all of the values contained in the level4name nodes into a vector of the unique values at this level, but I am stumped trying to do this in a way that works for all of the above cases.
One of the problems is that class(settings[['level2']]) is a list for the first two cases and a matrix for the third case.
> xmlToList(xmlTreeParse('case1.xml'))
$level2.level3.level4name
[1] "bob"
> xmlToList(xmlTreeParse('case2.xml'))
level2
level3.level4name "bob"
level3.level4name "jack"
level3.level4name "jill"
> xmlToList(xmlTreeParse('case3.xml'))
level2
level3 List,3
level3 List,1
level3 List,1
Questions
I have two questions:
How can I extract a vector of the unique values of 'level4name'?
Is there a better way to do this?
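Try using the internal node representation of the XML together with the XPath language, which is very powerful. An XPath query selects nodes by name wherever they sit in the tree, so the same call works no matter how many level3 or level4 nodes the file contains.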
> xml = xmlTreeParse("case2.xml", useInternalNodes=TRUE)
> xpathApply(xml, "//level4name", xmlValue)
[[1]]
[1] "bob"
[[2]]
[1] "jack"
[[3]]
[1] "jill"
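To get from that list to the vector of unique values the question asks for, something like the following might work. This is only a sketch: the helper name unique_level4_names and the inline XML strings are made up for illustration (they mirror the cases in the question, with a repeated "bob" to show the de-duplication), and xmlParse(..., asText = TRUE) is used instead of reading a file so the example is self-contained.

```r
library(XML)

# Hypothetical helper (not part of the XML package): returns the unique
# <level4name> values found anywhere in an XML string.
unique_level4_names <- function(xml_text) {
  # xmlParse() builds the internal-node representation that XPath needs
  doc <- xmlParse(xml_text, asText = TRUE)
  # xpathApply() returns a list; unlist() flattens it to a character vector
  unique(unlist(xpathApply(doc, "//level4name", xmlValue)))
}

# Illustrative inputs modelled on the cases in the question
case1 <- "<level1><level2><level3><level4name>bob</level4name></level3></level2></level1>"
case3 <- "<level1><level2>
  <level3><level4name>bob</level4name><level4dir>/home/bob/</level4dir></level3>
  <level3><level4name>jack</level4name></level3>
  <level3><level4name>bob</level4name></level3>
</level2></level1>"

unique_level4_names(case1)
unique_level4_names(case3)
```

Because the query matches on the node name rather than on the position in the list returned by xmlToList, the single-level3 case and the multi-level3 case go through the same code path.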