Creating large XML Trees in R
I'm trying to create a large XML tree in R. Here's a simplified version of the code:
library(XML)
N = 100000  # In practice N is larger: 10^8 to 10^9
seq = newXMLNode("sequence")
pars = as.character(1:N)
for(i in 1:N)
  newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i]))
When N is about 10^6 this takes about a minute; 10^7 takes about forty minutes. Is there any way to speed this up?
Using the paste command:
par_tmp = paste('<Parameter id="', pars, '"/>', sep="")
takes less than a second.
I would recommend profiling the function using Rprof or the profr package. This will show you where your bottleneck is, and then you can think about ways to either optimize the function or change the way that you're using it.
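As a rough sketch of the Rprof route, something like the following should work. It assumes the XML package is installed and that your R build has profiling support; N is kept small here only so the run finishes quickly, and root is just my name for the top-level node (it avoids shadowing base::seq).

```r
library(XML)

N <- 20000                        # small N so the profiling run is short
root <- newXMLNode("sequence")    # 'root' instead of 'seq' to avoid masking base::seq
pars <- as.character(seq_len(N))

# Start the sampling profiler, writing samples to a temporary file
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)

for (i in seq_len(N)) {
  newXMLNode("Parameter", parent = root, attrs = c(id = pars[i]))
}

# Stop profiling and summarise where the time went
Rprof(NULL)
prof <- summaryRprof(prof_file)
head(prof$by.self)                # functions ranked by time spent in their own code
```

The by.self table is usually the first place to look; by.total shows which callers account for the time.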
Your paste example would be much faster in part because it's vectorized. For a fairer comparison, loop over paste the same way you currently loop over newXMLNode and compare the timings.
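A minimal sketch of that comparison, using only base R. The variable names (t_vec, t_loop, par_vec, par_loop) are mine, and the actual timings will depend on your machine.

```r
N <- 100000
pars <- as.character(seq_len(N))

# Vectorized: one call to paste handles all N strings
t_vec <- system.time(
  par_vec <- paste('<Parameter id="', pars, '"/>', sep = "")
)

# Looped: one call to paste per element, mirroring the newXMLNode loop
par_loop <- character(N)          # preallocate the result vector
t_loop <- system.time(
  for (i in seq_len(N)) {
    par_loop[i] <- paste('<Parameter id="', pars[i], '"/>', sep = "")
  }
)

identical(par_vec, par_loop)      # both approaches build the same strings
rbind(vectorized = t_vec, loop = t_loop)
```

The looped timing gives a better sense of how much of the newXMLNode cost is per-call overhead rather than the XML work itself.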
Edit:
Here is the output from profiling your loop with profr.
library(profr)
xml.prof <- profr(for(i in 1:N)
newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i])))
plot(xml.prof)
Nothing in here points to an obvious place to improve this. It spends a reasonable amount of time in the %in% function, so improving that would reduce the overall time somewhat (although that function is still called on every iteration, so it won't make a huge difference). The best solution would be to rewrite newXMLNode as a vectorized function so you can skip the for loop entirely.
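Short of rewriting newXMLNode itself, one way to get a similar effect is to build the child markup with a single vectorized paste0 call and parse the resulting string once. To be clear, this is a different technique, not a vectorized newXMLNode. The sketch assumes the XML package is installed and that attribute values need no escaping (true here, since the ids are digits); for N near 10^8 the string and the parsed tree both have to fit in memory.

```r
library(XML)

N <- 1000                          # small N for illustration
pars <- as.character(seq_len(N))

# Build all child elements in one vectorized call, collapsed into one string
children <- paste0('<Parameter id="', pars, '"/>', collapse = "")

# Wrap the children in the root element and parse the text once
doc <- xmlParse(paste0("<sequence>", children, "</sequence>"), asText = TRUE)
root <- xmlRoot(doc)

xmlSize(root)                      # number of child nodes under <sequence>
```

If you only need the XML on disk rather than as a tree in R, you could skip the parse and write the pasted strings straight to a file.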