开发者

Difference between Rscript and littler

...besides the fact that Rscript is invoked with #!/usr/bin/env Rscript and littler with #!/usr/local/bin/r (on my system) in first line of script file. I've found certain differences in execution speed (seems like littler is a bit slower).

I've created two dummy scripts, ran each 1000 times and compared average execution time.

Here's the Rscript file:

#!/usr/bin/env Rscript

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "rscript.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

and here's the littler file:

#!/usr/local/bin/r

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "little.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

As you can see, they are alm开发者_开发问答ost identical (first line and sink file argument differ). Output is sinked to text file, hence imported in R with read.table. I've created bash script to execute each script 1000 times, then calculated averages.

Here's bash script:

for i in `seq 1000`
do
./$1
echo "####################"
echo "Iteration #$i"
echo "####################"
done

And the results are:

# littler script
> mean(lit)
    user   system  elapsed 
0.489327 0.035458 0.588647 
> sapply(lit, median)
   L1    L2    L3 
0.490 0.036 0.609 
# Rscript
> mean(rsc)
    user   system  elapsed 
0.219334 0.008042 0.274017 
> sapply(rsc, median)
   R1    R2    R3 
0.220 0.007 0.258 

Long story short: beside (obvious) execution-time difference, is there some other difference? More important question is: why should/shouldn't you prefer littler over Rscript (or vice versa)?


Couple quick comments:

  1. The path /usr/local/bin/r is arbitrary, you can use /usr/bin/env r as well as we do in some examples. As I recall, it limits what other arguments you can give to r as it takes only one when invoked via env

  2. I don't understand your benchmark, and why you'd do it that way. We do have timing comparisons in the sources, see tests/timing.sh and tests/timing2.sh. Maybe you want to split the test between startup and graph creation or whatever you are after.

  3. Whenever we ran those tests, littler won. (It still won when I re-ran those right now.) Which made sense to us because if you look at the sources to Rscript.exe, it works different by setting up the environment and a command string before eventually calling execv(cmd, av). littler can start a little quicker.

  4. The main price is portability. The way littler is built, it won't make it to Windows. Or at least not easily. OTOH we have RInside ported so if someone really wanted to...

  5. Littler came first in September 2006 versus Rscript which came with R 2.5.0 in April 2007.

  6. Rscript is now everywhere where R is. That is a big advantage.

  7. Command-line options are a little more sensible for littler in my view.

  8. Both work with CRAN packages getopt and optparse for option parsing.

So it's a personal preference. I co-wrote littler, learned a lot doing that (eg for RInside) and still find it useful -- so I use it dozens of times each day. It drives CRANberries. It drives cran2deb. Your mileage may, as hey say, vary.

Disclaimer: littler is one of my projects.

Postscriptum: I would have written the test as

I would have written this as

  fun <- function { X <- rnorm(100); print(x); print(plot(x)) }
  replicate(N, system.time( fun )["elapsed"])

or even

  mean( replicate(N, system.time(fun)["elapsed"]), trim=0.05)

to get rid of the outliers. Moreover, you only essentially measure I/O (a print, and a plot) which both will get from the R library so I would expect little difference.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜