Alternatives to system() in R for calling sed, rsync, ssh etc.: Do functions exist, should I write my own, or am I missing the point?
Recently, I found the base::files commands. Along with other commands like getwd, writeLines, file.show, dir, etc., there seem to be a number of R equivalents of bash functions.
I have also written some functions in R that streamline calls to ssh and rsync through system.
For example:
rsync <- function(from, to){
  system(paste('rsync -outi', from, to, sep = ' '), intern=TRUE)
}
But before I go too much further with this, I have a few questions:
- does R already have built in commands for common shell programs, if so, where can I find them?
- if not, are there reasons to avoid writing my own functions?
- is there a better alternative to the approach outlined in the rsync example above?
- would a collection of such functions warrant a package?
does R already have built in commands for common shell programs, if so, where can I find them?
There are some functions, like grep, that mimic shell programs. Search for them as you would any other function – the names are often the same.
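As a rough illustration of what base R already covers, here is a minimal sketch using only base functions (readLines, grep, sub, writeLines, list.files) to do grep-, sed- and ls-style work without a system call. The temporary file and its contents are made up for the example.

```r
# Create a small temporary file to work on (contents are illustrative only).
path <- tempfile(fileext = ".txt")
writeLines(c("alpha one", "beta two", "alpha three"), path)

# grep-like: return the lines that match a pattern.
lines <- readLines(path)
matches <- grep("^alpha", lines, value = TRUE)

# sed-like: replace the first match on each line and write the file back,
# roughly what `sed -i 's/alpha/gamma/'` would do.
writeLines(sub("alpha", "gamma", lines), path)

# ls-like: list the files in a directory whose names match a pattern.
txt_files <- list.files(dirname(path), pattern = "\\.txt$")
```

Use gsub instead of sub for the equivalent of sed's g flag.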
if not, are there reasons to avoid writing my own functions?
No obvious problems.
is there a better alternative to the approach outlined in the rsync example above?
Looks good, but you need to be very careful about checking user input if things are passed to the shell.
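One way to be careful here, sketched under the assumption of a Unix-like shell: quote every user-supplied value with shQuote and pass the arguments separately through system2, rather than pasting them into one command string. The helper name rsync_args and the flags argument are hypothetical, not part of any existing package.

```r
# Hypothetical helper: build the argument vector for an rsync call
# without pasting raw user input into a shell command string.
rsync_args <- function(from, to, flags = "-outi") {
  stopifnot(is.character(from), length(from) == 1L,
            is.character(to), length(to) == 1L)
  # shQuote() wraps each value in single quotes so that spaces, semicolons
  # and other shell metacharacters are treated as literal text.
  c(flags, shQuote(from, type = "sh"), shQuote(to, type = "sh"))
}

# Hypothetical wrapper: system2() takes the command and its arguments separately.
rsync <- function(from, to, ...) {
  system2("rsync", args = rsync_args(from, to, ...))
}

# Inspect the arguments without running rsync.
rsync_args("my dir/file; rm -rf ~", "serverhost:")
```

Quoting also suppresses glob and tilde expansion, so a pattern such as ~/dir/files* would have to be expanded in R first, for example with Sys.glob().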
would a collection of such functions warrant a package?
Absolutely. Go for it.
I started to go down that route with wrapping git functions for devtools, but eventually realised what I needed was:
bash <- function() system("bash")
with a bit of wrapping to make sure I ended up in the right directory.
There's not much out there, apparently ...
> library(sos)
> findFn("rsync")
found 0 matches
x has zero rows; nothing to display.
Warning message:
In findFn("rsync") : HIT not found in HTML; processing one page only.
> findFn("ssh")
found 27 matches; retrieving 2 pages
2
The ssh hits are either false positives or part of parallel-processing packages (GridR, nws, biopara). RCurl has an scp command (based on libcurl, not a system call).
UPDATED
Thanks to @hadley for pointing this out: the time penalty was due to using the intern = TRUE argument; see the update below.
Rather than deleting the answer, I am going to leave it up here for reference, unless it gets lots of downvotes.
After creating a few such commands, I have realized one potentially significant disadvantage:
Wrapping a system call in a function increases the elapsed time of the call, about 7-fold in this example:
Using system:
system.time(system(paste('rsync -outi', '~/dir/files* ', 'serverhost:')))
user system elapsed
0.060 0.020 0.552
Wrapping system in a new function, rsync:
rsync <- function (from, to, pattern = "") {
system(paste("rsync -outi", from, to, sep = " "), intern = TRUE)
}
system.time(rsync(from = '~/dir/files*', to = 'serverhost:'))
user system elapsed
0.040 0.030 3.825
Update
The speed penalty resulted from the unnecessary use of intern = TRUE. Without it, the wrapper takes about as long as the direct call:
rsync <- function (from, to, pattern = "") {
system(paste("rsync -outi", from, to, sep = " "))
}
system.time(rsync(from = '~/dir/files*', to = 'serverhost:'))
user system elapsed
0.070 0.020 0.504
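Since intern = TRUE is only needed when the command's output should come back into R, one option is to make capturing opt-in. The following is a minimal sketch, assuming a Unix-like shell where echo and true are available; run_cmd and its capture argument are hypothetical names.

```r
# Hypothetical wrapper: capture output only when the caller asks for it.
run_cmd <- function(cmd, capture = FALSE) {
  # With intern = TRUE, system() returns the command's output as a character
  # vector; with intern = FALSE it returns the exit status invisibly.
  system(cmd, intern = capture)
}

out <- run_cmd("echo hello", capture = TRUE)  # character vector of output lines
status <- run_cmd("true")                     # integer exit status
```

Timing both forms with system.time(), as above, shows whether capturing matters for a given command.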