Can (should) I inherit parts of a function in R?

I have two functions that start pretty similarly. Hence I wonder if this is the right moment to dive into inheritance in R.

firstfunc <- function(table,pattern="^Variable") {

dframe <- get(table)
cn <- colnames(get(table))
qs <- subset(cn, cn  %in% grep(pattern, cn, value=TRUE))

    .....

}

secondfunc <- function(table,pattern="^stat"){

dframe <- get(table)
cn <- colnames(get(table))
qs <- subset(cn, cn  %in% grep(pattern, cn, value=TRUE))

    ....

}

There will be more than two functions and two patterns. My tables contain a lot of variables that can easily be grouped by their names, which is why I use this pattern matching. It works well so far, and copying and pasting these few lines is not much of an effort. However, is it reasonable to write these lines into one function / method and let the others inherit from it?

Most of the help I have read on OO in R so far used examples that assigned attributes to data and then used generic functions. Unfortunately I have not yet understood whether this can help in my case too.

Thx for any suggestions, pointers to a good head first start into this!


There is no inheritance of function parts in R. You cannot "inherit parts" of functions from other functions; you can only call functions from other functions. All OO paradigms in R (S3, S4, refClasses) are exactly what they say: object-oriented. Methods are dispatched according to the class of the objects they receive.

Your question is really how to get rid of code repetition.

There are two ways, one standard and one not so standard.

  • Standard way: Write functions for repeated code and call them from other functions. The drawback is that functions return only one object, but you have three. So you can do something like this:

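    repeated_code <- function(table, pattern){
        objects <- list()
        objects$dframe <- get(table)
        objects$cn <- colnames(objects$dframe)
        # keep only the column names that match the pattern
        objects$qs <- subset(objects$cn, objects$cn %in% grep(pattern, objects$cn, value=TRUE))
        objects  # return the whole list, not just the last assignment
        }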
    
    
    firstfunc <- function(table,pattern="^Variable") {
          objects <- repeated_code(table, pattern)
          ...
          # manipulate objects$dframe, objects$cn, objects$qs
          ...
          }
    
    
    secondfunc <- function(table,pattern="^stat") {
          objects <- repeated_code(table, pattern)
          ...
          # manipulate objects$dframe, objects$cn, objects$qs
          ...
          }
    
  • Not so standard way: Use unevaluated expressions:

     redundant_code <- expression({
          dframe <- get(table)  
          cn <- colnames(get(table))
          qs <- subset(cn, cn  %in% grep(pattern, cn, value=TRUE))
     })
    
    
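     firstfunc <- function(table,pattern="^Variable") {
         # environment() is this function's own frame; an explicit
         # parent.frame() here would point at the caller of firstfunc instead
         eval(redundant_code, envir=environment())
         ...
     }
    
    
     secondfunc <- function(table,pattern="^stat") {
         eval(redundant_code, envir=environment())
         ...
     }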
    

[Update: Since R 2.12.0 there is yet another, multi-assign way. Write a function which returns the list of objects (as in the "standard" case above). Then assign the objects in the returned list to the current environment with list2env:

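    secondfunc <- function(table,pattern="^stat") {
          objects <- repeated_code(table, pattern)
          # environment() is the current function's frame; parent.frame()
          # would assign into the caller of secondfunc instead
          list2env(objects, envir = environment())
          ...
          }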

]
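
To make the list-returning approach concrete, here is a minimal runnable sketch. Everything specific in it is an assumption for illustration: the example data, the helper name select_columns, and the stand-in bodies (the original functions' bodies are elided). It also takes the data frame itself rather than its name, purely to keep the sketch self-contained.

```r
# Hypothetical example data; the column names are made up for illustration.
mytable <- data.frame(Variable1 = 1:3, Variable2 = 4:6,
                      stat_mean = c(0.1, 0.2, 0.3))

# Shared helper (hypothetical name): returns all three objects in one list.
select_columns <- function(dframe, pattern) {
    cn <- colnames(dframe)
    qs <- grep(pattern, cn, value = TRUE)
    list(dframe = dframe, cn = cn, qs = qs)
}

firstfunc <- function(dframe, pattern = "^Variable") {
    objects <- select_columns(dframe, pattern)
    # Stand-in for the elided body: sum each matching column.
    colSums(objects$dframe[, objects$qs, drop = FALSE])
}

secondfunc <- function(dframe, pattern = "^stat") {
    # list2env() copies the list elements into this function's own frame,
    # so cn and qs become ordinary local variables.
    list2env(select_columns(dframe, pattern), envir = environment())
    # Stand-in for the elided body: return the matching column names.
    qs
}

firstfunc(mytable)
secondfunc(mytable)
```

The first function reads the pieces out of the list explicitly, which keeps it obvious where each name comes from; the second trades that for shorter code via list2env.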


Can you? Yes. S4 has features to handle this scenario. See the wiki page for some resources. Hadley also recently wrote a nice introduction (see the section on "Generic functions and methods").

You can see this with setMethod in any existing S4 code (see timeSeries for an example). Note the different signatures for the same function.

Should you? Yes, you should, but you will be adding some complexity to the code. S4 doesn't come for free; it requires a lot more infrastructure. So there's a trade-off, and you will need to decide whether it's worth it.
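
As a rough sketch of what that could look like here: the class names, generic names, and method bodies below are all hypothetical, chosen only to show a shared step living in a method on a parent class while each subclass gets its own method for the same generic.

```r
library(methods)

# Hypothetical classes: one per group of columns, sharing a common parent.
setClass("ColumnGroup", slots = c(dframe = "data.frame", pattern = "character"))
setClass("VariableGroup", contains = "ColumnGroup")
setClass("StatGroup", contains = "ColumnGroup")

# The shared start is written once, as a method on the parent class.
setGeneric("matchingColumns", function(x) standardGeneric("matchingColumns"))
setMethod("matchingColumns", "ColumnGroup", function(x) {
    grep(x@pattern, colnames(x@dframe), value = TRUE)
})

# Same generic, different signatures: one method per subclass.
setGeneric("process", function(x) standardGeneric("process"))
setMethod("process", "VariableGroup", function(x) {
    qs <- matchingColumns(x)  # inherited from ColumnGroup
    colSums(x@dframe[, qs, drop = FALSE])
})
setMethod("process", "StatGroup", function(x) {
    qs <- matchingColumns(x)  # inherited from ColumnGroup
    colMeans(x@dframe[, qs, drop = FALSE])
})

# Hypothetical example data.
mytable <- data.frame(Variable1 = 1:3, Variable2 = 4:6,
                      stat_mean = c(0.1, 0.2, 0.3))
vars  <- new("VariableGroup", dframe = mytable, pattern = "^Variable")
stats <- new("StatGroup", dframe = mytable, pattern = "^stat")

process(vars)
process(stats)
```

Note how much scaffolding this needs compared with a plain helper function; that is the infrastructure cost mentioned above.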


[Edit: Ah, I didn't notice you only posted the start of your functions, and the bodies are probably different]

The other thing you might want to look into, given all this 'get' stuff and use of column names, is the formula mechanism as used by lm() and friends. You can specify columns by name in a formula, something like:

foofunc(~Variable, data=mytable)

and use the model functions to get the values. Things like model.matrix and so on. I'm guessing from the 'gets' that you are passing names of objects around, which is a bad thing to do generally. Pass the object.
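
A minimal sketch of that idea, assuming a made-up body for foofunc (column means) and made-up example data; only the formula interface and model.frame come from the suggestion above:

```r
# foofunc receives the data frame itself plus a formula naming the columns.
foofunc <- function(formula, data) {
    # model.frame() pulls the columns named in the formula out of `data`.
    mf <- model.frame(formula, data = data)
    # Stand-in body for illustration: the mean of each selected column.
    colMeans(mf)
}

# Hypothetical example data.
mytable <- data.frame(Variable = c(1, 2, 3), stat = c(10, 20, 30))

foofunc(~Variable, data = mytable)
foofunc(~Variable + stat, data = mytable)
```

Because the object is passed in directly, there is no get() and no dependence on where the table happens to live.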
