Get and process entire row in ddply in a function

It's easy to grab one or more columns in ddply to process, but is there a way to grab the entire current row and pass it on to a function? Or to grab a set of columns determined at runtime?

Let me illustrate:

Given a dataframe like

df = data.frame(a=seq(1,20), b=seq(1,5), c= seq(5,1))
df
    a b c
1   1 1 5
2   2 2 4
3   3 3 3

I could write a function to sum named columns along a row of a data frame like this:

selectiveSummer = function(row,colsToSum) {
   return(sum(row[,colsToSum])) 
}

It works when I call it for a row like this:

> selectiveSummer(df[1,],c('a','c'))
[1] 6

So I'd like to wrap that in an anonymous function and use it in ddply to apply it to every row in the table, something like the example below:

f = function(x) { selectiveSummer(x,c('a','c')) }
#this doesn't work!
ddply(df,.(a,b,c), transform, foo=f(row))

I'd like to find a solution where the set of columns to manipulate can be determined at runtime, so if there's some way just to splat that from ddply's args and pass it into a function that takes any number of args, that works too.

Edit: To be clear, the real application driving this isn't sum, but this was an easier explanation.


You can only select single rows with ddply if each row can be identified uniquely by one or more variables. If there are identical rows, ddply will iterate over data frames of multiple rows even if you split on all columns (as in ddply(df, names(df), f)).

Why not use apply instead? Apply does iterate over individual rows.

apply(df, 1, function(x) f(as.data.frame(t(x))))

result:

[1]  6  6  6  6  6 11 11 11 11 11 16 16 16 16 16 21 21 21 21 21
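One caveat with apply is that it converts the data frame to a matrix first, so a data frame with mixed column types would reach the function as character values. A minimal base R sketch that avoids this by indexing rows directly and takes the column set as a runtime argument is below. The helper name applyToRows is made up for illustration and is not part of any package.

```r
df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

# Same function as in the question
selectiveSummer <- function(row, colsToSum) {
  sum(row[, colsToSum])
}

# Hypothetical helper: call rowFun on each one-row data frame,
# forwarding any extra arguments (e.g. the columns to use)
applyToRows <- function(data, rowFun, ...) {
  sapply(seq_len(nrow(data)), function(i) {
    rowFun(data[i, , drop = FALSE], ...)
  })
}

cols <- c("a", "c")  # could be computed at runtime
df$foo <- applyToRows(df, selectiveSummer, cols)
```

Because each row is passed as a one-row data frame, the function sees the original column types and names.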


Simple...

df$id = 1:nrow(df)
ddply(df,c('id'),function(x){ ... })

OR

adply(df,1,function(x){ ... })
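
As a rough sketch of how the adply form might be filled in for the question's example (assuming plyr is installed, and reusing selectiveSummer from the question with a column vector chosen at runtime):

```r
library(plyr)

df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

# Same function as in the question
selectiveSummer <- function(row, colsToSum) {
  sum(row[, colsToSum])
}

cols <- c("a", "c")  # could be computed at runtime

# With margin 1, adply passes each row to the function as a one-row data frame
res <- adply(df, 1, function(x) {
  data.frame(foo = selectiveSummer(x, cols))
})
```

Here res should hold one row per input row, with a foo column containing the per-row result.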