开发者

doing a plyr operation on every row of a data frame in R

I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3 day bender. So for the sake of my dog and my liver, what's concise syntax for doing a ddply operation on every row of a data frame?

Here's an example that works well for a simple case:

x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x,y)
ddply(df,names(df) ,function(df) max(df$x,df$y))

that works fine and gives me what I want. But if things get more complex this causes plyr to get funky (and not like Bootsy Collins) because plyr is chewing on making "levels" out of all those floating point values

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace=T)
df <- data.frame(x,y, z, myLetters)
ddply(df,names(df) ,function(df) max(df$x,df$y))

on my box this chews for a few minutes and then returns:

Error: memory exhausted (limit reached?)
In addition: Warning messages:
1: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reac开发者_StackOverflowhed total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)

I think I am totally abusing plyr and I am not saying this is a bug in plyr, but rather abusive behavior by me (liver and dog notwithstanding).

So in short, is there syntax shortcut for using ddply to operate on every row as a substitute for apply(X, 1, ...)?

The workaround I've been using is to create a "key" that gives a unique value for every row and then I can join back to it.

 x <- rnorm(1000)
 y <- rnorm(1000)
 z <- rnorm(1000)
 myLetters <- sample(letters, 1000, replace=T)
 df <- data.frame(x,y, z, myLetters)
  #make the key
 df$myKey <- 1:nrow(df)
 myOut <- merge(df, ddply(df,"myKey" ,function(df) max(df$x,df$y)))
  #knock out the key
 myOut$myKey <- NULL

But I keep thinking that "There Has to Be a Better Way"

Thanks!


Just treat it like an array and work on each row:

adply(df, 1, transform, max = max(x, y))
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜