How do I subsample data by group using ddply?
I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.
I wrote a function to subset a data frame as follows:
samp <- function(dataf)
{
dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
}
Now I want to apply this function to each species in a larger data frame.
When I try something like
culled_data = ddply (larger_data, .(species), subset, samp)
I get this error:
Error in subset.data.frame(piece, ...) :
'subset' must evaluate to logical
Anyone got ideas on how to do this?
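It looks like it should work once you remove the subset argument from your call, so that samp is passed directly as the function to apply: ddply(larger_data, .(species), samp).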
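For illustration, here is a minimal sketch of the corrected call on made-up data. The toy larger_data (with columns x and y) is an assumption standing in for the real data frame, and every species is given at least 40 rows, since samp as written would fail on a smaller group.

```r
library(plyr)

set.seed(1)  # only so the sketch is reproducible

# Hypothetical stand-in for the real data: 3 species, 100 rows each
larger_data <- data.frame(
  species = rep(c("a", "b", "c"), each = 100),
  x = runif(300),
  y = runif(300)
)

# Sampling function from the question
samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

# Pass samp directly as the function applied to each species' piece
culled_data <- ddply(larger_data, .(species), samp)

# Count the sampled rows per species
table(culled_data$species)
```

Each species should contribute exactly 40 rows to culled_data.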
Dirk's answer is of course correct, but I'm posting my own to add some explanation.
Why doesn't your call work?
First of all, your syntax is shorthand. It is equivalent to
ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))
so you can clearly see that you are passing a function (see class(samp)) as the second argument of subset. You could call samp(dfrm) there instead, but that won't work either, because samp returns a data.frame and subset needs a logical vector. samp(dfrm) would only work there if it returned a logical index.
How to use subset in this case?
Make subset work by feeding it a logical vector:
ddply (larger_data, .(species), subset, sample(seq_along(species)<=40))
This creates a logical vector with 40 TRUE values and shuffles it. (By the way, it also works when a species has fewer than 40 cases: the vector is then all TRUE, so every row is returned.)
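To see what that logical vector looks like outside of ddply, here is a small base R sketch. The group sizes (100 and 25) are made-up values chosen only to show both cases.

```r
set.seed(42)  # only so the sketch is reproducible

# A group with 100 rows: 40 TRUE and 60 FALSE, in random order
keep_large <- sample(seq_len(100) <= 40)
sum(keep_large)     # number of rows that would be kept
length(keep_large)  # one entry per row in the group

# A group with only 25 rows: every entry is TRUE, so all rows are kept
keep_small <- sample(seq_len(25) <= 40)
all(keep_small)
```

Inside the ddply call, subset evaluates the expression within each species' piece, so seq_along(species) plays the role of seq_len(100) or seq_len(25) here.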