Using plyr in R with a complex function that returns multiple variables
I have a data set with three grouping variables: condition, sub, and delay. Here is a simplified version of my data (the real data set is much longer):
sub  condition  delay  later_value  choiceRT  later_choice  primeRT  cue
10   SIZE       10     27           1832      1             888      CHILD
10   PAST       5      11           298       0             1635     PANTS
10   SIZE       21     13           456       0             949      CANDY
11   SIZE       120    22           526       1             7963     BOY
11   FUTURE     120    27           561       1             4389     CHILDREN
11   PAST       5      13           561       1             2586     SPRING
I have a complicated set of procedures to apply to these data (the details are not important). I wrote the following function, which does what I want when the data are split by the three grouping variables. It returns three variables that I am interested in (indiff, p_intercept, and p_lv):
getIndiffs <- function(currdelay){
  if (mean(currdelay$later_choice) == 1) {
    indiff = 10.5
    p_intercept = "laters"
    p_lv = "laters"
  }
  else if (mean(currdelay$later_choice) == 0) {
    indiff = 30.5
    # no p-val here, code that this was not calculated
    p_intercept = "nows"
    p_lv = "nows"
  }
  else {
    F <- factor(currdelay$later_choice)
    fit <- glm(F~later_value,data=currdelay,family=binomial())
    indiff <- -coef(fit)[1]/coef(fit)[2]
    if (indiff < 10) indiff = 10.5
    else if (indiff > 30) indiff = 30.5
    p_intercept = round(summary(fit)$coef[, "Pr(>|z|)"][1],3)
    p_lv = round(summary(fit)$coef[, "Pr(>|z|)"][2], 3)
    c(indiff,p_intercept,p_lv)
  }
I am trying to use ddply to apply it to each subset of the data per the 3 grouping variables:
ddply(data,.(sub,condition,delay),getIndiffs)
However, when I run this I get the error
Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results do not have equal lengths
Strangely, this works fine when I use only one grouping variable but throws the error with two or more.
Also, when I "simulate" the split myself by building a data frame that contains only one subset defined by the three grouping variables, my function works just fine. (Note: I've tried different ways of returning three variables, or even returning just one variable, and it does not work either.)
Basically, what I want to know is how to use plyr with a function that returns multiple variables.
Any other solutions to my problem that are fundamentally different are also welcome.
That error usually happens to me when the function applied to one of my pieces returns an empty data frame. In any case, an easy way to debug the situation is to use dlply instead of ddply and examine the output; for instance:
x <- dlply(data,.(sub,condition,delay),getIndiffs)
sapply(x,ncol)
to check that they all have the same number of columns. If not, standardize your function more.
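Here is a minimal sketch of that check on made-up data; the `toy` data frame and the `inconsistent` function are hypothetical and not from the question. Because the function in the question returns plain vectors, for which `ncol` gives `NULL`, the sketch compares `length` instead. It uses base R `split` and `lapply` as a stand-in for `dlply`, and only calls plyr if the package happens to be installed.

```r
# Hypothetical toy data: two groups defined by sub and condition.
toy <- data.frame(
  sub          = c(10, 10, 11, 11),
  condition    = c("SIZE", "SIZE", "PAST", "PAST"),
  later_choice = c(1, 1, 0, 1)
)

# Hypothetical function that, like the one in the question, returns
# results of different lengths depending on which branch is taken.
inconsistent <- function(piece) {
  if (mean(piece$later_choice) == 1) {
    "laters"                                       # length 1
  } else {
    c(indiff = 20, p_intercept = 0.5, p_lv = 0.5)  # length 3
  }
}

# Base R stand-in for dlply: one data frame per group, then apply.
pieces  <- split(toy, list(toy$sub, toy$condition), drop = TRUE)
results <- lapply(pieces, inconsistent)

# Unequal lengths across groups are what ddply complains about.
lens <- sapply(results, length)
print(lens)

# The same check on dlply output, if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  x <- plyr::dlply(toy, c("sub", "condition"), inconsistent)
  print(sapply(x, length))
}
```

If the printed lengths are not all the same, the function is taking branches that return differently shaped results.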
It looks like your function getIndiffs is designed to run on a single row, not on a whole dataframe. d*ply(x,vars,fn) hands fn() an entire data frame consisting of the subset of observations matching that group. Hm, also, the function can return in three different places: at the end of each conditional clause. I think you meant to put c(indiff,p_intercept,p_lv) after the last } (and end your function with another }).
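A sketch of what that restructuring might look like, under a few assumptions of mine rather than anything confirmed by the question: the result is built once, after all three branches, as a one-row data frame instead of `c(...)`, and the two p-value columns are stored as character so that the "laters"/"nows" labels and the rounded p-values share one column type. The `mixed` data frame is made up for illustration.

```r
getIndiffs <- function(currdelay) {
  if (mean(currdelay$later_choice) == 1) {
    # Every choice was the later option: no model is fitted.
    indiff <- 10.5
    p_intercept <- "laters"
    p_lv <- "laters"
  } else if (mean(currdelay$later_choice) == 0) {
    # Every choice was the immediate option: no model is fitted.
    indiff <- 30.5
    p_intercept <- "nows"
    p_lv <- "nows"
  } else {
    fit <- glm(factor(later_choice) ~ later_value,
               data = currdelay, family = binomial())
    indiff <- unname(-coef(fit)[1] / coef(fit)[2])
    if (indiff < 10) {
      indiff <- 10.5
    } else if (indiff > 30) {
      indiff <- 30.5
    }
    pvals <- coef(summary(fit))[, "Pr(>|z|)"]
    p_intercept <- round(pvals[1], 3)
    p_lv <- round(pvals[2], 3)
  }
  # Single exit point: every branch yields the same one-row shape.
  # Assumption: p-value columns are character so labels and numbers mix.
  data.frame(indiff = indiff,
             p_intercept = as.character(p_intercept),
             p_lv = as.character(p_lv),
             stringsAsFactors = FALSE)
}

# Hypothetical subset with mixed choices, to exercise the glm branch.
mixed <- data.frame(
  later_value  = c(11, 13, 15, 17, 19, 21, 23, 25),
  later_choice = c(0, 0, 1, 0, 1, 0, 1, 1)
)
print(getIndiffs(mixed))

# With plyr installed, the grouped call from the question would then be:
# plyr::ddply(data, c("sub", "condition", "delay"), getIndiffs)
```

Since each piece now comes back as a data frame with the same three columns, ddply should be able to stack the results alongside the grouping columns.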