Applying a function repeatedly to many subjects

2023-03-03 02:53 问答作者：

I have a data frame as follows,

> mydata
date  station  treatment  subject   par
A       a         0         R1      1.3    
A       a         0         R1      1.4    
A       a         1         R2      1.4   
A       a         1         R2      1.1    
A       b         0         R1      1.5    
A       b         0         R1      1.8     
A       b         1         R2      2.5     
A       b         1         R2      9.5    
B       a         0         R1      0.3    
B       a         0         R1      8.2    
B       a         1         R2      7.3    
B       a         1         R2      0.2    
B       b         0         R1      9.4    
B       b         0         R1      3.2    
B       b         1         R2      3.5    
B       b         1         R2      2.4 
....

where:

date is a factor with 2 levels A/B; station is a factor with 2 levels a/b; treatment is a factor with 2 levels 0/1;

subject are the replicates R1 to R20 assigned to treatment (10 to treatment 0 and 10 to treatment 1);

and par is my parameter, which is a repeated measurement of particle size for each subject at at each date and station

What i need to do is: divide par in 10 equal bins and count the number in each bin. This has to be done in subsets of mydata definded by a combination of date station and subject. The final outcome has to be a daframe myres as follow:

> myres
    date  station  treatment  bin.centre  freq
    A       a         0         1.2        4 
    A       a         0         1.3        3    
    A       a         0         1.4        2 
    A       a         0         1.5        1    
    A       a         1         1.2        4    
    A       a         1         1.3        3    
    A       a         1         1.4        2     
    A       a         1         1.5        1    
    B       b         0         2.3        5   
    B       b         0         2.4        4    
    B       b         0         2.5        3    
    B       b         0         2.6        2   
    B       b         1         2.3        5   
    B       b         1         2.4        4   
    B       b         1         2.5        3   
    B       b         1         2.6        2
    ....

this is what i've done so far:

#define the number of bins
num.bins<-10

#define the width of each bins
bin.width<-(max(par)-min(par))/num.bins

#define the lower and upper boundaries of each bins
bins<-seq(from=min(par), to=max(par), by=bin.width)

#define the centre of each bins
bin.centre<-c(seq(min(bins)+bin.width/2,max(bins)-bin.width/2,by=bin.width))

#create a vector to store the frequency in each bins

  freq<-numeric(length(length(bins-1)))

 # this is the loop that counts the frequency of particles between the lower and upper boundaries
 of each bins and store the result in freq

 for(i in 1:10){
    freq[i]<-length(which(par>=bins[i] &
    par<bins[i+1]))
     }

 #create the data frame with the results
 res<-data.frame(bin.centre,res)

my first approach was to subset mydata manually, using subset(),for each combination of subject station and date, and apply the above sequence of commands for each subsets, then build the final dataframe combining each single res using rbind(), but this procedure was very convoluted and subject to the propagation of errors. What i would like to do, is to automate the above procedure so that it calculates the binned frequency distribution for each subject. My intuition is that the best开发者_开发问答 way to do this is by creating a function for estimating this particle distribution, and then applying it to each subject via a for loop. However, I am not sure of how to do this. Any suggestions would be really appreciated.

thanks matteo.

You can do this in a few steps using the functionality in the plyr package. This allows you to split your data into the desired chunks, apply a statistic to each chunk, and combine the results.

First I set up some dummy data:

set.seed(1)
n <- 100
dat <- data.frame(
    date=sample(LETTERS[1:2], n, replace=TRUE),
    station=sample(letters[1:2], n, replace=TRUE),
    treatment=sample(0:1, n, replace=TRUE),
    subject=paste("R", sample(1:2, n, replace=TRUE), sep=""),
    par=runif(n, 0, 5)
)
head(dat)

  date station treatment subject       par
1    A       b         0      R2 3.2943880
2    A       a         0      R1 0.9253498
3    B       a         1      R1 4.7718907
4    B       b         0      R1 4.4892425
5    A       b         0      R1 4.7184853
6    B       a         1      R2 3.6184538

Now I use the function in base called cut to divide your par into equal sized bins:

dat$bin <- cut(dat$par, breaks=10)

Now for the fun bit. Load package plyr and use the function ddply to split, apply and combine. Because you want a frequency count, we can use the function length to count the number of times each replicate appeared in that bin:

library(plyr)
res <- ddply(dat, .(date, station, treatment, bin), 
  summarise, freq=length(treatment))
head(res)

  date station treatment             bin freq
1    A       a         0 (0.00422,0.501]    1
2    A       a         0   (0.501,0.998]    2
3    A       a         0      (1.5,1.99]    4
4    A       a         0     (1.99,2.49]    2
5    A       a         0     (2.49,2.99]    2
6    A       a         0     (2.99,3.48]    1

Applying a function repeatedly to many subjects

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？