Chi Square Analysis using for loop in R
I'm trying to do chi square analysis for all combinations of variables in the data and my code is:
Data <- esoph[ , 1:3]
OldStatistic <- NA
for(i in 1:(ncol(Data)-1)){
  for(j in (i+1):ncol(Data)){
    Statistic <- data.frame("Row"=colnames(Data)[i], "Column"=colnames(Data)[j],
                            "Chi.Square"=round(chisq.test(Data[ ,i], Data[ ,j])$statistic, 3),
                            "df"=chisq.test(Data[ ,i], Data[ ,j])$parameter,
                            "p.value"=round(chisq.test(Data[ ,i], Data[ ,j])$p.value, 3),
                            row.names=NULL)
    temp <- rbind(OldStatistic, Statistic)
    OldStatistic <- Statistic
    Statistic <- temp
  }
}
str(Data)
'data.frame': 88 obs. of 3 variables:
$ agegp: Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ alcgp: Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
$ tobgp: Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...
Statistic
    Row Column Chi.Square df p.value
1 agegp  tobgp      2.400 15       1
2 alcgp  tobgp      0.619  9       1
My code gives me the chi-square output for variable 1 vs. variable 3 and for variable 2 vs. variable 3, but the result for variable 1 vs. variable 2 is missing. I tried hard but could not fix the code. Any comments and suggestions will be highly appreciated. I'd like to do cross tabulation for all possible combinations. Thanks in advance.
EDIT
I used to do this kind of analysis in SPSS but now I want to switch to R.
A sample of your data would be appreciated, but I think this will work for you. First, create all pairwise combinations of the columns with combn. Then write a function to use with an apply-style function to iterate through the combos. I like to use plyr since it is easy to specify what you want for a data structure on the back end. Also note that you only need to compute the chi-square test once for each combination of columns, rather than three times as in your loop, which should speed things up quite a bit as well.
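library(plyr)
Dat <- esoph[, 1:3]  # assumed here: the same columns as Data in the question
# Each column of combos holds the column indices of one pair of variables
combos <- combn(ncol(Dat), 2)
adply(combos, 2, function(x) {
  # Run the test once per pair and reuse the result
  test <- chisq.test(Dat[, x[1]], Dat[, x[2]])
  out <- data.frame("Row" = colnames(Dat)[x[1]]
                    , "Column" = colnames(Dat)[x[2]]
                    , "Chi.Square" = round(test$statistic, 3)
                    , "df" = test$parameter
                    , "p.value" = round(test$p.value, 3)
                    )
  return(out)
})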
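A likely reason the loop in the question drops the first pair: OldStatistic is overwritten with only the newest row (OldStatistic <- Statistic) instead of the accumulated table, so each rbind keeps just the previous and the current result. Below is a minimal sketch of the same loop with that fixed. It collects one row per pair in a list and binds them once at the end; the names results and test are introduced here for illustration only.

```r
Data <- esoph[, 1:3]

# Collect one data frame row per pair of variables
results <- list()
for (i in 1:(ncol(Data) - 1)) {
  for (j in (i + 1):ncol(Data)) {
    # Run the test once per pair; chisq.test() may warn when
    # expected cell counts are small
    test <- chisq.test(Data[, i], Data[, j])
    results[[length(results) + 1]] <- data.frame(
      Row        = colnames(Data)[i],
      Column     = colnames(Data)[j],
      Chi.Square = round(unname(test$statistic), 3),
      df         = unname(test$parameter),
      p.value    = round(test$p.value, 3)
    )
  }
}

# Bind all rows at once instead of carrying a running table
Statistic <- do.call(rbind, results)
Statistic
```

With three variables this should give three rows, one for each pair, including agegp vs. alcgp.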
I wrote my own function. It builds a matrix in which all nominal variables are tested against each other, and it can also save the results as an Excel file. Only p-values smaller than 5% are displayed.
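# write.xlsx() is assumed to come from the xlsx package (library(xlsx));
# it is only needed when xlsxpath is set.
funMassChi <- function (x, delFirst=0, xlsxpath=FALSE) {
  options(scipen = 999)                  # avoid scientific notation in the output
  start <- (delFirst+1)
  ds <- x[, start:ncol(x)]               # drop the first delFirst columns
  cATeND <- ncol(ds)
  # Lower-triangular matrix of p-values; a numeric matrix (rather than a
  # copy of ds) also works when the columns of ds are factors
  resMat <- matrix(NA_real_, nrow = cATeND, ncol = cATeND-1,
                   dimnames = list(NULL, names(ds)[-cATeND]))
  for(nCc in 1:(cATeND-1)){
    for(nDc in (nCc+1):cATeND){
      tryCatch({
        chiRes <- chisq.test(ds[, nCc], ds[, nDc])
        resMat[nDc, nCc] <- chiRes$p.value
      }, error=function(e){cat(paste("ERROR :","at",nCc,nDc, sep=" "),conditionMessage(e), "\n")})
    }
  }
  resMat[resMat > 0.05] <- ""            # blank out non-significant p-values
  Ergebnis <- cbind(CatNames=names(ds), resMat)
  Ergebnis <<- Ergebnis[-1,]             # result is stored in the global variable Ergebnis
  if (!identical(xlsxpath, FALSE)) {
    write.xlsx(x = Ergebnis, file = paste(xlsxpath,"ALLChi-",Sys.Date(),".xlsx",sep=""),
               sheetName = "Tabelle1", row.names = FALSE)
  }
}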
funMassChi(categorialDATA,delFirst=3,xlsxpath="C:/folder1/folder2/")
delFirst drops the first n columns, for example a count index or anything else you don't want to test.
I hope this helps someone else.
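For readers who only need the p-value matrix, the same idea can be sketched in a few lines of base R. This is an illustrative variant, not the function above: it uses the esoph columns from the question as stand-in data, keeps the p-values numeric instead of blanking them out, and the names pMat, pairs and signifPairs are made up for this example.

```r
Data <- esoph[, 1:3]
vars <- names(Data)

# Square matrix of p-values; only the lower triangle is filled
pMat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

# Each column of pairs holds the names of one pair of variables
pairs <- combn(vars, 2)
for (k in seq_len(ncol(pairs))) {
  a <- pairs[1, k]
  b <- pairs[2, k]
  pMat[b, a] <- chisq.test(Data[[a]], Data[[b]])$p.value
}

# Row/column positions of the pairs with p < 0.05 (NA cells are ignored)
signifPairs <- which(pMat < 0.05, arr.ind = TRUE)

round(pMat, 3)
signifPairs
```

Keeping the matrix numeric means it can still be filtered or rounded afterwards; formatting for display or export can then happen as a separate last step.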