Chi Square Analysis using for loop in R

2023-04-03 22:44

I'm trying to do chi square analysis for all combinations of variables in the data and my code is:

Data <- esoph[ , 1:3]
OldStatistic <- NA
for(i in 1:(ncol(Data)-1)){
for(j in (i+1):ncol(Data)){
Statistic <- data.frame("Row"=colnames(Data)[i], "Column"=colnames(Data)[j],
                     "Chi.Square"=round(chisq.test(Data[ ,i], Data[ ,j])$statistic, 3),
                     "df"=chisq.test(Data[ ,i], Data[ ,j])$parameter,
                     "p.value"=round(chisq.test(Data[ ,i], Data[ ,j])$p.value, 3),
                      row.names=NULL)
temp <- rbind(OldStatistic, Statistic)
OldStatistic <- Statistic
Statistic <- temp
}
}

str(Data)
'data.frame':   88 obs. of  3 variables:
 $ agegp: Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alcgp: Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ tobgp: Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...


Statistic
    Row Column Chi.Square df p.value
1 agegp  tobgp      2.400 15       1
2 alcgp  tobgp      0.619  9       1

My code gives me the chi square analysis output for variable 1 vs variable 3 and for variable 2 vs variable 3, but the result for variable 1 vs variable 2 is missing. I tried hard but could not fix the code. Any comments and suggestions will be highly appreciated. I'd like to do cross tabulation for all possible combinations. Thanks in advance.

EDIT

I used to do this kind of analysis in SPSS but now I want to switch to R.

A sample of your data would be appreciated, but I think this will work for you. First, generate every pair of column indices with combn. Then write a function that runs the test for one pair, and apply it across all pairs. I like to use plyr for this because it makes it easy to specify the data structure you get back. Also note that you only need to run the chi square test once for each pair of columns, rather than three times as in your loop, which should speed things up quite a bit as well.

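library(plyr)

# Dat is the data frame to analyse, e.g. Dat <- esoph[, 1:3]
# each column of combos holds the indices of one pair of columns
combos <- combn(ncol(Dat),2)

adply(combos, 2, function(x) {
  # run the test once per pair and reuse the result
  test <- chisq.test(Dat[, x[1]], Dat[, x[2]])

  out <- data.frame("Row" = colnames(Dat)[x[1]]
                    , "Column" = colnames(Dat)[x[2]]
                    , "Chi.Square" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
                    )
  return(out)

})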
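
For reference, the loop in the question loses the first pair because `OldStatistic <- Statistic` stores only the newest row on each pass instead of the accumulated table, so each `rbind` keeps just the last two rows. A minimal sketch of a corrected loop, assuming the same `esoph` columns as in the question (the names `results`, `k` and `test` are illustrative, and `suppressWarnings` is only there because `chisq.test` may warn about small expected counts):

```r
Data <- esoph[, 1:3]

# one list slot per pair of columns
results <- vector("list", choose(ncol(Data), 2))
k <- 0

for (i in 1:(ncol(Data) - 1)) {
  for (j in (i + 1):ncol(Data)) {
    # run the test once per pair and reuse the result
    test <- suppressWarnings(chisq.test(Data[, i], Data[, j]))
    k <- k + 1
    results[[k]] <- data.frame(Row        = colnames(Data)[i],
                               Column     = colnames(Data)[j],
                               Chi.Square = round(test$statistic, 3),
                               df         = test$parameter,
                               p.value    = round(test$p.value, 3),
                               row.names  = NULL)
  }
}

# bind all rows together once, after the loops have finished
Statistic <- do.call(rbind, results)
Statistic
```

Collecting the rows in a list and binding them once at the end avoids the bookkeeping with a separate "old" table, which is where the original loop went wrong.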

I wrote my own function. It builds a table in which every nominal variable is tested against every other one, and it can also save the result as an Excel file. Only p-values of 0.05 or less are shown; larger ones are blanked out.

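funMassChi <- function (x,delFirst=0,xlsxpath=FALSE) {
  options(scipen = 999)  # print p-values in fixed rather than scientific notation

  # drop the first delFirst columns (e.g. an ID or count column)
  start <- (delFirst+1)
  ds <- x[,start:ncol(x)]

  cATeND <- ncol(ds)
  catID  <- 1:cATeND

  # numeric result table, so p-values can be stored even when the data columns are factors
  resMat <- as.data.frame(matrix(NA_real_, nrow = cATeND, ncol = cATeND-1,
                                 dimnames = list(NULL, names(ds)[1:(cATeND-1)])))

    for(nCc in 1:(length(catID)-1)){
      for(nDc in (nCc+1):length(catID)){
        # a failing test is reported and skipped instead of stopping the whole run
        tryCatch({
          chiRes <- chisq.test(ds[,catID[nCc]],ds[,catID[nDc]])
          resMat[nDc,nCc]<- chiRes$p.value
        }, error=function(e){cat(paste("ERROR :","at",nCc,nDc, sep=" "),conditionMessage(e), "\n")})
      }
    }
  resMat[resMat > 0.05] <- ""  # blank out p-values above 5%
  Ergebnis <- cbind(CatNames=names(ds),resMat)[-1,]  # the first row is always empty
  Ergebnis <<- Ergebnis  # the result is stored in the global variable Ergebnis

  if (!identical(xlsxpath, FALSE)) {
     # write.xlsx() is assumed to come from the xlsx package
     xlsx::write.xlsx(x = Ergebnis, file = paste(xlsxpath,"ALLChi-",Sys.Date(),".xlsx",sep=""),
             sheetName = "Tabelle1", row.names = FALSE)
  }
}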

funMassChi(categorialDATA,delFirst=3,xlsxpath="C:/folder1/folder2/")

delFirst drops the first n columns before testing, which is useful if your data starts with a count index or other columns you don't want to test.
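
The same idea can be sketched in a smaller form that returns the p-value matrix instead of writing to a global variable or to Excel. This is only an illustration on the `esoph` columns from the question; `pairwise_chisq_p` is a hypothetical helper name, not part of the function above, and warnings from `chisq.test` about small expected counts are suppressed here.

```r
# Hypothetical helper: lower-triangular matrix of chi-square p-values
pairwise_chisq_p <- function(df) {
  vars <- names(df)
  p <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
              dimnames = list(vars, vars))
  pairs <- combn(length(vars), 2)  # each column holds one pair of column indices
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    test <- suppressWarnings(chisq.test(df[[i]], df[[j]]))
    p[j, i] <- test$p.value  # fill below the diagonal only
  }
  p
}

pmat <- pairwise_chisq_p(esoph[, 1:3])
round(pmat, 3)
```

Returning the numeric matrix leaves it to the caller to decide on a cutoff, for example `pmat <= 0.05`, rather than converting the values to text inside the function.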

I hope this helps someone else.
