chisq.test Error Message

2023-02-05 20:50

Here's a problem I'm encountering:

Example Data

df <- data.frame(1,2,3,4,5,6,7,8)
df <- rbind(df,df,df,df)

What I would like to do is find the p.value for the chisq.test of 1,2,3 vs. 4,5,6 in the data.frame defined above in the first row.

Let's try it flat out:

chisq.test(c(1,2,3),c(4,5,6))$p.value ## this works.

But when I try to do it by calling the columns/rows...

chisq.test(df[1,1:3],df[1,4:6])$p.value

Gives: Error in complete.cases(x, y) : not all arguments have the same length

Interesting, because that doesn't seem to be true:

length(df[1,1:3])
length(df[1,4:6])

Any thoughts on how to change the notation to get the desired result?

?chisq.test tells us:

Arguments:

       x: a numeric vector or matrix. ‘x’ and ‘y’ can also both be
          factors.

       y: a numeric vector; ignored if ‘x’ is a matrix.  If ‘x’ is a
          factor, ‘y’ should be a factor of the same length.

If we look at df as per your Q, the subsets you define are:

> is.numeric(df[1,1:3])
[1] FALSE
> is.vector(df[1,1:3])
[1] FALSE
> is.matrix(df[1,1:3])
[1] FALSE

and the same holds for your other subset. What happens then is in the lap of the gods. Internally, because df[1,1:3] is a data frame, it is first converted to a one-row matrix, and thence to a vector:

Browse[2]> x ## here x is df[1,1:3]
[1] 1 2 3

whilst df[1,4:6] (y in the chisq.test function) is left untouched:

Browse[2]> y
  X4 X5 X6
1  4  5  6

and when the code calls complete.cases(x,y), we get the error you report:

Browse[2]> complete.cases(x, y)
Error in complete.cases(x, y) : not all arguments have the same length

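complete.cases calls internal code, so we can't step through it, but it compares the number of cases (observations) in each argument: the vector x supplies three, whereas the one-row data frame y supplies only one. length() reports 3 for both subsets because the length of a data frame is its number of columns, which is why your check made them look equal.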
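
The mismatch can be reproduced outside chisq.test. The sketch below mimics the coercion by hand with as.matrix() followed by as.vector(); that pairing is an assumption based on the debugger output above, not a copy of the function's source. It then compares length() with NROW() for the two objects:

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)

# Mimic the coercion applied to x: data frame -> matrix -> plain vector
x <- as.vector(as.matrix(df[1, 1:3]))
y <- df[1, 4:6]            # y stays a one-row data frame

length(x); length(y)       # length() of a data frame counts its columns
NROW(x); NROW(y)           # NROW() counts cases: vector elements vs. rows

# complete.cases() fails on this pair; capture the error instead of stopping
res <- tryCatch(complete.cases(x, y), error = function(e) e)
print(conditionMessage(res))
```

length() agrees for the two objects while NROW() does not, and it is the number of cases that complete.cases() cares about.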

@Prasad provides a workaround, namely unlisting the two data frames you supply to chisq.test into vectors.

However, the way you are using the function doesn't make much sense, to me at least. One would normally store the data in columns, rather than rows of a data frame. It might not appear like there is a difference, but the columns of the data frame are its components, like the components of a list. Each individual component (column) is a discrete entity, a vector of data on the /n/ observations in the data frame. If we transpose your df (and cast back to a data frame) to reflect a more natural data set-up:

> df2 <- data.frame(t(df))

then we can use the approach you did, but index the separate rows of the first column of df2 (rather than the separate columns of the first row of df) in the chisq.test call:

> chisq.test(df2[1:3,1], df2[4:6,1])

    Pearson's Chi-squared test

data:  df2[1:3, 1] and df2[4:6, 1] 
X-squared = 6, df = 4, p-value = 0.1991

Warning message:
In chisq.test(df2[1:3, 1], df2[4:6, 1]) :
  Chi-squared approximation may be incorrect

This works, because R is able to drop the empty dimension in both subsets, so both inputs are vectors of the appropriate length:

> df2[1:3,1] ## drops the empty dimension!
[1] 1 2 3
> is.vector(df2[1:3,1])
[1] TRUE
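
To see the dropping behaviour side by side, here is a small sketch that rebuilds df2 as above and takes the same subset with and without drop = FALSE (a standard argument of data-frame indexing):

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)
df2 <- data.frame(t(df))

# Default: selecting a single column drops to a plain vector
v <- df2[1:3, 1]
is.vector(v)

# drop = FALSE keeps the result as a (3 x 1) data frame
d <- df2[1:3, 1, drop = FALSE]
is.data.frame(d)
dim(d)

# A row subset spanning several columns is never dropped to a vector
r <- df[1, 1:3]
is.data.frame(r)
```

Only the single-column subset collapses to a vector, which is why the column-wise layout works with chisq.test without any extra conversion.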

Use unlist when you are extracting the rows from the data-frame:

> chisq.test(unlist(df[1,1:3]),unlist(df[1,4:6]))$p.value
[1] 0.1991483
Warning message:
In chisq.test(unlist(df[1, 1:3]), unlist(df[1, 4:6])) :
  Chi-squared approximation may be incorrect
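
It can help to look at what the two unlisted vectors turn into. According to ?chisq.test, when x and y are both vectors they are coerced to factors and a contingency table is computed from them. The sketch below builds that table by hand (the names x, y and tab are only illustrative) and checks that it gives the same p-value:

```r
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)

# unlist() turns each one-row data frame into a named numeric vector
x <- unlist(df[1, 1:3])
y <- unlist(df[1, 4:6])

# Cross-tabulate the paired values, treating each vector as a factor
tab <- table(factor(x), factor(y))
print(tab)

# Both calls test the same 3 x 3 table; warnings are suppressed because
# the expected counts are small, as in the output above
p_vectors <- suppressWarnings(chisq.test(x, y)$p.value)
p_table   <- suppressWarnings(chisq.test(tab)$p.value)
all.equal(p_vectors, p_table)
```

The test is therefore one of association between the paired values in a 3 x 3 table with four degrees of freedom, which is where the df = 4 in the earlier output comes from.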
