Multiple correspondence analysis with R

2023-03-02 17:19 Q&A author:

Hi, my question is both technical (using R) and statistical. I'm working on an image processing research project and I need to perform multiple correspondence analysis (MCA). I previously posted a question on how to do this in Java ("Multivariate correspondence analysis (MCA) with JAVA"); thanks to the answers there, I decided to do it in R. So here it is: I have a contingency table created from extracted features, which has the form:

            var1_1 var1_2 var1_3 var2_1 var2_2 var2_3 ... var18_1 var18_2 var18_3
individual1
individual2
individual3
individual4
...
individualn

In each cell I have a double value representing a normalized frequency count between 0.0 and 1.0. My ultimate goal is to plot each individual on the different combinations of axes using MCA.

What I did:

Used fdata <- read.table("filename.dat") to read the matrix file exported by Java.

Used mca_obj <- dudi.acm(fdata, scann=FALSE, nf=3). That gives an error saying all values should be a factor. (Could someone clarify what a factor is?)

Used burt_data=acm.burt(fdata, fdata) to apply the Burt method, since I have many variables.

That gave me a very big table I couldn't understand (I experimented with removing the row names).

So, to conclude: I think I'm very close to finding the right way to perform MCA on my data; I just need some hints on how to do it correctly. Can anyone please help?

Thanks

EDIT :

If I understand you correctly, your data are not suited to any MCA function. You need the raw data, not normalized frequency counts of any kind. MCA works on categorical variables, not numerical ones. What you need is data of the following form:

              color   beak ...
individual1   red     big
individual2   red     small
individual3   blue    medium
individual4   green   small
...

If the normalized frequencies really are your data, then you have numerical data and you can't perform an MCA on them.

A factor is a vector type in R, which can be seen as a categorical or enumerated type. If you have the data in the format described above, and you still have character variables instead of factor variables, you can convert your fdata with

fdata2 <- as.data.frame(lapply(fdata,as.factor))

You should be able to pass this data frame to the dudi.acm() function.
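To make the idea of a factor more concrete, here is a minimal sketch using made-up data in the format shown above (the values and the column names color and beak are only illustrative, not your real data). It shows character columns being converted to factors, and that a factor stores a fixed set of levels plus an integer code per observation.

```r
# Hypothetical raw data, read in as plain character columns
fdata <- data.frame(
  color = c("red", "red", "blue", "green"),
  beak  = c("big", "small", "medium", "small"),
  stringsAsFactors = FALSE
)
sapply(fdata, class)            # both columns are "character"

# Convert every column to a factor
fdata2 <- as.data.frame(lapply(fdata, as.factor))
sapply(fdata2, is.factor)       # both columns are now factors

# A factor is a set of levels plus an integer code for each observation
levels(fdata2$color)            # the distinct categories, sorted
as.integer(fdata2$color)        # position of each value within the levels
```

Running sapply(fdata, is.factor) on your own data frame is a quick way to check whether dudi.acm() will accept it.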

Regarding the Burt table: of course it is huge. It is the matrix product X'X, where X is the indicator matrix for your factors. So you get a table (actually a data frame) whose row names and column names have the form nameOfFactor.nameOfLevel. If you have 4 factors with 5 levels each, you already have a 20x20 matrix.
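As an illustration of that X'X construction, here is a small base R sketch on a made-up data set (the toy data frame and the names indicator and burt are assumptions for the example; this is not how acm.burt is implemented internally, only the same idea written out by hand).

```r
# Toy data: two factors with 3 and 2 levels (hypothetical values)
toy <- data.frame(
  color = factor(c("red", "red", "blue", "green")),
  beak  = factor(c("big", "small", "small", "small"))
)

# Indicator matrix X: one 0/1 column per level of each factor
indicator <- do.call(cbind, lapply(names(toy), function(v) {
  m <- sapply(levels(toy[[v]]), function(l) as.integer(toy[[v]] == l))
  colnames(m) <- paste(v, levels(toy[[v]]), sep = ".")
  m
}))

# Burt table: the cross-product X'X
burt <- crossprod(indicator)
dim(burt)    # (3 + 2) x (3 + 2) levels
burt
```

The diagonal holds the number of individuals in each level, and each off-diagonal cell counts the individuals that share the two levels, which is why the table grows with the total number of levels rather than with the number of individuals.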

You can use this knowledge to dissect the Burt table and get the information on some factors of interest. Following the example in the help files, you could do something like :

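library(ade4)   # provides dudi.acm(), acm.burt() and the banque data set
data(banque)
banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)
bb <- acm.burt(banque, banque)   # Burt table, returned as a data frame

# Pick the rows for the levels of csp and the columns for the levels of duree
idrow <- grepl("csp.", rownames(bb), fixed = TRUE)
idcol <- grepl("duree.", names(bb), fixed = TRUE)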

> bb[idrow,idcol]
          duree.dm2 duree.d24 duree.d48 duree.d812 duree.dp12
csp.agric         3         6         6          3         11
csp.artis         7         3        15         13         10
csp.cadsu        13        19        32          9         30
csp.inter        12        14        19         25         32
csp.emplo        13        19        38         28         53
csp.ouvri        12        26        46         43         56
csp.retra         4         8         9          7         24
csp.inact        15        14        22         15         19
csp.etudi        12        23        20          1          1

which gives you the Burt table for the factors csp and duree in the dataframe.

It's difficult to provide really concrete feedback in your case, since it's hard to guess what your data looks like.

Here is what I suggest you do:

Use the function mca in package MASS to do the correspondence analysis.
Study the example supplied in the help file: ?mca

You will find that mca also requires a data frame consisting of factors. (See the help file ?factor for more information.) The example in ?mca makes this clear. It uses the data set farms supplied as part of package MASS:

library(MASS)
head(farms)

  Mois Manag Use Manure
1   M1    SF  U2     C4
2   M1    BF  U2     C2
3   M2    SF  U2     C4
4   M2    SF  U2     C4
5   M1    HF  U1     C2
6   M1    HF  U2     C2

Notice that every entry in the table is a factor level. This means you will have to rework your input data into a similar format. You mention that your input data is a frequency table, which is not the required format.

farms.mca <- mca(farms, abbrev=TRUE)
farms.mca
plot(farms.mca)
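Since the goal in the question is to plot each individual on different combinations of axes, here is a minimal sketch of one way to do that with the farms example. It assumes three dimensions are wanted (nf = 3 is my choice, not something from the question), and the names farms.mca3, scores and axis_pairs are only illustrative. The rs component of the object returned by mca holds the row coordinates, one row per individual.

```r
library(MASS)

# Fit the MCA with three dimensions instead of the default two
farms.mca3 <- mca(farms, nf = 3)

# Row scores: one row per individual, one column per axis
scores <- farms.mca3$rs

# All pairs of axes: (1,2), (1,3), (2,3)
axis_pairs <- combn(ncol(scores), 2)

for (k in seq_len(ncol(axis_pairs))) {
  i <- axis_pairs[1, k]
  j <- axis_pairs[2, k]
  plot(scores[, i], scores[, j],
       xlab = paste("Axis", i), ylab = paste("Axis", j),
       main = paste("Individuals on axes", i, "and", j))
  # Label each point with the row name of the individual
  text(scores[, i], scores[, j], labels = rownames(farms), pos = 3, cex = 0.7)
}
```

With ade4, the analogous coordinates should be in the li component of the dudi.acm result, but check ?dudi.acm for your installed version.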
