
Multiple correspondence analysis with R

Hi. My question is both technical (using R) and statistical. I'm working on an image processing research project and I need to perform multiple correspondence analysis (MCA). I previously asked how to do this in Java ("Multivariate correspondence analysis (MCA) with JAVA"); thanks to the answers there, I decided to do it in R. So here it is: I have a contingency table created from extracted features, which has the form:

            var1_1 var1_2 var1_3 var2_1 var2_2 var2_3 ... var18_1 var18_2 var18_3
individual1
individual2
individual3
individual4
...
individualn


Each cell holds a double value representing a normalized frequency count between 0.0 and 1.0. My ultimate goal is to plot each individual on the different combinations of axes using MCA.

What I did:

  • used fdata <- read.table("filename.dat") to read the matrix file exported by Java
  • used mca_obj <- dudi.acm(fdata,scann=FALSE, nf=3). That gives an error saying all values should be a factor (could someone clarify what is meant by a factor?)
  • used burt_data=acm.burt(fdata, fdata) to try the Burt method, since I have many variables
  • that gave me a very big table I couldn't understand (I experimented with removing the row names)

To conclude: I think I'm close to finding the right way to perform MCA on my data; I just need some hints on how to do it correctly. Can anyone please help?

Thanks


EDIT :

If I understand you correctly, your data is not suited to any MCA function. You need the raw data, not normalized frequency counts of any kind. MCA works on categorical variables, not numerical ones. What you need is data of this kind:

              color   beak ...
individual1   red     big
individual2   red     small
individual3   blue    medium
individual4   green   small
...

If the normalized frequencies really are your data, then you have numerical data and you can't perform MCA on it.


A factor is a vector type in R, which can be seen as a categorical or enumerated type. If you have the data in the format described above, and you still have character variables instead of factor variables, you can convert your fdata with

fdata2 <- as.data.frame(lapply(fdata,as.factor))

This one you should be able to use in the dudi.acm() function.
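As a minimal sketch of that conversion, assuming a small made-up data frame named toy (it is hypothetical and stands in for categorical data read from a file), you can check the column types before and after:

```r
# Hypothetical toy data, read as character columns rather than factors
toy <- data.frame(
  color = c("red", "red", "blue", "green"),
  beak  = c("big", "small", "medium", "small"),
  stringsAsFactors = FALSE
)

# Before the conversion no column is a factor
is_factor_before <- sapply(toy, is.factor)
print(is_factor_before)

# Convert every column to a factor, as suggested above
toy2 <- as.data.frame(lapply(toy, as.factor))

# After the conversion every column is a factor with its own set of levels
is_factor_after <- sapply(toy2, is.factor)
print(is_factor_after)
print(lapply(toy2, levels))
```

Note that the conversion only makes sense for genuinely categorical columns. Turning normalized frequencies into factors would treat every distinct number as its own category.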

Regarding the Burt table: of course that one is huge. It is the matrix product X'X, where X is the indicator matrix for your factors. So you get a table (actually a data frame) whose row names and column names are formed as nameOfFactor.nameOfLevel. If you have 4 factors with 5 levels each, you already have a 20x20 matrix.
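To make the X'X construction concrete, here is a rough base-R sketch on a made-up data frame. The names toy and indicator are hypothetical and introduced only for illustration; this is not how ade4 builds the table internally.

```r
# Hypothetical toy data: two factors observed on six individuals
toy <- data.frame(
  color = factor(c("red", "red", "blue", "green", "blue", "red")),
  beak  = factor(c("big", "small", "medium", "small", "big", "big"))
)

# Indicator (dummy) matrix for one factor: one 0/1 column per level,
# with columns named nameOfFactor.nameOfLevel
indicator <- function(f, name) {
  m <- sapply(levels(f), function(lv) as.integer(f == lv))
  colnames(m) <- paste(name, levels(f), sep = ".")
  m
}

# Full indicator matrix X: the per-factor blocks side by side
X <- do.call(cbind, Map(indicator, toy, names(toy)))

# Burt table: X'X, square with one row and column per level
burt <- crossprod(X)
print(dim(burt))   # 6 x 6 here: three levels for each of the two factors
print(burt)
```

The diagonal blocks hold the level counts of each factor, and the off-diagonal block for color and beak is their ordinary two-way contingency table.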

You can use this knowledge to dissect the Burt table and get the information on some factors of interest. Following the example in the help files, you could do something like :

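library(ade4)
data(banque)
# MCA on the banque example data, keeping 3 axes and skipping the scree plot prompt
banque.acm <- dudi.acm(banque, scannf = FALSE, nf = 3)
# Burt table of banque against itself
bb <- acm.burt(banque, banque)

# Select the rows for factor csp and the columns for factor duree by name prefix
idrow <- grepl("csp.", rownames(bb), fixed = TRUE)
idcol <- grepl("duree.", names(bb), fixed = TRUE)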

> bb[idrow,idcol]
          duree.dm2 duree.d24 duree.d48 duree.d812 duree.dp12
csp.agric         3         6         6          3         11
csp.artis         7         3        15         13         10
csp.cadsu        13        19        32          9         30
csp.inter        12        14        19         25         32
csp.emplo        13        19        38         28         53
csp.ouvri        12        26        46         43         56
csp.retra         4         8         9          7         24
csp.inact        15        14        22         15         19
csp.etudi        12        23        20          1          1

which gives you the Burt table for the factors csp and duree in the dataframe.


It's difficult to provide really concrete feedback in your case, since it's hard to guess what your data looks like.

Here is what I suggest you do:

  1. Use the function mca in package MASS to do the correspondence analysis
  2. Study the example supplied in the help files: ?mca

You will find that mca also requires a data frame consisting of factors. (See the help file ?factor for more information.) The example in ?mca makes this clear. It uses the dataset farms supplied as part of package MASS:

library(MASS)
head(farms)

  Mois Manag Use Manure
1   M1    SF  U2     C4
2   M1    BF  U2     C2
3   M2    SF  U2     C4
4   M2    SF  U2     C4
5   M1    HF  U1     C2
6   M1    HF  U2     C2

Notice that each point in the table is a factor entry. This means you will have to rework your input data to be in a similar format. You mention that your input data is a frequency table, which is not the required data format.

farms.mca <- mca(farms, abbrev=TRUE)
farms.mca
plot(farms.mca)
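Since the goal in the question is to plot individuals on different combinations of axes, here is a hedged sketch along the same lines. It assumes three dimensions are wanted (nf = 3) and uses the rs component of the fitted object, which ?mca documents as the row coordinates. The names farms.mca3, scores and axis_pairs are introduced only for this example.

```r
library(MASS)

# Fit the MCA again, this time keeping three dimensions
farms.mca3 <- mca(farms, nf = 3, abbrev = TRUE)

# Row coordinates: one row per individual, one column per dimension
scores <- farms.mca3$rs

# All pairs of dimensions: (1,2), (1,3), (2,3)
axis_pairs <- combn(ncol(scores), 2)

# One scatter plot of the individuals per pair of dimensions
op <- par(mfrow = c(1, ncol(axis_pairs)))
for (k in seq_len(ncol(axis_pairs))) {
  i <- axis_pairs[1, k]
  j <- axis_pairs[2, k]
  plot(scores[, i], scores[, j], type = "n",
       xlab = paste("Dimension", i), ylab = paste("Dimension", j))
  # Label each point with the row name of the individual
  text(scores[, i], scores[, j], labels = rownames(farms), cex = 0.8)
}
par(op)
```

With your own data the same loop applies once the input is a data frame of factors. Only farms would need to be replaced.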
