Multiple correspondence analysis with R
Hi, my question is both technical (using R) and statistical. I'm working on an image-processing research project and I need to perform multiple correspondence analysis (MCA). I previously posted a question on how to do this in Java ("Multivariate correspondence analysis (MCA) with JAVA"); thanks to the answers there, I decided to do it in R. So here it is: I have a contingency table built from extracted features, which has the form:
            var1_1 var1_2 var1_3 var2_1 var2_2 var2_3 ... var18_1 var18_2 var18_3
individual1
individual2
individual3
individual4
...
individualn
In each cell I have a double value representing a normalized frequency count between 0.0 and 1.0. My ultimate goal is to be able to plot each individual on the different combinations of axes using MCA.
What I did:
- used fdata <- read.table("filename.dat") to read the matrix file exported by Java
- used mca_obj <- dudi.acm(fdata, scann = FALSE, nf = 3). That gives an error saying all values should be a factor (could someone clarify what a factor is?)
- used burt_data = acm.burt(fdata, fdata) to use the Burt method, since I have many variables
- that gave me a very big table I couldn't understand (I experimented with removing the row names)
So to conclude: I think I'm very close to finding the right way to perform MCA on my data; I just need some hints on how to do it correctly. Can anyone please help?
Thanks
EDIT :
If I understand you correctly, your data is not suited to any MCA function. You need the raw data, not normalized frequency counts of any kind. MCA works on categorical variables, not numerical ones. What you need is data of the form:
            color beak   ...
individual1 red   big
individual2 red   small
individual3 blue  medium
individual4 green small
...
If the normalized frequencies really are your data, you have numerical data and you can't perform an MCA on it.
A factor is a vector type in R that can be seen as a categorical or enumerated type. If you have the data in the format described above but still have character variables instead of factor variables, you can convert your fdata with
fdata2 <- as.data.frame(lapply(fdata,as.factor))
You should be able to use this one in the dudi.acm() function.
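As a minimal sketch of that conversion, the following uses a hypothetical toy data frame in the layout shown above (the values and the name fdata are made up for illustration, not the asker's data). It checks the column types before and after, and restores the row names, which the lapply() round trip drops.

```r
# Hypothetical toy data in the "one categorical value per cell" layout
fdata <- data.frame(
  color = c("red", "red", "blue", "green"),
  beak  = c("big", "small", "medium", "small"),
  row.names = paste0("individual", 1:4),
  stringsAsFactors = FALSE
)

# Plain character columns are not factors yet
sapply(fdata, is.factor)

# Convert every column to a factor
fdata2 <- as.data.frame(lapply(fdata, as.factor))

# lapply() returns a plain list, so the row names have to be put back
rownames(fdata2) <- rownames(fdata)

# Every column is now a factor; levels are sorted alphabetically by default
sapply(fdata2, is.factor)
levels(fdata2$color)
```

If the individuals' names matter for later plots, it is worth checking rownames(fdata2) before passing the data on.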
Regarding the Burt table: of course that one is huge. It is the matrix product X'X, where X is the indicator matrix for your factors. So you get a table (actually a data frame) whose row names and column names have the form nameOfFactor.nameOfLevel. So if you have 4 factors with 5 levels each, you already have a 20x20 matrix.
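To make the X'X description concrete, here is a small base-R sketch that builds an indicator matrix by hand for a hypothetical two-factor data frame and multiplies it out. It does not call ade4; the data and the names toy, indicator and burt are assumptions for illustration only.

```r
# Hypothetical data: two factors with three levels each
toy <- data.frame(
  color = factor(c("red", "red", "blue", "green", "blue", "red")),
  beak  = factor(c("big", "small", "medium", "small", "big", "big"))
)

# Indicator matrix X: one 0/1 column per level, named nameOfFactor.nameOfLevel
indicator <- do.call(cbind, lapply(names(toy), function(v) {
  f <- toy[[v]]
  m <- diag(nlevels(f))[as.integer(f), , drop = FALSE]
  colnames(m) <- paste(v, levels(f), sep = ".")
  m
}))

# Burt table: X'X, square, with one row and column per level of every factor
burt <- crossprod(indicator)
dim(burt)

# An off-diagonal block is just the two-way contingency table of two factors
burt["color.red", "beak.big"]
table(toy$color, toy$beak)["red", "big"]
```

The side length of the Burt table is the total number of levels across all factors, which is why it grows so quickly with many variables.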
You can use this knowledge to dissect the Burt table and get the information on some factors of interest. Following the example in the help files, you could do something like :
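library(ade4)
data(banque)
banque.acm <- dudi.acm(banque, scann = FALSE, nf = 3)
# Burt table of all factors crossed with themselves
bb <- acm.burt(banque, banque)
# Select the rows for the levels of csp and the columns for the levels of duree
idrow <- grepl("csp.", rownames(bb), fixed = TRUE)
idcol <- grepl("duree.", names(bb), fixed = TRUE)
bb[idrow, idcol]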
          duree.dm2 duree.d24 duree.d48 duree.d812 duree.dp12
csp.agric         3         6         6          3         11
csp.artis         7         3        15         13         10
csp.cadsu        13        19        32          9         30
csp.inter        12        14        19         25         32
csp.emplo        13        19        38         28         53
csp.ouvri        12        26        46         43         56
csp.retra         4         8         9          7         24
csp.inact        15        14        22         15         19
csp.etudi        12        23        20          1          1
which gives you the Burt table for the factors csp and duree in the dataframe.
It's difficult to provide really concrete feedback in your case, since it's hard to guess what your data looks like.
Here is what I suggest you do:
- Use the function mca in package MASS to do the correspondence analysis
- Study the example supplied in the help files: ?mca
You will find that the requirement for mca is also a data frame consisting of factors. (See the help file ?factor for more information.) The example in ?mca makes this clear. It uses the dataset farms, supplied as part of package MASS:
library(MASS)
head(farms)
  Mois Manag Use Manure
1   M1    SF  U2     C4
2   M1    BF  U2     C2
3   M2    SF  U2     C4
4   M2    SF  U2     C4
5   M1    HF  U1     C2
6   M1    HF  U2     C2
Notice that each entry in the table is a factor level. This means you will have to rework your input data into a similar format. You mention that your input data is a frequency table, which is not the required data format.
farms.mca <- mca(farms, abbrev=TRUE)
farms.mca
plot(farms.mca)
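Since the stated goal in the question is to plot individuals on different combinations of axes, here is a minimal sketch of one way to do that. It assumes the row scores stored in the rs component of the object returned by mca are the coordinates wanted; the name farms.mca3, the value nf = 3 and the choice of axes 1 and 3 are only for illustration.

```r
library(MASS)

# Keep three dimensions instead of the default two
farms.mca3 <- mca(farms, nf = 3, abbrev = TRUE)

# rs holds one row of coordinates per individual, one column per dimension
scores <- farms.mca3$rs
dim(scores)

# Plot the individuals on axes 1 and 3; any pair of columns works the same way
plot(scores[, 1], scores[, 3], type = "n",
     xlab = "Dimension 1", ylab = "Dimension 3")
text(scores[, 1], scores[, 3], labels = rownames(farms))
```

Swapping the column indices in the plot() and text() calls gives the other axis combinations.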