Change the class of columns in a data frame

First of all, please excuse any mistakes; English is not a language I use very often.

I have a data frame with numbers. A small part of the data frame is this:

nominal 2 2 2 2

ordinal 2 1 1 2

I want to use the Gower distance function on these numbers.

The documentation here ( http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/StatMatch/man/gower.dist.html ) says that in order to use gower.dist, all nominal variables must be of class "factor" and all ordinal variables of class "ordered".

By default, all the columns are of class "integer" and mode "numeric". In order to change the class of the columns, I use these commands:

 DF=read.table("clipboard",header=TRUE,sep="\t")      
 # I select all the cells and I copy them to the clipboard. 
 #Then R, with this command, reads the data from there.

 MyHeader=names(DF)     # I save the headers of the data frame to a temporary character vector

 for (i in 1:length(DF))  {
     if (MyHeader[[i]]=="nominal") DF[[i]]=as.factor(DF[[i]])
 }     

 for (i in 1:length(DF))  {
     if (MyHeader[[i]]=="ordinal") DF[[i]]=as.ordered(DF[[i]])
 }        

The first for/if loop changes the class from integer to factor, which is what I want, but the second changes the class of ordinal variables to: "ordered" "factor".

I need to change all the columns with the header "ordinal" to "ordered", as the gower.dist function says.

Thanks in advance, B.T.


What you are doing is fine, if perhaps a little inelegant.

With your ordered factor, you have something like:

> foo <- as.ordered(1:10)
> foo
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10
> class(foo)
[1] "ordered" "factor" 

Notice that it has two classes, indicating that it is an ordered factor and that it is a factor:

> is.ordered(as.ordered(1:10))
[1] TRUE
> is.factor(as.ordered(1:10))
[1] TRUE

You can think of foo as an ordered factor that also inherits from the factor class. In practice this means that if there is no specific method for ordered factors but there is one for factors, R will use the factor method. As far as R is concerned, an ordered factor is an object with classes "ordered" and "factor", and this is what your function for Gower's distance will require.
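
The fallback from "ordered" to "factor" can be seen with a small S3 sketch. The generic `describe` and its methods are hypothetical names made up for this illustration; they are not part of base R or StatMatch.

```r
foo <- as.ordered(1:10)

# inherits() looks through the whole class vector, so both calls return TRUE.
inherits(foo, "ordered")
inherits(foo, "factor")

# Hypothetical S3 generic with only a factor method (illustration only).
describe <- function(x) UseMethod("describe")
describe.factor <- function(x) "handled by the factor method"

# There is no describe.ordered yet, so dispatch falls through to describe.factor.
fallback <- describe(foo)
fallback

# Once an ordered method exists, it takes precedence over the factor method.
describe.ordered <- function(x) "handled by the ordered method"
specific <- describe(foo)
specific
```

This is why a function that checks for "ordered" finds it, while anything written only for factors still works on the same column.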


You could easily do this with:

DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)

which gives you a data frame with the correct structure. If you work with data frames, please stay away from [[]] unless you know very well what you're doing. Take Dirk's advice, and check Owen's R Guide as well. You definitely need it.
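
If there are several nominal and ordinal columns, the same conversion can be applied to all of them at once without a loop. The following is only a sketch: the column names `nominal1`, `nominal2` and `ordinal1` and the values are made up, and it assumes the column type can be recognised from the start of the column name (read.table() makes repeated headers unique by default, so an exact match on "nominal" would not find every column).

```r
# Made-up example data; names and values are assumptions for illustration.
DF <- data.frame(
    nominal1 = c(2, 2, 1, 2),
    nominal2 = c(1, 2, 2, 1),
    ordinal1 = c(2, 1, 1, 2)
)

# Logical vectors marking which columns belong to which type.
is_nominal <- startsWith(names(DF), "nominal")
is_ordinal <- startsWith(names(DF), "ordinal")

# Convert every selected column in one step.
DF[is_nominal] <- lapply(DF[is_nominal], as.factor)
DF[is_ordinal] <- lapply(DF[is_ordinal], as.ordered)

# First class of each column: "factor" for nominal, "ordered" for ordinal.
first_class <- sapply(DF, function(col) class(col)[1])
first_class
```

startsWith() needs R 3.3.0 or later; on older versions grepl("^nominal", names(DF)) does the same job.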

If I do the conversion as shown above, gower.dist() works perfectly fine. As a side note, Gower's distance can also be calculated with the daisy() function from the cluster package:

DF <- data.frame(
    ordinal= c(1,2,3,1,2,1),
    nominal= c(2,2,2,2,2,2)
)
DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)

library(cluster)
daisy(DF,metric="gower")
library(StatMatch)
gower.dist(DF)