implementation of the Gower distance function

2023-01-27 03:13 问答作者：

I have a matrix (size: 28 columns and 47 rows) with numbers. This matrix has an extra row that is contains headers for the columns ("ordinal" and "nominal").

I want to use the Gower distance function on this matrix. Here says that:

The final dissimilarity between the ith and jth units is obtained as a weighted sum of dissimilarities for each variable:

    d(i,j) = sum_k(delta_ijk * d_ijk ) / sum_k( delta_ijk )

In particular, d_ijk represents the distance between the ith and jth unit computed considering the kth variable. It depends on the nature of the variable:

factor or character columns are considered as categorical nominal variables and d_ijk = 0 if

x_ik =x_jk, 1 otherwise;
ordered columns are considered as categorical ordinal variables and
the values are substituted with the
corresponding position index, r_ik in the factor levels. The开发者_JS百科se position
indexes (that are different from the output of the R function rank) are
transformed in the following manner

z_ik = (r_ik - 1)/(max(r_ik) - 1)

These new values, z_ik, are treated as observations of an

interval scaled variable.

As far as the weight delta_ijk is concerned:

delta_ijk = 0 if x_ik = NA or x_jk = NA;
delta_ijk = 1 in all the other cases.

I know that there is a gower.dist function, but I must do it that way. So, for "d_ijk", "delta_ijk" and "z_ik", I tried to make functions, as I didn't find a better way.

I started with "delta_ijk" and i tried this:

Delta=function(i,j){for (i in 1:28){for (j in 1:47){  
+{if (MyHeader[i,j]=="nominal")
+ result=0
+{else if (MyHeader[i,j]=="ordinal") result=1}}}}
+;result}

But I got error. So I got stuck and I can't do the rest.

P.S. Excuse me if I make mistakes, but English is not a language I very often.

Why do you want to reinvent the wheel billyt? There are several functions/packages in R that will compute this for you, including daisy() in package cluster which comes with R.

First things first though, get those "data type" headers out of your data. If this truly is a matrix then character information in this header row will make the whole matrix a character matrix. If it is a data frame, then all columns will likely be factors. What you want to do is code the type of data in each column (component of your data frame) as 'factor' or 'ordered'.

df <- data.frame(A = c("ordinal",1:3), B = c("nominal","A","B","A"),
                 C = c("nominal",1,2,1))

Which gives this --- note that all are stored as factors because of the extra info.

> head(df)
        A       B       C
1 ordinal nominal nominal
2       1       A       1
3       2       B       2
4       3       A       1
> str(df)
'data.frame':   4 obs. of  3 variables:
 $ A: Factor w/ 4 levels "1","2","3","ordinal": 4 1 2 3
 $ B: Factor w/ 3 levels "A","B","nominal": 3 1 2 1
 $ C: Factor w/ 3 levels "1","2","nominal": 3 1 2 1

If we get rid of the first row and recode into the correct types, we can compute Gower's coefficient easily.

> headers <- df[1,]
> df <- df[-1,]
> DF <- transform(df, A = ordered(A), B = factor(B), C = factor(C))
> ## We've previously shown you how to do this (above line) for lots of columns!
> str(DF)
'data.frame':   3 obs. of  3 variables:
 $ A: Ord.factor w/ 3 levels "1"<"2"<"3": 1 2 3
 $ B: Factor w/ 2 levels "A","B": 1 2 1
 $ C: Factor w/ 2 levels "1","2": 1 2 1
> require(cluster)
> daisy(DF)
Dissimilarities :
          2         3
3 0.8333333          
4 0.3333333 0.8333333

Metric :  mixed ;  Types = O, N, N 
Number of objects : 3

Which gives the same as gower.dist() for this data (although in a slightly different format (as.matrix(daisy(DF))) would be equivalent):

> gower.dist(DF)
          [,1]      [,2]      [,3]
[1,] 0.0000000 0.8333333 0.3333333
[2,] 0.8333333 0.0000000 0.8333333
[3,] 0.3333333 0.8333333 0.0000000

You say you can't do it this way? Can you explain why not? As you seem to be going to some degree of effort to do something that other people have coded up for you already. This isn't homework, is it?

I'm not sure what your logic is doing, but you are putting too many "{" in there for your own good. I generally use the {} pairs to surround the consequent-clause:

Delta=function(i,j){for (i in 1:28) {for (j in 1:47){  
       if (MyHeader[i,j]=="nominal") {
         result=0
    # the "{" in the next line before else was sabotaging your efforts
        } else if (MyHeader[i,j]=="ordinal") { result=1} }
      result}
                  }

Thanks Gavin and DWin for your help. I managed to solve the problem and find the right distance matrix. I used daisy() after I recoded the class of the data and it worked.

P.S. The solution that you suggested at my other topic for changing the class of the columns:

DF$nominal <- as.factor(DF$nominal)
DF$ordinal <- as.ordered(DF$ordinal)

didn't work. It changed only the first nominal and ordinal column.

Thanks again for your help.

继续阅读：function r

implementation of the Gower distance function

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？