Classification functions in linear discriminant analysis in R

2023-02-23 14:55 问答作者：

After completing a linear discriminant analysis in R using lda(), is there a convenient way to extract the classification functions for each group?

From the link,

These are not to be confused with the discriminant functions. The classification functions can be used to determine to which group each case most likely belongs. There are as many classification functions as there are groups. Each function allows us to compute classification scores for each case for each group, by applying the formula:

Si = ci + wi1*x1 + wi2*x2 + ... + wim*xm

In this formula, the subscript i denotes the respective group; the subscripts 1, 2, ..., m denote the m variables; ci is a constant for the i'th group, wij is the weight for the j'th variable in the computation of the classification score for the i'th group; xj is the observed value for the respective case for the j'th variable. Si is开发者_运维问答 the resultant classification score.

We can use the classification functions to directly compute classification scores for some new observations.

I can build them from scratch using textbook formulas, but that requires rebuilding a number of intermediate steps from the lda analysis. Is there a way to get them after the fact from the lda object?

Added:

Unless I'm still misunderstanding something in Brandon's answer (sorry for the confusion!), it appears the answer is no. Presumably the majority of users can get the information they need from predict(), which provides classifications based on lda().

Suppose x is your LDA object:

x$terms

You can have a peak at the object by looking at it's structure:

str(x)

Update:

Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
table(Iris$Sp[train])
z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
predict(z, Iris[-train, ])$class
str(z)
List of 10
 $ prior  : Named num [1:3] 0.333 0.333 0.333
  ..- attr(*, "names")= chr [1:3] "c" "s" "v"
 $ counts : Named int [1:3] 30 25 20
  ..- attr(*, "names")= chr [1:3] "c" "s" "v"
 $ means  : num [1:3, 1:4] 6.03 5.02 6.72 2.81 3.43 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "c" "s" "v"
  .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
 $ scaling: num [1:4, 1:2] 0.545 1.655 -1.609 -3.682 -0.443 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..$ : chr [1:2] "LD1" "LD2"
 $ lev    : chr [1:3] "c" "s" "v"
 $ svd    : num [1:2] 33.66 2.93
 $ N      : int 75
 $ call   : language lda(formula = Sp ~ ., data = Iris, prior = c(1, 1, 1)/3, subset = train)
 $ terms  :Classes 'terms', 'formula' length 3 Sp ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.
  .. ..- attr(*, "variables")= language list(Sp, Sepal.L., Sepal.W., Petal.L., Petal.W.)
  .. ..- attr(*, "factors")= int [1:5, 1:4] 0 1 0 0 0 0 0 1 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:5] "Sp" "Sepal.L." "Sepal.W." "Petal.L." ...
  .. .. .. ..$ : chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..- attr(*, "term.labels")= chr [1:4] "Sepal.L." "Sepal.W." "Petal.L." "Petal.W."
  .. ..- attr(*, "order")= int [1:4] 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv
  .. ..- attr(*, "predvars")= language list(Sp, Sepal.L., Sepal.W., Petal.L., Petal.W.)
  .. ..- attr(*, "dataClasses")= Named chr [1:5] "factor" "numeric" "numeric" "numeric" ...
  .. .. ..- attr(*, "names")= chr [1:5] "Sp" "Sepal.L." "Sepal.W." "Petal.L." ...
 $ xlevels: Named list()
 - attr(*, "class")= chr "lda"

There isn't a built-in way to get the information I needed, so I wrote a function to do it:

ty.lda <- function(x, groups){
  x.lda <- lda(groups ~ ., as.data.frame(x))

  gr <- length(unique(groups))   ## groups might be factors or numeric
  v <- ncol(x) ## variables
  m <- x.lda$means ## group means

  w <- array(NA, dim = c(v, v, gr))

  for(i in 1:gr){
    tmp <- scale(subset(x, groups == unique(groups)[i]), scale = FALSE)
    w[,,i] <- t(tmp) %*% tmp
  }

  W <- w[,,1]
  for(i in 2:gr)
    W <- W + w[,,i]

  V <- W/(nrow(x) - gr)
  iV <- solve(V)

  class.funs <- matrix(NA, nrow = v + 1, ncol = gr)
  colnames(class.funs) <- paste("group", 1:gr, sep=".")
  rownames(class.funs) <- c("constant", paste("var", 1:v, sep = "."))

  for(i in 1:gr) {
    class.funs[1, i] <- -0.5 * t(m[i,]) %*% iV %*% (m[i,])
    class.funs[2:(v+1) ,i] <- iV %*% (m[i,])
  }

  x.lda$class.funs <- class.funs

  return(x.lda)
}

This code follows the formulas in Legendre and Legendre's Numerical Ecology (1998), page 625, and matches the results of the worked example starting on page 626.

I think your question was flawed ... OK, maybe not flawed but somewhat misleading at the very least. The discriminant function(s) refers to distances between groups, so there is no function associated with a single group but rather a function that describes the distances between any two group centroids. I just answered a more recent question and placed an example of calculating a score function using the iris dataset and using it to label cases in a 2d plot of predictors. In the case of a 2 group analysis the function will be greater than zero for one group and less than zero for the other group.

继续阅读：classification r

Classification functions in linear discriminant analysis in R

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？