Calculating the Mode for Nominal as well as Continuous variables in [R]

2023-02-11 14:22

Can anyone help me with this?

If I run:

> mode(iris$Species)
[1] "numeric"
> mode(iris$Sepal.Width)
[1] "numeric"

Then I get "numeric" as the answer.

Cheers

The function mode() reports the storage mode of an object; in this case both objects are stored as mode "numeric" (a factor such as iris$Species is stored as integer codes). It does not find the most frequently observed value in a data set, i.e. it does not compute the statistical mode. See ?mode for more on what this function does in R and why it isn't useful for your problem.
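To make the distinction concrete, here is a small sketch comparing mode() with class() and typeof() on the two iris columns from the question. All three are base R functions; the comments show what each call returns.

```r
## mode() and typeof() describe storage; class() describes how R treats the object
mode(iris$Species)        # "numeric": factor levels are stored as integer codes
class(iris$Species)       # "factor"
typeof(iris$Species)      # "integer"

mode(iris$Sepal.Width)    # "numeric"
class(iris$Sepal.Width)   # "numeric"
typeof(iris$Sepal.Width)  # "double"
```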

For discrete data, the mode is the most frequent observed value among the set:

> set.seed(1) ## reproducible example
> dat <- sample(1:5, 100, replace = TRUE) ## dummy data
> (tab <- table(dat)) ## tabulate the frequencies
dat
 1  2  3  4  5 
13 25 19 26 17 
> which.max(tab) ## which is the mode?
4 
4 
> tab[which.max(tab)] ## what is the frequency of the mode?
 4 
26
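The tabulation above can be wrapped in a small helper. This is only a sketch: stat_mode() is a hypothetical name (base R has no such function), and unlike which.max() it returns every value tied for the highest count, as character strings taken from the table names.

```r
## stat_mode() is a hypothetical helper name, not a base R function
stat_mode <- function(x) {
  tab <- table(x)                           # frequencies; NA values are dropped by default
  if (length(tab) == 0) return(character(0))
  names(tab)[as.vector(tab) == max(tab)]    # every value tied for the top count
}

stat_mode(c("a", "b", "b", "c"))  # "b"
stat_mode(iris$Species)           # all three species, since each occurs 50 times
```

Applied to the nominal variable from the question, this shows that iris$Species has no single mode.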

For continuous data, the mode is the value at which the probability density function (PDF) reaches its maximum. Because your data are generally a sample from some continuous probability distribution, we don't know the PDF, but we can estimate it through a histogram or, better, through a kernel density estimate.
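The histogram route can be sketched as follows, using simulated data (the seed and the normal sample are assumptions made only for illustration). The estimate is the midpoint of the fullest bin, so it depends on the chosen breaks and is usually cruder than a kernel density estimate.

```r
set.seed(42)                  # arbitrary seed, for reproducibility only
x <- rnorm(1000, mean = 10)   # simulated continuous sample

h <- hist(x, plot = FALSE)    # bin the data without drawing the histogram
hist_mode <- h$mids[which.max(h$counts)]  # midpoint of the fullest bin
hist_mode
```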

Returning to the iris data, here is an example of determining the mode from continuous data:

> sepalwd <- with(iris, density(Sepal.Width)) ## kernel density estimate
> plot(sepalwd)
> str(sepalwd)
List of 7
 $ x        : num [1:512] 1.63 1.64 1.64 1.65 1.65 ...
 $ y        : num [1:512] 0.000244 0.000283 0.000329 0.000379 0.000436 ...
 $ bw       : num 0.123
 $ n        : int 150
 $ call     : language density.default(x = Sepal.Width)
 $ data.name: chr "Sepal.Width"
 $ has.na   : logi FALSE
 - attr(*, "class")= chr "density"
> with(sepalwd, which.max(y)) ## which value has maximal density?
[1] 224
> with(sepalwd, x[which.max(y)]) ## use the above to find the mode
[1] 3.000314

See ?density for more info. By default, density() evaluates the kernel density estimate at n = 512 equally spaced locations. If this is too crude for you, increase the number of locations evaluated and returned:

> sepalwd2 <- with(iris, density(Sepal.Width, n = 2048))
> with(sepalwd2, x[which.max(y)]) ## query the finer estimate, sepalwd2

In this case the estimated mode remains close to 3.0, so the finer grid makes little practical difference.
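The density-based steps can be collected into one function. This is a minimal sketch: density_mode() is a hypothetical name, and any extra arguments are simply passed on to density(), whose n and adjust parameters control the evaluation grid and the bandwidth multiplier.

```r
## density_mode() is a hypothetical helper name; ... is passed on to density()
density_mode <- function(x, ...) {
  d <- density(x, ...)   # kernel density estimate
  d$x[which.max(d$y)]    # grid location where the estimate peaks
}

density_mode(iris$Sepal.Width)              # default bandwidth and grid
density_mode(iris$Sepal.Width, n = 2048)    # finer evaluation grid
density_mode(iris$Sepal.Width, adjust = 2)  # smoother estimate (doubled bandwidth)
```

The result depends on the bandwidth as well as the grid, so it is worth checking how stable the estimate is under a few different settings.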

See ?mode: mode() gives you the storage mode. If you want the value with the maximum count, use table().

> Sample <- sample(letters[1:5], 50, replace = TRUE) ## no seed set, so your counts will differ
> tmp <- table(Sample)
> tmp
Sample
 a  b  c  d  e 
 9 12  9  7 13 
> tmp[which(tmp == max(tmp))]
 e 
13

Please read the help files if a function is not doing what you think it should.

Some extra explanation:

max(tmp) is the maximum of tmp.

tmp == max(tmp) gives a logical vector of the same length as tmp, indicating whether each value is equal to max(tmp).

which(tmp == max(tmp)) returns the indices of the elements that are TRUE. You use these indices to select the value (or values, if there are ties) in tmp that equal the maximum.
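The steps above can be traced on a tiny table with a tie. The data here are made up for illustration; the comparison with which.max() shows why the which(tmp == max(tmp)) form is useful when several values share the top count.

```r
tmp <- table(c("a", "a", "b", "b", "c"))  # "a" and "b" tie with 2 each

max(tmp)                     # 2
tmp == max(tmp)              # TRUE for a and b, FALSE for c
which(tmp == max(tmp))       # positions 1 and 2
tmp[which(tmp == max(tmp))]  # both tied entries

which.max(tmp)               # only the first maximum: a, position 1
```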

See the help files ?which, ?max and the introductory manuals for R.

See ?mode: mode() gives you the storage mode.

If you want to know the mode of a continuous random variable, I recently released the package ModEstM. In addition to the method proposed by Gavin Simpson, it addresses the case of multimodal variables. For example, if you study the sample:

> x2 <- c(rbeta(1000, 23, 4), rbeta(1000, 4, 16))

which is clearly bimodal, you get the answer:

> ModEstM::ModEstM(x2)
[[1]]
[1] 0.8634313 0.1752347
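For readers who want to see the idea in base R, here is a rough sketch that looks for local maxima of the kernel density estimate. It is not how ModEstM is implemented (that is not described here); density_peaks() and its min_rel_height argument are hypothetical names, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
x2 <- c(rbeta(1000, 23, 4), rbeta(1000, 4, 16))

## density_peaks() is a hypothetical helper, not part of ModEstM
density_peaks <- function(x, min_rel_height = 0.1, ...) {
  d <- density(x, ...)
  # a local maximum is where the slope changes from positive to negative
  idx <- which(diff(sign(diff(d$y))) == -2) + 1
  # drop minor bumps lower than min_rel_height times the highest peak
  idx <- idx[d$y[idx] >= min_rel_height * max(d$y)]
  d$x[idx]
}

peaks <- density_peaks(x2)
peaks
```

The height filter is a crude way to ignore small wiggles in the tails; both it and the bandwidth affect how many modes are reported, and perfectly flat plateaus in the estimate would be missed by this slope test.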
