Calculate 95th percentile of values with grouping variable

2023-02-20 17:26

I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example:

Watershed   WQ
50500101    62.370661
50500101    65.505046
50500101    58.741477
50500105    71.220034
50500105    57.917249

I reviewed the question "Percentile for Each Observation w/r/t Grouping Variable". It is very close to what I want to do, but it computes a percentile for EACH observation. I need one value for each level of the grouping variable, so ideally:

Watershed   WQ - 95th
50500101    x
50500105    y

This can be achieved using the plyr library. We specify the grouping variable Watershed and ask for the 95% quantile of WQ.

library(plyr)
#Random seed
set.seed(42)
#Sample data
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#plyr call
ddply(dat, "Watershed", summarise, WQ95 = quantile(WQ, .95))

And the results:

  Watershed     WQ95
1         a 1.353993
2         b 1.461711

I hope I understand your question correctly. Is this what you're looking for?

my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))

  Group.1         x
1       1 0.6913747
2       2 0.8067847
3       3 0.9643744

EDIT

Based on Vincent's answer,

aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs  = 0.95)

also works (there are 1001 ways to skin a cat, I've been told). As a side note, you can pass a vector of desired probabilities, say c(0.1, 0.2, 0.3...) for deciles. Or you can use the summary function for some predefined statistics.

aggregate(my.df$var, by = list(my.df$group), FUN = summary)
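To illustrate the side note about passing several probabilities at once, here is a minimal sketch. The seed, the probs vector (nine decile cut points built with seq) and the name res are my own assumptions for illustration, not part of the answer above.

```r
# Assumption: a fixed seed so the sketch is reproducible
set.seed(1)
my.df <- data.frame(group = gl(3, 5), var = runif(15))

# Assumption: decile cut points 0.1, 0.2, ..., 0.9
probs <- seq(0.1, 0.9, by = 0.1)

# quantile() now returns 9 values per group, so the result column `x`
# is a matrix with one row per group and one column per probability
res <- aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = probs)

dim(res$x)       # 3 groups by 9 quantiles
colnames(res$x)  # "10%", "20%", ..., "90%"
```

Note that the quantiles come back as a single matrix column rather than nine separate columns; wrapping the result as cbind(res[1], res$x) is one way to flatten it if a plain data frame is needed.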

Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:

DF <- data.frame(watershed = sample(c('a','b','c','d'), 1000, replace = TRUE), wq = rnorm(1000))

Use this:

with(DF, tapply(wq, watershed, quantile, probs=0.95))
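Applied to the five sample rows from the question, the same tapply call might look like the sketch below. The object names wq, p95 and result are mine, and storing the watershed IDs as character strings is an assumption; the values are copied from the question.

```r
# Sample rows from the question; IDs stored as character (assumption)
wq <- data.frame(
  Watershed = c("50500101", "50500101", "50500101", "50500105", "50500105"),
  WQ        = c(62.370661, 65.505046, 58.741477, 71.220034, 57.917249)
)

# One 95th percentile per watershed
p95 <- with(wq, tapply(WQ, Watershed, quantile, probs = 0.95))

# Reshape into the two-column layout asked for in the question
result <- data.frame(Watershed = names(p95), WQ95 = as.vector(p95))
print(result)
```

With R's default interpolation (type 7 in quantile), this gives roughly 65.19 for 50500101 and 70.55 for 50500105. With so few observations per group the result depends heavily on the interpolation type, so it may be worth checking which definition your reporting standard expects.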

In Excel, you're going to want to use an array formula to make this easy. I suggest the following:

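{=PERCENTILE(IF($A$2:$A$6 = Watershed ID, $B$2:$B$6), 0.95)}

Column A would hold the Watershed IDs and column B the WQ values. Replace Watershed ID with the ID you want the percentile for, or with a reference to a cell containing it.

Also, be sure to enter the formula as an array formula. Do so by pressing Ctrl+Shift+Enter when entering the formula; Excel adds the curly braces itself, so do not type them.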

Using the data.table package you can do:

library(data.table)
set.seed(42)
#Sample data
dt <- data.table(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))

#95th percentile of WQ per Watershed, ignoring missing values
dt[ ,
    j = .(WQ95 = quantile(WQ, .95, na.rm = TRUE)),
    by = Watershed]
