Easier way to plot the cumulative frequency distribution in ggplot?
I'm looking for an easier way to draw the cumulative distribution line in ggplot.
I have some data whose histogram I can display immediately with
qplot(mydata, binwidth = 1)
I found a way to do it at http://www.r-tutor.com/elementary-statistics/quantitative-data/cumulative-frequency-graph, but it involves several steps, which is time-consuming when exploring data.
Is there a more straightforward way to do it in ggplot, similar to how trend lines and confidence intervals can be added by specifying options?
Newer versions of ggplot2 (0.9.2.1 and later) have a built-in stat_ecdf() function, which lets you plot cumulative distributions very easily.
qplot(rnorm(1000), stat = "ecdf", geom = "step")
Or
df <- data.frame(x = c(rnorm(100, 0, 3), rnorm(100, 0, 10)),
                 g = gl(2, 100))
ggplot(df, aes(x, colour = g)) + stat_ecdf()
Both code samples are from the ggplot2 documentation.
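As far as I know, later ggplot2 releases (2.0.0 onward) deprecate the stat argument of qplot(), so the first example may warn or fail there. A minimal sketch of the same plot written with ggplot() directly, assuming such a newer version is installed (the data frame name d is only for illustration):

```r
library(ggplot2)

set.seed(1)                          # reproducible sample
d <- data.frame(x = rnorm(1000))     # 'd' is an illustrative name

# stat_ecdf() computes the empirical CDF; geom = "step" draws it as a step line
p <- ggplot(d, aes(x)) + stat_ecdf(geom = "step")
print(p)
```

The grouped example with colour = g should work unchanged, since it already goes through ggplot().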
R has a built-in ecdf() function, which should make things easier. Here's some sample code that uses plyr:
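library(plyr)
library(ggplot2)
data(iris)
## ECDF over all species
## plyr's summarize() evaluates its arguments in order, and later ones see
## earlier results, so compute ecdf before Sepal.Length is overwritten.
iris.all <- summarize(iris,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))
ggplot(iris.all, aes(Sepal.Length, ecdf)) + geom_step()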
## ECDF within species
## (ecdf is computed first, before Sepal.Length is replaced by its unique values)
iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)),
                      Sepal.Length = unique(Sepal.Length))
ggplot(iris.species, aes(Sepal.Length, ecdf, color = Species)) + geom_step()
Edit: I just realized that you want the cumulative frequency. You can get that by multiplying the ECDF value by the total number of observations:
## length() must see the original column, so ecdf is computed before
## Sepal.Length is replaced by its unique values.
iris.all <- summarize(iris,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))
iris.species <- ddply(iris, .(Species), summarize,
                      ecdf = ecdf(Sepal.Length)(unique(Sepal.Length)) * length(Sepal.Length),
                      Sepal.Length = unique(Sepal.Length))
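To see why the multiplication works, here is a small check on a made-up vector (the values are purely illustrative). The ECDF at a value is the fraction of observations less than or equal to it, so scaling by the number of observations gives the cumulative count, which should match cumsum(table(x)):

```r
x <- c(1, 2, 2, 3, 3, 3)              # made-up data
v <- sort(unique(x))                  # distinct values, ascending

cum_from_ecdf  <- ecdf(x)(v) * length(x)        # fraction <= v, scaled to counts
cum_from_table <- as.vector(cumsum(table(x)))   # running total of per-value counts

# Both routes should give the same cumulative frequencies
stopifnot(isTRUE(all.equal(cum_from_ecdf, cum_from_table)))
print(cum_from_ecdf)
```

This uses only base R, so it can be run before any plotting code to confirm the numbers.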
Even easier:
qplot(unique(mydata), ecdf(mydata)(unique(mydata))*length(mydata), geom='step')
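If you do this often while exploring data, the one-liner can be wrapped in a small helper. This is only a sketch: cumfreq_df is a hypothetical name, not a ggplot2 function, and it assumes mydata is a numeric vector without missing values.

```r
library(ggplot2)

# Hypothetical helper: one row per distinct value with its cumulative count
cumfreq_df <- function(x) {
  v <- sort(unique(x))
  data.frame(value = v, cum_freq = ecdf(x)(v) * length(x))
}

mydata <- c(2, 5, 3, 5, 1, 2, 5)      # stand-in for the real data
p <- ggplot(cumfreq_df(mydata), aes(value, cum_freq)) + geom_step()
print(p)
```

Returning a data frame rather than a plot keeps the helper reusable, for example for adding the step line as an extra layer on an existing plot.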