How can I superimpose modified loess lines on a ggplot2 qplot?
Background
Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-)
My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that:
- The residuals don't systematically covary with the predictor
- The residuals are homoscedastic with respect to each predictor in the model
On point (2), my textbook has this to say:
Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residua开发者_如何学编程ls....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131)
How can I modify loess lines?
I know how to generate a scatterplot with a "0-line,":
# First, I'll make a simple linear model and get its diagnostic stats
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)
# Now I want to make sure the residuals are homoscedastic
qplot (x = dist, y = .resid, data = mod) +
geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands
But does anyone know how I can use ggplot2
and qplot
to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?
Apology
Folks, I want to apologize for my ignorance. Hadley is absolutely right, and the answer was right in front of me all along. As I suspected, my question was born of statistical, rather than programmatic ignorance.
We get the 68% Confidence Interval for Free
geom_smooth()
defaults to loess
smoothing, and it superimposes the +1sd and -1sd lines as part of the deal. That's what Hadley meant when he said "Isn't that just a 68% confidence interval?" I just completely forgot that's what the 68% interval is, and kept searching for something that I already knew how to do. It didn't help that I'd actually turned the confidence intervals off in my code by specifying geom_smooth(se = FALSE)
.
What my Sample Code Should Have Looked Like
# First, I'll make a simple linear model and get its diagnostic stats.
library(ggplot2)
data(cars)
mod <- fortify(lm(speed ~ dist, data = cars))
attach(mod)
str(mod)
# Now I want to make sure the residuals are homoscedastic.
# By default, geom_smooth is loess and includes the 68% standard error bands.
qplot (x = dist, y = .resid, data = mod) +
geom_abline(slope = 0, intercept = 0) +
geom_smooth()
What I've Learned
Hadley implemented a really beautiful and simple way to get what I'd wanted all along. But because I was focused on loess lines, I lost sight of the fact that the 68% confidence interval was bounded by the very lines I needed. Sorry for the trouble, everyone.
Could you calculate the +/- standard deviation values from the data and add a fitted curve of them to the plot?
Have a look at my question "modify lm or loess function.."
I am not sure I followed your question very well, but maybe a:
+ stat_smooth(method=yourfunction)
will work, provided that you define your function as described here.
精彩评论