Determining the goodness of an R fit using lm()
I learned to fit a line to some points using lm() in my R script. I did that (which worked nicely) and printed out the fit:
Call:
lm(formula = y2 ~ x2)

Residuals:
         1          2          3          4
 5.000e+00 -1.000e+01  5.000e+00  7.327e-15

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   70.000     17.958   3.898  0.05996 .
x2            85.000      3.873  21.947  0.00207 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.66 on 2 degrees of freedom
Multiple R-squared: 0.9959,	Adjusted R-squared: 0.9938
F-statistic: 481.7 on 1 and 2 DF,  p-value: 0.00207
I'm trying to determine the best way to judge how good this fit is. I need to compare it with a few other fits (which are also linear, using the lm() function). Which value from this summary would be best for judging the quality of the fit? I was thinking of using the residual standard error. Any suggestions? Also, how do I extract that value from the fit variable?
If you want to access the pieces produced by summary() directly, you can call summary(), store the result in a variable, and then inspect the resulting object:

rs <- summary(lm1)
names(rs)

Perhaps rs$sigma is what you're looking for?
EDIT
Before someone chides me, I should point out that for some of this information, this is not the recommended way to access it. Rather, you should use the designated extractor functions like residuals() or coef().
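As a sketch of the extractor route (with made-up data standing in for your original x2/y2 points):

```r
# Hypothetical data in place of the original four points
x2 <- 1:4
y2 <- c(75, 150, 325, 410)
lm1 <- lm(y2 ~ x2)

coef(lm1)        # named vector: (Intercept) and x2
residuals(lm1)   # one residual per observation
sigma(lm1)       # residual standard error (extractor available since R 3.3)
```

sigma(lm1) returns the same number as summary(lm1)$sigma, but through the supported interface.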
This code would do something similar:
y2 <- seq(1, 11, by = 2) + rnorm(6)  # six simulated data points (you have four)
x2 <- 1:6
lm(y2 ~ x2)
summary(lm(y2 ~ x2))
The adjusted R^2 is the "goodness of fit" measure. It says that about 99% of the variance in y2 can be "explained" by a straight-line fit of y2 on x2. Whether you want to interpret a model built on only 4 data points on the basis of that result is a matter of judgment. It seems somewhat dangerous to me.
To extract the residual standard error you use:
summary(lm(y2~x2))$sigma
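To compare several candidate fits programmatically, you can pull the same components from each model's summary (the quadratic fit here is purely illustrative):

```r
set.seed(1)                          # for reproducibility
x2 <- 1:6
y2 <- seq(1, 11, by = 2) + rnorm(6)
fit_linear <- lm(y2 ~ x2)
fit_quad   <- lm(y2 ~ poly(x2, 2))   # hypothetical competing model

summary(fit_linear)$adj.r.squared    # adjusted R-squared of each fit
summary(fit_quad)$adj.r.squared
summary(fit_linear)$sigma            # residual standard error
```

A higher adjusted R-squared (or a lower residual standard error) suggests a better fit, keeping in mind the small-sample caveat above.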
See this for further details:
?summary.lm
There are some nice regression diagnostic plots you can look at with
plot(YourRegression, which=1:6)
where which = 1:6 gives you all six plots. The RESET test and the Breusch-Pagan test (resettest() and bptest(), both from the lmtest package) check for misspecification and heteroskedasticity:
resettest(...)
bptest(...)
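A minimal sketch of those diagnostics, assuming the lmtest package is installed (the data and fit here are an arbitrary example, not your model):

```r
library(lmtest)          # provides resettest() and bptest()

set.seed(1)
x2  <- 1:20
y2  <- 2 + 3 * x2 + rnorm(20)
fit <- lm(y2 ~ x2)

plot(fit, which = 1:6)   # the six regression diagnostic plots
resettest(fit)           # Ramsey RESET: functional-form misspecification
bptest(fit)              # Breusch-Pagan: heteroskedasticity
```

Both tests return standard "htest" objects, so you can inspect the p-values with, e.g., bptest(fit)$p.value.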
There are a lot of resources out there for thinking about this sort of thing. "Fitting Distributions in R" is one of them, and Faraway's "Practical Regression and Anova using R" is an R classic. I basically learned econometrics in R from Farnsworth's paper/book, although I don't recall whether he covers goodness of fit.
If you are going to do a lot of econometrics in R, Applied Econometrics in R is a great pay-for book. And I've used the R for Economists webpage a lot.
Those are the first ones that pop to mind. I will mull a little more.