How to do a regression of a series of variables without typing each variable name
I want to run a regression with a bunch of independent variables from my dataset. There are a lot of predictors, so I do not want to write them all out. Is there a notation to span multiple columns so I don't have to type each?
My attempt was doing this (where my predictors are column 20 to 43):
modelAllHexSubscales = lm(HHdata$garisktot~HHdata[,20:43])
Obviously, this does not work because HHdata[,20:43] is a matrix of data, whereas I really need it to see the data as HHdata[,20]+HHdata[,21] etc.
Here's another alternative:
# if garisktot is in columns 20:43
modelAllHexSubscales <- lm(garisktot ~ ., data=HHdata[,20:43])
# if it isn't
modelData <- data.frame(HHdata["garisktot"],HHdata[,20:43])
modelAllHexSubscales <- lm(garisktot ~ ., data=modelData)
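To see what the `.` shorthand does, here is a minimal sketch on the built-in mtcars data, used purely as a stand-in for HHdata (the column range 2:5 and the names fitDot and fitLong are illustrative assumptions, not from the question). The `.` on the right-hand side expands to every column of `data` other than the response, so the fit should match the one with the predictors typed out by hand.

```r
# Stand-in for HHdata: mtcars ships with R.
# Response is mpg; predictors are columns 2 to 5 (cyl, disp, hp, drat).
modelData <- data.frame(mtcars["mpg"], mtcars[, 2:5])

# "." expands to every column of `data` except the response
fitDot <- lm(mpg ~ ., data = modelData)

# Same model with each predictor written out explicitly
fitLong <- lm(mpg ~ cyl + disp + hp + drat, data = mtcars)

# Compare the two sets of coefficients
all.equal(coef(fitDot), coef(fitLong))
```

Because the model is fitted with a `data` argument and plain column names, predict() with a `newdata` data frame works in the usual way.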
Generate a formula by pasting column names first.
f <- as.formula(paste('garisktot ~', paste(colnames(HHdata)[20:43], collapse='+')))
modelAllHexSubscales <- lm(f, HHdata)
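A possible variation on the pasting approach is stats::reformulate(), which builds the formula from a character vector of term labels without manual string handling. The sketch below again uses mtcars as a stand-in for HHdata; the column range and variable names are assumptions for illustration.

```r
# Predictor names taken by position, as in colnames(HHdata)[20:43]
predictors <- colnames(mtcars)[2:5]

# reformulate() joins the term labels with "+" and attaches the response
f <- reformulate(predictors, response = "mpg")
print(f)

# Fit the model using the generated formula
fit <- lm(f, data = mtcars)
coef(fit)
```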
Have you tried to do it directly, as in
> y
[1] 10 19 30 42 51 59 72 78
> X
     [,1] [,2]
[1,]    1  1.0
[2,]    2  3.0
[3,]    3  5.5
[4,]    4  7.0
[5,]    5  9.0
[6,]    6 11.0
[7,]    7 13.0
[8,]    8 16.0
> summary(lm(y ~ X))
Call:
lm(formula = y ~ X)
Residuals:
      1       2       3       4       5       6       7       8
-0.1396 -1.2774  0.9094  1.4472  0.3094 -1.8283  1.0340 -0.4547
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.647      2.004  -1.321  0.24366
X1            15.436      3.177   4.859  0.00464 **
X2            -2.649      1.535  -1.726  0.14490
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.363 on 5 degrees of freedom
Multiple R-squared: 0.9978, Adjusted R-squared: 0.9969
F-statistic: 1124 on 2 and 5 DF, p-value: 2.32e-07
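The example above works because X is a numeric matrix. If HHdata is a data frame, then HHdata[,20:43] is a data frame as well, which is probably why the attempt in the question fails. One way around that, sketched here on mtcars as a stand-in (the column range is an assumption), is to convert the columns with as.matrix() first.

```r
# Convert the selected data-frame columns to a numeric matrix
X <- as.matrix(mtcars[, 2:5])

# A matrix on the right-hand side contributes one coefficient per column
fitMat <- lm(mtcars$mpg ~ X)

# Coefficient names are the matrix name pasted onto the column names
names(coef(fitMat))
```

This form is quick for a one-off fit, but predicting on new data is more awkward than with the formula-plus-`data` approaches, since the model stores the predictors as a single matrix term.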