How to obtain VIF using biglm package?
I refer to this post http://r.789695.n4.nabble.com/Questions-about-biglm-td878929.html which discusses how to obtain VIF using biglm.
Is there an alternative way of obtaining VIF from the object produced by biglm?
Thanks for your help
For simple models, this is relatively easy: follow the code in the vif() method for "lm" objects in the car package, as John Fox suggested in the R-Help thread you linked to. You can't use the car package directly because its vif() uses the model matrix, and that isn't available from a biglm() fit. To illustrate how to do this, consider the simple example from ?biglm:
require(biglm)
data(trees)
ff <- log(Volume) ~ log(Girth) + log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
The fitted model is in a, from which we extract the variance-covariance matrix of the parameters, drop the intercept, and compute the correlation matrix R and its determinant:
v <- vcov(a)
## drop intercept
v <- v[-1, -1, drop = FALSE]
R <- cov2cor(v)
detR <- det(R)
Next, create a named vector to hold the VIFs:
res <- numeric(length = ncol(v))
names(res) <- colnames(v)
Finally, loop over the model terms (minus the intercept) and compute the VIF for each term:
for (i in seq_len(ncol(v))) {
  res[i] <- det(R[i, i, drop = FALSE]) * det(R[-i, -i, drop = FALSE]) / detR
}
This results in:
> res
log(Girth) log(Height)
1.391027 1.391027
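As a cross-check on the determinant formula, here is a minimal base-R sketch. It assumes every term has exactly one coefficient, and it uses lm() as a stand-in for biglm() only so that the snippet needs no extra packages; the names fit, vif_det and vif_inv are illustrative. For single-coefficient terms, the determinant ratio in the loop above should equal the corresponding diagonal element of the inverse of R.

```r
# Base-R cross-check; lm() stands in for biglm() here (an assumption made
# only to keep the snippet free of package dependencies).
data(trees)
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)

# Correlation matrix of the coefficient estimates, intercept dropped
R <- cov2cor(vcov(fit)[-1, -1, drop = FALSE])

# Determinant-ratio form; det(R[i, i]) is 1 for a single coefficient,
# so it is omitted here
vif_det <- sapply(seq_len(ncol(R)), function(i) {
  det(R[-i, -i, drop = FALSE]) / det(R)
})
names(vif_det) <- colnames(R)

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_inv <- diag(solve(R))

# TRUE if the two forms agree
all.equal(unname(vif_det), unname(vif_inv))
```

The shortcut only holds when each term maps to one coefficient; the determinant form is the one that generalises to multi-coefficient terms.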
If we load the car package and use it to compute VIFs for the same model fitted using lm(), we can see that it gives the same output:
> require(car)
> mod <- lm(ff, data = trees)
> vif(mod)
log(Girth) log(Height)
1.391027 1.391027
vif() is a bit cleverer than the code I show: it works out whether a model term is represented by more than one coefficient, whereas my code assumes one coefficient per term. In such circumstances (a factor with more than two levels, for example), the term occupies more than one row/column of the variance-covariance matrix v, and you need to retain/exclude all the rows/columns belonging to that term when computing the determinants in the for() loop. You can work out which rows and columns belong to each term from the variance-covariance matrix, but I'll leave the details to you.
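One possible way to handle terms with several coefficients is sketched below. It is not the car implementation, just the same determinant formula applied to groups of columns. The function name gvif_from_vcov and its arguments are hypothetical; assign is the usual "assign" attribute of a model matrix (0 for the intercept, then one integer per term). The example uses lm() so that it runs in base R. With a biglm fit you would pass vcov(a), and one option (an assumption: every chunk produces the same columns) is to take the mapping from attr(model.matrix(ff, chunk1), "assign").

```r
# Hypothetical helper: generalised VIF per term from a variance-covariance
# matrix `v` and a term-to-column mapping `assign` (0 marks the intercept).
gvif_from_vcov <- function(v, assign, term_labels = NULL) {
  keep <- assign != 0                       # drop the intercept
  v <- v[keep, keep, drop = FALSE]
  assign <- assign[keep]
  terms <- unique(assign)
  if (length(terms) < 2) stop("model contains fewer than 2 terms")

  R <- cov2cor(v)
  detR <- det(R)
  res <- numeric(length(terms))
  for (k in seq_along(terms)) {
    cols <- which(assign == terms[k])       # all columns belonging to term k
    res[k] <- det(R[cols, cols, drop = FALSE]) *
      det(R[-cols, -cols, drop = FALSE]) / detR
  }
  names(res) <- if (is.null(term_labels)) paste0("term", terms) else term_labels
  res
}

# Example on the trees model, fitted with lm() for illustration
data(trees)
mod <- lm(log(Volume) ~ log(Girth) + log(Height), data = trees)
gv <- gvif_from_vcov(vcov(mod),
                     attr(model.matrix(mod), "assign"),
                     term_labels = labels(terms(mod)))
gv
```

For terms with a single coefficient this reduces to the loop shown earlier; for a multi-level factor, all of its dummy columns are kept or excluded together.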
When testing this, fit your model to a small random sample of the data using both biglm() and lm(), compute the VIFs using car's vif() on the resulting "lm" object and by hand on the "biglm" object, and check that they concur.
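That check could be wrapped up along these lines. This is only a sketch: compare_vif and its arguments are hypothetical names, it assumes the biglm and car packages are installed, and it assumes every term contributes a single coefficient.

```r
# Hypothetical helper: compare hand-computed VIFs from a biglm() fit with
# car::vif() on an lm() fit, both using the same random subsample.
compare_vif <- function(formula, data, n = min(nrow(data), 1000)) {
  samp <- data[sample(nrow(data), n), , drop = FALSE]

  # VIFs by hand from the biglm fit (intercept dropped)
  R <- cov2cor(vcov(biglm::biglm(formula, samp))[-1, -1, drop = FALSE])
  by_hand <- sapply(seq_len(ncol(R)), function(i) {
    det(R[-i, -i, drop = FALSE]) / det(R)
  })

  # VIFs from car on the equivalent lm fit
  from_car <- car::vif(lm(formula, data = samp))

  # TRUE if they agree, otherwise a description of the difference
  all.equal(unname(by_hand), unname(from_car), tolerance = 1e-6)
}

# Run only if both packages are available
if (requireNamespace("biglm", quietly = TRUE) &&
    requireNamespace("car", quietly = TRUE)) {
  print(compare_vif(log(Volume) ~ log(Girth) + log(Height), trees))
}
```

If the result is not TRUE, the usual suspect is a term with several coefficients, which this single-coefficient version does not handle.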