Generating interaction variables in R dataframes
Is there a way - other than a for loop - to generate new variables in an R dataframe, which will be all the possible 2-way interactions between the existing ones? i.e. supposing a dataframe with three numeric variables V1, V2, V3, I would like to generate the following new variables:
Inter.V1V2 (= V1 * V2)
Inter.V1V3 (= V1 * V3)
Inter.V2V3 (= V2 * V3)
Example using for loop :
x <- read.table(textConnection('
V1 V2 V3 V4
1 9 25 18
2 5 20 10
3 4 30 12
4 4 34 16'
), header=TRUE)
dim.init <- dim(x)[2]
for (i in 1开发者_运维百科: (dim.init - 1) ) {
for (j in (i + 1) : (dim.init) ) {
x[dim(x)[2] + 1] <- x[i] * x[j]
names(x)[dim(x)[2]] <- paste("Inter.V",i,"V",j,sep="")
}
}
Here is a one liner for you that also works if you have factors:
> model.matrix(~(V1+V2+V3+V4)^2,x)
(Intercept) V1 V2 V3 V4 V1:V2 V1:V3 V1:V4 V2:V3 V2:V4 V3:V4
1 1 1 9 25 18 9 25 18 225 162 450
2 1 2 5 20 10 10 40 20 100 50 200
3 1 3 4 30 12 12 90 36 120 48 360
4 1 4 4 34 16 16 136 64 136 64 544
attr(,"assign")
[1] 0 1 2 3 4 5 6 7 8 9 10
Here you go, using combn
and apply
:
> x2 <- t(apply(x, 1, combn, 2, prod))
Setting the column names can be done with two paste
commands:
> colnames(x2) <- paste("Inter.V", combn(1:4, 2, paste, collapse="V"), sep="")
Lastly, if you want all your variables together, just cbind
them:
> x <- cbind(x, x2)
> V1 V2 V3 V4 Inter.V1V2 Inter.V1V3 Inter.V1V4 Inter.V2V3 Inter.V2V4 Inter.V3V4
1 1 9 25 18 9 25 18 225 162 450
2 2 5 20 10 10 40 20 100 50 200
3 3 4 30 12 12 90 36 120 48 360
4 4 4 34 16 16 136 64 136 64 544
I think this question should be complemented with the poly/polym
function, which goes futher: it generates not only interactions between the variables, but its power until the selected degree. And orthogonal iteractions, which may be very usefull.
The directly solution to the asked problem would be:
> polym(x$V1, x$V2, x$V3, x$V4, degree = 2, raw = T)
1.0.0.0 2.0.0.0 0.1.0.0 1.1.0.0 0.2.0.0 0.0.1.0 1.0.1.0 0.1.1.0 0.0.2.0 0.0.0.1 1.0.0.1 0.1.0.1 0.0.1.1 0.0.0.2
[1,] 1 1 9 9 81 25 25 225 625 18 18 162 450 324
[2,] 2 4 5 10 25 20 40 100 400 10 20 50 200 100
[3,] 3 9 4 12 16 30 90 120 900 12 36 48 360 144
[4,] 4 16 4 16 16 34 136 136 1156 16 64 64 544 256
attr(,"degree")
[1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
The columns 4, 7, 8, 11, 12, 13 has the requested in the question. Other columns have other kinds of interactions. If you would like to get orthogonal interactions, just set raw = FALSE
.
精彩评论