Cartesian product data frame
I have three or more independent variables represented as R vectors, like so:
A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(0.1,0.5)
and I want to take the Cartesian product of all of them and put the result into a data frame, like this:
A B C
1 x 0.1
1 x 0.5
1 y 0.1
1 y 0.5
2 x 0.1
2 x 0.5
2 y 0.1
2 y 0.5
3 x 0.1
3 x 0.5
3 y 0.1
3 y 0.5
I can do this by manually writing out calls to rep:
# A varies slowest and C fastest, matching the table above
d <- data.frame(A = rep(A, each=length(B)*length(C)),
                B = rep(B, times=length(A), each=length(C)),
                C = rep(C, times=length(A)*length(B)))
but there must be a more elegant way to do it, yes? product in itertools does part of the job, but I can't find any way to absorb the output of an iterator and put it into a data frame. Any suggestions?
p.s. The next step in this calculation looks like
d$D <- f(d$A, d$B, d$C)
so if you know a way to do both steps at once, that would also be helpful.
You can use expand.grid(A, B, C)
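Note that expand.grid varies its first argument fastest, so the row order differs from the table in the question. As a rough sketch (assuming the vectors A, B and C from the question), passing the vectors in reverse and then reordering the columns should reproduce that order:

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# expand.grid varies the first argument fastest, so list the vectors in
# reverse (C, B, A) and then put the columns back in A, B, C order.
d <- expand.grid(C = C, B = B, A = A)[, c("A", "B", "C")]
head(d, 4)
```

If the row order does not matter, plain expand.grid(A = A, B = B, C = C) is enough.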
EDIT: An alternative to using do.call to achieve the second part is the function mdply from the package plyr:
library(plyr)
d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)
To illustrate its usage using a trivial function 'paste', you can try
d = mdply(d, 'paste', sep = '+');
The merge function, which operates on data frames, is helpful in this case.
It can produce various kinds of join (in SQL terminology), of which the Cartesian product is a special case.
You have to convert the variables to data frames first, because merge takes data frames as parameters.
Something like this will do:
A.B=merge(data.frame(A=A), data.frame(B=B),by=NULL);
A.B.C=merge(A.B, data.frame(C=C),by=NULL);
The only caveat is that the rows are not sorted as in your example. You can sort them afterwards if you wish. From the documentation for merge:
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
sort = TRUE, suffixes = c(".x",".y"),
incomparables = NULL, ...)
"If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y"
see this url for detail: http://stat.ethz.ch/R-manual/R-patched/library/base/html/merge.html
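As a minimal sketch of the sorting step mentioned above (assuming the vectors from the question, and using base R's order), the merged result can be put into the order shown in the question like this:

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# by = NULL makes merge return the Cartesian product of its inputs
A.B   <- merge(data.frame(A = A), data.frame(B = B), by = NULL)
A.B.C <- merge(A.B, data.frame(C = C), by = NULL)

# Sort by A, then B, then C, and reset the row names
d <- A.B.C[order(A.B.C$A, A.B.C$B, A.B.C$C), ]
rownames(d) <- NULL
```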
With the tidyr library, one can use tidyr::crossing (the order will be as in the OP):
library(tidyr)
crossing(A,B,C)
# A tibble: 12 x 3
# A B C
# <dbl> <fct> <dbl>
# 1 1 x 0.1
# 2 1 x 0.5
# 3 1 y 0.1
# 4 1 y 0.5
# 5 2 x 0.1
# 6 2 x 0.5
# 7 2 y 0.1
# 8 2 y 0.5
# 9 3 x 0.1
# 10 3 x 0.5
# 11 3 y 0.1
# 12 3 y 0.5
The next step would be to use the tidyverse, and especially the purrr::pmap* family:
library(tidyverse)
crossing(A,B,C) %>% mutate(D = pmap_chr(.,paste,sep="_"))
# A tibble: 12 x 4
# A B C D
# <dbl> <fct> <dbl> <chr>
# 1 1 x 0.1 1_1_0.1
# 2 1 x 0.5 1_1_0.5
# 3 1 y 0.1 1_2_0.1
# 4 1 y 0.5 1_2_0.5
# 5 2 x 0.1 2_1_0.1
# 6 2 x 0.5 2_1_0.5
# 7 2 y 0.1 2_2_0.1
# 8 2 y 0.5 2_2_0.5
# 9 3 x 0.1 3_1_0.1
# 10 3 x 0.5 3_1_0.5
# 11 3 y 0.1 3_2_0.1
# 12 3 y 0.5 3_2_0.5
Consider using the wonderful data.table library for expressiveness and speed. It handles many plyr use-cases (relational group by), along with transform, subset and relational join using a fairly simple uniform syntax.
library(data.table)
d <- CJ(x=A, y=B, z=C) # Cross join
d[, w:=f(x,y,z)] # Mutates the data.table
or in one line
d <- CJ(x=A, y=B, z=C)[, w:=f(x,y,z)]
Here's a way to do both, using Ramnath's suggestion of expand.grid:
f <- function(x,y,z) paste(x,y,z,sep="+")
d <- expand.grid(x=A, y=B, z=C)
d$D <- do.call(f, d)
Note that do.call works on d "as-is" because a data.frame is a list. But do.call expects the column names of d to match the argument names of f.
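If the column names do not match the argument names of f, one possible workaround is to drop the names so that the columns are passed by position. This is only a sketch: the f below is a stand-in, and it assumes the columns of d are in the same order as the arguments of f.

```r
A <- c(1, 2, 3)
B <- factor(c("x", "y"))
C <- c(0.1, 0.5)

# Stand-in for the real f; its arguments are named x, y, z
f <- function(x, y, z) paste(x, y, z, sep = "+")

# The columns are named A, B, C, so do.call(f, d) would fail
# with an "unused arguments" error.
d <- expand.grid(A = A, B = B, C = C)

# Dropping the names passes the columns to f positionally instead
d$D <- do.call(f, unname(as.list(d)))
```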
Using a cross join in sqldf:
library(sqldf)
A <- data.frame(c1 = c(1,2,3))
B <- data.frame(c2 = factor(c('x','y')))
C <- data.frame(c3 = c(0.1,0.5))
result <- sqldf('SELECT * FROM (A CROSS JOIN B) CROSS JOIN C')
I can never remember the standard function expand.grid, so here's another version.
crossproduct <- function(...,FUN='data.frame') {
args <- list(...)
n1 <- names(args)
n2 <- sapply(match.call()[1+1:length(args)], as.character)
nn <- if (is.null(n1)) n2 else ifelse(n1!='',n1,n2)
dims <- sapply(args,length)
dimtot <- prod(dims)
reps <- rev(cumprod(c(1,rev(dims))))[-1]
cols <- lapply(1:length(dims), function(j)
args[[j]][1+((1:dimtot-1) %/% reps[j]) %% dims[j]])
names(cols) <- nn
do.call(match.fun(FUN),cols)
}
A <- c(1,2,3)
B <- factor(c('x','y'))
C <- c(.1,.5)
crossproduct(A,B,C)
crossproduct(A,B,C, FUN=function(...) paste(...,sep='_'))