Reshape a data frame from wide to panel (long) format with multiple variables, some of them time-invariant
This is a basic problem in data analysis which Stata deals with in one step.
Create a wide data frame with time-invariant data (x0) and time-varying data (x1, x2) for the years 2000 and 2005:
d1 <- data.frame(subject = c("id1", "id2"),
                 x0 = c("male", "female"),
                 x1_2000 = 1:2,
                 x1_2005 = 5:6,
                 x2_2000 = 1:2,
                 x2_2005 = 5:6
)
which gives:
  subject     x0 x1_2000 x1_2005 x2_2000 x2_2005
1     id1   male       1       5       1       5
2     id2 female       2       6       2       6
I want to reshape it into a panel so that the data looks like this:
  subject     x0 time x1 x2
1     id1   male 2000  1  1
2     id2 female 2000  2  2
3     id1   male 2005  5  5
4     id2 female 2005  6  6
I can do this with reshape:
d2 <- reshape(d1,
              idvar = "subject",
              varying = list(c("x1_2000", "x1_2005"),
                             c("x2_2000", "x2_2005")),
              v.names = c("x1", "x2"),
              times = c(2000, 2005),
              direction = "long",
              sep = "_")
My main concern is that when you have dozens of variables, the above command gets very long. In Stata one would simply type:
reshape long x1 x2, i(subject) j(year)
Is there such a simple solution in R?
reshape can guess many of its arguments. In this case it is sufficient to specify the following. No packages are used.
reshape(d1, dir = "long", varying = 3:6, sep = "_")
giving:
       subject     x0 time x1 x2 id
1.2000     id1   male 2000  1  1  1
2.2000     id2 female 2000  2  2  2
1.2005     id1   male 2005  5  5  1
2.2005     id2 female 2005  6  6  2
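As a variation on the call above, here is a minimal sketch for the case with many variables. It assumes every time-varying column is named like name_year with a four-digit year, so the columns can be picked with grep instead of by position. The name varying_cols is only illustrative. Passing idvar = "subject" also avoids the extra id column.

```r
# Same example data as in the question
d1 <- data.frame(subject = c("id1", "id2"),
                 x0 = c("male", "female"),
                 x1_2000 = 1:2, x1_2005 = 5:6,
                 x2_2000 = 1:2, x2_2005 = 5:6)

# Assumption: time-varying columns end in "_" followed by four digits
varying_cols <- grep("_[0-9]{4}$", names(d1), value = TRUE)

# reshape() splits each name at sep to guess variable names and times
d2 <- reshape(d1, direction = "long", idvar = "subject",
              varying = varying_cols, sep = "_")

# Drop the generated row names such as "id1.2000"
rownames(d2) <- NULL
d2
```

Because the columns are selected by pattern, adding x3_2000, x3_2005 and so on should not require changing the call, as long as the naming assumption holds.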
Here is a brief example using the reshape2 package:
library(reshape2)
library(stringr)
# it is always useful to start with melt
d2 <- melt(d1, id=c("subject", "x0"))
# redefine the time and x1, x2, ... separately
d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
variable = str_replace(variable, "_.*$", ""))
# finally, cast as you want
d3 <- dcast(d2, subject+x0+time~variable)
Now you don't even need to specify x1 and x2.
The same code still works when more variables are added:
> d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"),
+ x1_2000 = 1:2,
+ x1_2005 = 5:6,
+ x2_2000 = 1:2,
+ x2_2005 = 5:6,
+ x3_2000 = 1:2,
+ x3_2005 = 5:6,
+ x4_2000 = 1:2,
+ x4_2005 = 5:6
+ )
>
> d2 <- melt(d1, id=c("subject", "x0"))
> d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
+ variable = str_replace(variable, "_.*$", ""))
>
> d3 <- dcast(d2, subject+x0+time~variable)
>
> d3
  subject     x0 time x1 x2 x3 x4
1     id1   male 2000  1  1  1  1
2     id1   male 2005  5  5  5  5
3     id2 female 2000  2  2  2  2
4     id2 female 2005  6  6  6  6
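To make the transform step above easier to follow, here is a small base R sketch of what the two regular expressions do to the column names. It uses sub() in place of str_replace (both replace only the first match), and the vector names are only illustrative. It also shows that the extracted time is text rather than a number, so a conversion may be wanted before using it as a year.

```r
# Column names as they appear in the wide data
wide_names <- c("x1_2000", "x1_2005", "x2_2000", "x2_2005")

# "^.*_" removes everything up to the underscore, leaving the year
time_chr <- sub("^.*_", "", wide_names)

# "_.*$" removes the underscore and everything after it, leaving the variable name
var_chr <- sub("_.*$", "", wide_names)

# The year is still a character string; convert it if a numeric time is needed
time_int <- as.integer(time_chr)
```

If the time column ends up stored as a factor, going through as.character() before as.integer() is the safer route, since as.integer() on a factor returns the level codes.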