Reshape data frame to convert factors into columns in R
I have a data frame where one particular column has a set of specific values (let's say, 1, 2, ..., 23). What I would like to do is to convert from this layout to the one, where the frame would h开发者_Python百科ave extra 23 (in this case) columns, each one representing one of the factor values. The data in these columns would be booleans indicating whether a particular row had a given factor value... To show a specific example:
Source frame:
ID DATE SECTOR
123 2008-01-01 1
456 2008-01-01 3
789 2008-01-02 5
... <more records with SECTOR values from 1 to 5>
Desired format:
ID DATE SECTOR.1 SECTOR.2 SECTOR.3 SECTOR.4 SECTOR.5
123 2008-01-01 T F F F F
456 2008-01-01 F F T F F
789 2008-01-02 F F F F T
I have no problem doing it in a loop but I hoped there would be a better way. So far reshape()
didn't yield the desired result. Help would be much appreciated.
I would try to bind another column called "value" and set value = TRUE
.
df <- data.frame(cbind(1:10, 2:11, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
df <- data.frame(df, value=TRUE)
Then do a reshape:
reshape(df, idvar=c("ID","DATE"), timevar="SECTOR", direction="wide")
The problem with using the reshape
function is that the default for missing values is NA (in which case you will have to iterate and replace them with FALSE).
Otherwise you can use cast
out of the reshape
package (see this question for an example), and set the default to FALSE
.
df.wide <- cast(df, ID + DATE ~ SECTOR, fill=FALSE)
> df.wide
ID DATE 1 2 3
1 1 2 TRUE FALSE FALSE
2 2 3 FALSE TRUE FALSE
3 3 4 FALSE FALSE TRUE
4 4 5 TRUE FALSE FALSE
5 5 6 FALSE TRUE FALSE
6 6 7 FALSE FALSE TRUE
7 7 8 TRUE FALSE FALSE
8 8 9 FALSE TRUE FALSE
9 9 10 FALSE FALSE TRUE
10 10 11 TRUE FALSE FALSE
Here's another approach using xtabs
which may or may not be faster (if someone would try and let me know):
df <- data.frame(cbind(1:12, 2:13, 1:3))
colnames(df) <- c("ID","DATE","SECTOR")
foo <- xtabs(~ paste(ID, DATE) + SECTOR, df)
cbind(t(matrix(as.numeric(unlist(strsplit(rownames(foo), " "))), nrow=2)), foo)
精彩评论