R: Stacking Multiple Punch Question Data
Suppose we have 2 questions in a survey, one is about how likely an individual is to recommend a company (let's say there's 2 companies for simplicity).
So, I have one data.frame with 2 columns for this question:
df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Company1","Company2")
And, suppose we have another question that asks respondents to checkmark a box beside an attribute that they believe "fits" with the company.
So, I have another data.frame with 4 columns for this question:
df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))
colnames(df.attribute) <- c(
"Attribute1.Company1",
"Attribute2.Company1",
"Attribute1.Company2",
"Attribute2.Company2")
Now, what I would like to be able to do is review how Attributes 1 and 2 are related to the scale in the likelyhood to recommend question, for all companies (company independent). Just to get an idea of what inertia lies between those people that are highly likely to recommend and attribute 1 for example.
So, I start off by binding the two questions together:
df <- cbind(df.recommend, df.attribute)
My problem is trying to figure out how to stack these data such that the columns look something like:
df.stacked <- data.frame(c(df$Company1,df$Company2),
c(df$Attribute1.Company1,df$Attribute1.Company2),
c(df$Attribute2.Company1,df$Attribute2.Company2))
colnames(df.stacked) <- c("Likelihood","Attribute1","Attribute2")
This example is simplified to a large degree. In my actual problem, I have 34 companies and 24 attributes.
Could you think of a way to stack them effectively, without having to type out all the c() statements?
Note: The column pattern for likelyhood is Co1,Co2,Co开发者_运维技巧3,Co4... and the pattern for the attributes is At1.Co1,At2.Co1,At3.Co1 ... At1.Co34,At2.Co34...
For this type of problem, Hadley's reshape package is the perfect tool. I combine it with a few stringr and plyr statements (also packages written by Hadley).
Here is what I believe to be a complete solution in about a dozen lines of code.
First, create some data
library(reshape2) # EDIT 1: reshape2 is faster
library(stringr)
library(plyr)
# Create data frame
# Important: note the addition of a respondent id column
df_comp <- data.frame(
RespID = 1:10,
Company1 = rep(1:5, 2),
Company2 = rep(1:5, 2)
)
df_attr <- data.frame(
RespID = 1:10,
Attribute1.Company1 = rep(0:1,5),
Attribute2.Company1 = rep(1:0,5),
Attribute1.Company2 = rep(0:1,5),
Attribute2.Company2 = rep(1:0,5)
)
Now start the data manipulation:
# Use melt to convert data from wide to tall
melt_comp <- melt(df_comp, id.vars="RespID")
melt_comp <- rename(melt_comp, c(variable="comp", value="likelihood"))
melt_attr <- melt(df_attr, id.vars="RespID")
# Use str_split to split attribute variables into attribute and company
# "." period needs to be escaped
# EDIT 2: reshape::colsplit is simpler than str_split
split <- colsplit(melt_attr$variable, "\\.", names=c("attr", "comp"))
melt_attr <- data.frame(melt_attr, split)
melt_attr$variable <- NULL
# Use cast to convert from tall to somewhat tall
cast_attr <- cast(melt_attr, RespID + comp ~ attr, mean)
# Combine data frames using join() in package plyr
df <- join(melt_comp, cast_attr)
head(df)
And the output:
RespID comp likelihood Attribute1 Attribute2
1 1 Company1 1 0 1
2 2 Company1 2 1 0
3 3 Company1 3 0 1
4 4 Company1 4 1 0
5 5 Company1 5 0 1
6 6 Company1 1 1 0
Something I quickly cooked up. Doesn't look the best and uses a for-loop but that shouldn't be a problem with only 24 values
df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Co1","Co2")
df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))
colnames(df.attribute) <- c(
"At1.Co1",
"At2.Co1",
"At1.Co2",
"At2.Co2")
df.stacked <- data.frame(
likelihood <- unlist(df.recommend)
)
str <- strsplit(names(df.attribute),split="\\.")
atts <- unique(sapply(str,function(x)x[1]))
for (i in 1:length(atts))
{
df.stacked[,i+1] <- unlist(df.attribute[sapply(str,function(x)x[1]==atts[i])])
}
names(df.stacked) <- c("likelihood",paste("attribute",1:length(atts),sep=""))
EDIT: It assumes that companies are in the same order for each attribute
精彩评论