R: Stacking Multiple Punch Question Data

2023-02-11 05:19

Suppose we have 2 questions in a survey. The first asks how likely an individual is to recommend a company (let's say there are 2 companies for simplicity).

So, I have one data.frame with 2 columns for this question:

df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Company1","Company2")

And, suppose we have another question that asks respondents to checkmark a box beside an attribute that they believe "fits" with the company.

So, I have another data.frame with 4 columns for this question:

df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))

colnames(df.attribute) <- c(
"Attribute1.Company1", 
"Attribute2.Company1", 
"Attribute1.Company2", 
"Attribute2.Company2")

Now, what I would like to do is review how Attributes 1 and 2 relate to the scale in the likelihood-to-recommend question, for all companies (company independent). The aim is just to get an idea of what inertia lies between, for example, the people who are highly likely to recommend and Attribute 1.

So, I start off by binding the two questions together:

df <- cbind(df.recommend, df.attribute)

My problem is trying to figure out how to stack these data such that the columns look something like:

df.stacked <- data.frame(c(df$Company1,df$Company2),
c(df$Attribute1.Company1,df$Attribute1.Company2), 
c(df$Attribute2.Company1,df$Attribute2.Company2))
colnames(df.stacked) <- c("Likelihood","Attribute1","Attribute2")

This example is simplified to a large degree. In my actual problem, I have 34 companies and 24 attributes.

Could you think of a way to stack them effectively, without having to type out all the c() statements?

Note: The column pattern for likelihood is Co1,Co2,Co3,Co4... and the pattern for the attributes is At1.Co1,At2.Co1,At3.Co1 ... At1.Co34,At2.Co34...

For this type of problem, Hadley's reshape package is the perfect tool. I combine it with a few stringr and plyr statements (also packages written by Hadley).

Here is what I believe to be a complete solution in about a dozen lines of code.

First, create some data:

library(reshape2) # EDIT 1: reshape2 is faster
library(stringr)
library(plyr)

# Create data frame
# Important: note the addition of a respondent id column

df_comp <- data.frame(
        RespID = 1:10,
        Company1 = rep(1:5, 2),
        Company2 = rep(1:5, 2)
)

df_attr <- data.frame(
        RespID = 1:10,
        Attribute1.Company1 = rep(0:1,5),
        Attribute2.Company1 = rep(1:0,5),
        Attribute1.Company2 = rep(0:1,5),
        Attribute2.Company2 = rep(1:0,5)
)

Now start the data manipulation:

# Use melt to convert data from wide to tall

melt_comp <- melt(df_comp, id.vars="RespID")
melt_comp <- rename(melt_comp, c(variable="comp", value="likelihood"))
melt_attr <- melt(df_attr, id.vars="RespID")

# Split the attribute variable names into attribute and company
# The "." must be escaped because the pattern is a regular expression

# EDIT 2: reshape2::colsplit is simpler than str_split
split <- colsplit(melt_attr$variable, "\\.", names=c("attr", "comp")) 
melt_attr <- data.frame(melt_attr, split)
melt_attr$variable <- NULL

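# Use dcast to convert from tall to somewhat tall
# (reshape2 provides dcast; cast belongs to the older reshape package)

cast_attr <- dcast(melt_attr, RespID + comp ~ attr, fun.aggregate=mean, value.var="value")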


# Combine data frames using join() in package plyr

df <- join(melt_comp, cast_attr)
head(df)

And the output:

  RespID     comp likelihood Attribute1 Attribute2
1      1 Company1          1          0          1
2      2 Company1          2          1          0
3      3 Company1          3          0          1
4      4 Company1          4          1          0
5      5 Company1          5          0          1
6      6 Company1          1          1          0
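
With the data stacked like this, relating the attributes to the likelihood scale across all companies comes down to a grouped summary. The following is a minimal base R sketch, not part of the original answer: it rebuilds a frame shaped like the output above from the same toy values, and the names `stacked` and `attr_by_likelihood` are illustrative assumptions.

```r
# Rebuild a stacked frame shaped like the output above (toy data, base R only)
stacked <- data.frame(
  RespID     = rep(1:10, times = 2),
  comp       = rep(c("Company1", "Company2"), each = 10),
  likelihood = rep(rep(1:5, 2), times = 2),
  Attribute1 = rep(rep(0:1, 5), times = 2),
  Attribute2 = rep(rep(1:0, 5), times = 2)
)

# Share of respondent-by-company rows that ticked each attribute,
# grouped by likelihood score and pooled over companies
attr_by_likelihood <- aggregate(
  cbind(Attribute1, Attribute2) ~ likelihood,
  data = stacked,
  FUN  = mean
)
print(attr_by_likelihood)
```

On this synthetic data every share comes out as 0.5, because the toy values are perfectly balanced; real survey data would show whether high likelihood scores go together with an attribute being ticked.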

Here is something I quickly cooked up. It doesn't look the best and uses a for loop, but that shouldn't be a problem with only 24 attributes.

df.recommend <- data.frame(rep(1:5,20),rep(1:5,20))
colnames(df.recommend) <- c("Co1","Co2")

df.attribute <- data.frame(rep(0:1,50),rep(1:0,50),rep(0:1,50),rep(1:0,50))

colnames(df.attribute) <- c(
"At1.Co1", 
"At2.Co1", 
"At1.Co2", 
"At2.Co2") 


df.stacked <- data.frame(
    likelihood = unlist(df.recommend)  # "=" names the column; "<-" would also create a stray global variable
    )
str <- strsplit(names(df.attribute),split="\\.")
atts <- unique(sapply(str,function(x)x[1]))

for (i in seq_along(atts)) 
{
    # Column i+1 stacks every company's column for attribute i
    df.stacked[,i+1] <- unlist(df.attribute[sapply(str,function(x)x[1]==atts[i])])
}

names(df.stacked) <- c("likelihood",paste("attribute",1:length(atts),sep=""))

EDIT: It assumes that the companies are in the same order for each attribute, and in the same order as the likelihood columns.
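
If the column order cannot be guaranteed, one option is to look the columns up by name instead of by position. This is a minimal base R sketch under the assumption that the columns follow the Co1, Co2, ... and At1.Co1, At2.Co1, ... pattern from the question; `stack_one` is a hypothetical helper, not something from the answer above.

```r
# Toy data with the question's naming pattern
df.recommend <- data.frame(Co1 = rep(1:5, 20), Co2 = rep(1:5, 20))
df.attribute <- data.frame(
  At1.Co1 = rep(0:1, 50), At2.Co1 = rep(1:0, 50),
  At1.Co2 = rep(0:1, 50), At2.Co2 = rep(1:0, 50)
)

# Company names come from the likelihood columns,
# attribute names from the part before the "."
companies <- names(df.recommend)
parts <- strsplit(names(df.attribute), ".", fixed = TRUE)
atts  <- unique(vapply(parts, `[`, character(1), 1))

# Build one block per company, selecting columns by name rather than position
stack_one <- function(co) {
  block <- df.attribute[paste(atts, co, sep = ".")]
  names(block) <- atts
  cbind(company = co, likelihood = df.recommend[[co]], block)
}

# Stack the per-company blocks on top of each other
df.stacked <- do.call(rbind, lapply(companies, stack_one))
head(df.stacked)
```

Because each block is selected by its full column name, a missing or misnamed column stops with an error instead of silently misaligning the stack. The extra `company` column also keeps track of which company each row came from.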
