assigning a factor to a data frame

2023-03-13 02:02

I want to add a column to a data frame which will encode the specific levels of a factor. e.g.

subject  rate
1          12
1          10 
1          13
4          4
4          6
4          12
2          9
2          2
2          5
6          17
6          10
6          1

In the data frame above, I wish to add a third column called "treatment", in which each subject is assigned to one of two levels, "a" or "b", as shown below:

subject  rate  treatment
1          12      a
1          10      a
1          13      a
4          4       b
4          6       b
4          12      b
2          9       b
2          2       b
2          5       b 
6          17      a
6          10      a
6          1       a

Thanks in advance for any help.

Here's another approach using the plyr package:

library(plyr)

#Make some fake data
set.seed(1)
dat <- data.frame(subject = rep(c(1,4,2,6), each = 3), rate = sample(1:20, 12, TRUE))

set.seed(1)
#Assign one randomly drawn treatment to each subject. This does not ensure that
#you will get at least one subject in each treatment group.
ddply(dat, "subject", transform, treatment = sample(letters[1:2], 1))

EDIT - to address your comment

Given that you want to specify which subject gets assigned to which treatment, Gavin's suggestion of merge is spot on. I would first make a new data.frame that contains one record for each unique subject, assign their treatment, and then merge them together:

treatments <- data.frame(subject = unique(dat$subject), treats = c("a", "b", "b", "a"))
merge(dat, treatments)

Note that the order of unique(dat$subject) is 1,4,2,6, which corresponds to the order in which the subjects appear in the original data.frame.

If your real problem contains more than four subjects, you may want to consider a more automated way of assigning treatment groups. One approach I've used in the past is to assign a random number to each respondent, and then assign groups based on a threshold of that random number. It is essentially the same as the approach above, but can give you equal (or near-equal) numbers in each group. For example:

dat <- ddply(dat, "subject", transform, treatment = runif(1))
dat <- within(dat, treatment <- ifelse(treatment < quantile(treatment, 0.5),"a", "b"))
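If plyr is not available, the same idea of a balanced random split can be sketched in base R by shuffling a pre-balanced pool of labels. This is only a sketch: the example data and the names subj and labels are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical example data: four subjects, three rows each
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

set.seed(1)
subj <- unique(dat$subject)

# Build a balanced pool of labels (a, b, a, b, ...) and shuffle it,
# so the two groups differ in size by at most one subject
labels <- sample(rep(c("a", "b"), length.out = length(subj)))

# Look up each row's label via its subject; row order is left unchanged
dat$treatment <- factor(labels[match(dat$subject, subj)])

table(labels)
```

Because the labels are drawn without replacement from a balanced pool, every subject gets exactly one treatment and the group sizes are fixed in advance rather than left to chance.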

If you want to assign treatments at random, this will do it:

## subject IDs
subj <- with(dat, unique(subject))

## how many treatment levels?
ntreat <- 2

## sample an identifier for the treatments
set.seed(47)
treats <- sample(letters[seq_len(ntreat)], length(subj), replace = TRUE)

## stick this into a subject/treatment data frame
Treat <- data.frame(subject = subj, treatment = treats)

This gives:

R> Treat
  subject treatment
1       1         b
2       4         a
3       2         b
4       6         b

Edit:

If the treatments have been pre-assigned, then just create the Treat data frame by hand:

Treat <- data.frame(subject = c(1,4,2,6), treatment = c("a","b","b","a"))

If you have lots of these to do, you can use functions like seq() and rep(), plus the built-in letters constant, to speed up the "data entry".
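As a rough illustration of that shortcut, the sketch below builds a treatment table with rep() and letters instead of typing each label. The block pattern (first two subjects "a", next two "b") is a made-up assumption, not the assignment from the question.

```r
subj <- c(1, 4, 2, 6)

# Hypothetical pattern: treatments assigned in blocks of two subjects
Treat <- data.frame(subject = subj,
                    treatment = rep(letters[1:2], each = 2))

# Alternating pattern (a, b, a, b, ...) for any number of subjects
Treat_alt <- data.frame(subject = subj,
                        treatment = rep(letters[1:2], length.out = length(subj)))
```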

End edit

We can now merge this data frame with the original data to attach each subject's treatment, using merge():

R> merge(dat, Treat)
   subject rate treatment
1        1   12         b
2        1   10         b
3        1   13         b
4        2    9         b
5        2    2         b
6        2    5         b
7        4    4         a
8        4    6         a
9        4   12         a
10       6   17         b
11       6   10         b
12       6    1         b
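Note that merge() sorts the result by the key column, so the rows above no longer follow the original 1,4,2,6 order. If the original row order matters, one possible alternative is a lookup with match(). This is a minimal sketch that assumes the pre-assigned Treat table from the edit above and rebuilds the question's data by hand.

```r
# Data as given in the question
dat <- data.frame(subject = rep(c(1, 4, 2, 6), each = 3),
                  rate = c(12, 10, 13, 4, 6, 12, 9, 2, 5, 17, 10, 1))

# Pre-assigned treatments, one row per subject
Treat <- data.frame(subject = c(1, 4, 2, 6),
                    treatment = c("a", "b", "b", "a"))

# match() gives, for every row of dat, the matching row index in Treat;
# subjects missing from Treat would get NA
dat$treatment <- Treat$treatment[match(dat$subject, Treat$subject)]
```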

I will assume you have some key for mapping subjects to treatments, for instance 1,6 => a and 4,2 => b. Then a mix of ifelse and %in% should do the job:

df$treatment<-factor(ifelse(df$subject%in%c('1','6'),'a','b'))

The more general option is to copy this factor and alter its levels, but the details depend on how your dictionary is stored. A simple example:

x <- df$subject
levels(x) <- c('a', 'b', 'b', 'a')  # one label per level, in sorted level order: 1, 2, 4, 6
df$treatment <- x

(In both examples I assume that subject is a factor)
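If the dictionary is easier to write as "treatment => subjects", levels<- also accepts a named list, which avoids having to keep track of the level order. The sketch below assumes a small hypothetical df whose subject column is a factor, as in the examples above.

```r
# Hypothetical data with subject stored as a factor
df <- data.frame(subject = factor(rep(c(1, 4, 2, 6), each = 3)))

x <- df$subject
# Each list name is a new level; its value lists the old levels to collapse into it
levels(x) <- list(a = c("1", "6"), b = c("4", "2"))
df$treatment <- x
```

Any old level that is not mentioned in the list becomes NA, which makes missing dictionary entries easy to spot.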

Another approach is to write a small function that decides the treatment for a given subject, and apply it to the subject column to create a new treatment column.

Here is the code:

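data <- data.frame(subject = rep(c(1, 2, 4, 6), times = 4), rate = sample(1:20, 16, TRUE))

# Map a single subject ID to its treatment; unknown IDs give NA
assign_treatment <- function(x) {
  if (x == 1 || x == 4) {
    "a"
  } else if (x == 2 || x == 6) {
    "b"
  } else {
    NA_character_
  }
}

# vapply returns a character vector rather than a list column
data$treat <- vapply(data$subject, assign_treatment, character(1))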

head(data)

Output:

> head(data)
  subject rate treat
1       1   15     a
2       2   20     b
3       4    8     a
4       6   16     b
5       1   19     a
6       2    5     b
