loop to create a new variable based on other cases in R (very basic)
I have a data frame with three variables: ID, group, and nominated_ID.
I want to know the group that nominated_ID belongs in.
I'm imagining that for each case, we take nominated_ID, find the case where it is equal to ID, and then set the nominated_Group variable in the original case equal to the group variable in the matched case. (If there is no match, set it to NA.)
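The procedure described above can be written as an explicit loop. This is only a minimal sketch: the sample values are illustrative (borrowed from the answers), and it assumes each ID appears at most once, so the first hit is taken.

```r
# Illustrative sample data (an assumption; the question gives no values)
df <- data.frame(ID = c(9, 5, 2, 4, 3),
                 group = c("Odd", "Odd", "Even", "Even", "Odd"),
                 nominated_ID = c(9, 8, 4, 9, 2),
                 stringsAsFactors = FALSE)

# Start with NA everywhere so unmatched cases stay NA
df$nominated_Group <- NA_character_

for (i in seq_len(nrow(df))) {
  # Rows whose ID equals this case's nominated_ID
  hit <- which(df$ID == df$nominated_ID[i])
  if (length(hit) > 0) {
    # Copy the group of the first matching case
    df$nominated_Group[i] <- df$group[hit[1]]
  }
}
```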
I wouldn't be surprised if this can be done without a loop, so I'm open-minded about the solution. Thanks so much for your help. Know that I did try to look for similar questions before posting.
You can achieve this in one step, without using cbind, by assigning the result directly to a new column of your data.frame:
df$nominated_group <- with(df, group[match(nominated_ID, ID)])
df
ID group nominated_ID nominated_group
1 9 Odd 9 Odd
2 5 Odd 8 <NA>
3 2 Even 4 Even
4 4 Even 9 Odd
5 3 Odd 2 Even
I used with as a convenient way of referring to the columns of df without having to repeatedly write df$.
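To see why this works, it may help to look at the intermediate result of match on its own. The sketch below rebuilds the same sample data so it runs standalone; idx is just an illustrative name, and stringsAsFactors = FALSE is set so that group is character regardless of R version.

```r
df <- data.frame(ID = c(9, 5, 2, 4, 3),
                 group = c("Odd", "Odd", "Even", "Even", "Odd"),
                 nominated_ID = c(9, 8, 4, 9, 2),
                 stringsAsFactors = FALSE)

# Position in ID of each nominated_ID; NA where there is no match
idx <- match(df$nominated_ID, df$ID)
idx

# Indexing group by those positions; an NA position yields an NA group
nominated_group <- df$group[idx]
nominated_group
```

Note that match returns only the first position at which a value is found, so this relies on ID being unique.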
The following seems to work, though there may be better ways:
> df <- data.frame(ID = c(9, 5, 2, 4, 3),
+ group = c("Odd", "Odd", "Even", "Even", "Odd"),
+ nominated_ID = c(9, 8, 4, 9, 2) )
> df
ID group nominated_ID
1 9 Odd 9
2 5 Odd 8
3 2 Even 4
4 4 Even 9
5 3 Odd 2
> nominated_Group <- df[match(df$nominated_ID, df$ID), ]$group
> newDF <- cbind(df, nominated_Group)
> newDF
ID group nominated_ID nominated_Group
1 9 Odd 9 Odd
2 5 Odd 8 <NA>
3 2 Even 4 Even
4 4 Even 9 Odd
5 3 Odd 2 Even
You can do this in a syntactically compact way using transform, match and array indexing. Using @Henry's data frame:
> df <- transform(df, nominated_group = group[match(nominated_ID, ID)])
> df
ID group nominated_ID nominated_group
1 9 Odd 9 Odd
2 5 Odd 8 <NA>
3 2 Even 4 Even
4 4 Even 9 Odd
5 3 Odd 2 Even
Probably not the most "intuitive" way, but merging df against itself also works if you use nominated_ID as the by index for the first copy and ID as the by index for the second, and keep all rows of the first copy. You need to drop the second nominated_ID column and rearrange the order to get things to match the answers above:
merge(df,df, by.x=3, by.y=1, all.x=TRUE)[order(df$nominated_ID), c(2,3, 1, 4)]
ID group.x nominated_ID group.y
5 4 Even 9 Odd
3 5 Odd 8 <NA>
2 2 Even 4 Even
1 3 Odd 2 Even
4 9 Odd 9 Odd
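A variant of the same merge idea that refers to columns by name and restores the original row order might look like the sketch below. The helper column row and the lookup table are illustrative names, not part of the answer above, and the approach assumes ID is unique (duplicate IDs would duplicate rows in the merge).

```r
df <- data.frame(ID = c(9, 5, 2, 4, 3),
                 group = c("Odd", "Odd", "Even", "Even", "Odd"),
                 nominated_ID = c(9, 8, 4, 9, 2),
                 stringsAsFactors = FALSE)

# Remember the original row order, since merge() reorders rows
df$row <- seq_len(nrow(df))

# Lookup table: each ID with its group, renamed to match the merge key
lookup <- setNames(df[, c("ID", "group")],
                   c("nominated_ID", "nominated_group"))

# Left join: keep every row of df, NA where nominated_ID has no match
merged <- merge(df, lookup, by = "nominated_ID", all.x = TRUE)

# Restore the original row order and column layout
merged <- merged[order(merged$row),
                 c("ID", "group", "nominated_ID", "nominated_group")]
rownames(merged) <- NULL
merged
```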