how to select rows from a dataframe1 in R where dataframe$1column is found somewhere in dataframe2$column
I need to create a new dataframe from rows from dataframe1, such that the value of dataframe1$column is a value found in dataframe2$colum
the dataframes are:
y <- "name,number,lunch
joe,2,peaches
steve,5,hotdog
jon,7,clamroll
nick,11,sloppyJoe"
x <- "number,office
1,1b
2,1a
3,2s
4,4d
5,f4
6,f4
7,h3
8,g3
9,j7
10,d3
11,jk"
df1 <- read.csv(textConnection(df1), header=TRUE, sep=",", stringsAsFactors=FALSE)
df2 <- read.csv(textConnection(df2), header=TRUE, sep=",", stringsAsFactors=FAL开发者_如何学编程SE)
I have tried:
df3 <- df1[which(df1$number == df2$number), ]
to no avail. How do I properly do this in R? I could write a perl script, but I have about 100 of these sets and don't want to create more temp files.
again, the %in%
trick :
> df1[df1$number %in% df2$number,]
number office
2 2 1a
5 5 f4
7 7 h3
11 11 jk
For what it's worth, you can easily just do a merge if you want to combine them. In this case I'd say that's the cleanest solution : gives you every office of the occuring employees, and matches them :
> merge(df1,df2)
number office name lunch
1 2 1a joe peaches
2 5 f4 steve hotdog
3 7 h3 jon clamroll
4 11 jk nick sloppyJoe
Check the help files of merge for more options, you can do a whole lot with it.
Joris' answer is spot on. The merge()
command can also be useful for this type of stuff. If you are familiar with SQL joins, you can draw parallels between most of the options in merge()
and the different join operations.
#Inner join
> merge(df1,df2)
number office name lunch
1 2 1a joe peaches
2 5 f4 steve hotdog
3 7 h3 jon clamroll
4 11 jk nick sloppyJoe
#Right join:
> merge(df1,df2, all.x = TRUE)
number office name lunch
1 1 1b <NA> <NA>
2 2 1a joe peaches
3 3 2s <NA> <NA>
4 4 4d <NA> <NA>
5 5 f4 steve hotdog
6 6 f4 <NA> <NA>
7 7 h3 jon clamroll
8 8 g3 <NA> <NA>
9 9 j7 <NA> <NA>
10 10 d3 <NA> <NA>
11 11 jk nick sloppyJoe
精彩评论