Existing function for seeing if a row exists in a data frame?
Is there an existing function for determining whether a row exists within a data frame? I suppose I could do an apply/identical, but it seems like I'm missing something.
For example, given a data frame like this:
  a   b
1 1 cat
2 2 dog
Is there an existing function which will allow me to test whether the row (1, cat) exists in the data frame?
Thanks, Zach
Try match_df from plyr (using Marek's sample data):
library(plyr)
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat")
match_df(X, row_to_find)
Using the data from @Marek's answer:
nrow(merge(row_to_find,X))>0 # TRUE if exists
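One caveat with the merge() check: when the two data frames share no column names and no `by` is given, merge() returns the Cartesian product, so the result is non-empty regardless of the values. A minimal sketch that pins the key columns explicitly is below. The helper name row_in_df is hypothetical (not from any package), and it assumes the row uses the same column names as the data frame.

```r
X <- data.frame(a = 1:2, b = c("cat", "dog"))

# Hypothetical helper: TRUE if the one-row data frame `row` occurs in `df`.
row_in_df <- function(row, df) {
  # Assumption: every column of `row` also exists in `df`,
  # otherwise there is nothing sensible to match on.
  stopifnot(is.data.frame(row), all(names(row) %in% names(df)))
  nrow(merge(row, df, by = names(row))) > 0
}

row_in_df(data.frame(a = 1, b = "cat"), X)     # TRUE
row_in_df(data.frame(a = 3, b = "horse"), X)   # FALSE

# No shared column names and no `by`: merge() builds the cross product,
# so the naive check reports TRUE even though (3, horse) is absent.
nrow(merge(data.frame(u = 3, v = "horse"), X)) > 0   # TRUE
```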
Taking your example:
X <- data.frame(a=1:2, b=c("cat","dog"))
row_to_find <- data.frame(a=1, b="cat") # it has to be a data.frame (not a vector) to hold different types
Then
duplicated(rbind(X, row_to_find))[nrow(X)+1]
gives you the answer.
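If this check is needed in several places, the rbind()/duplicated() idiom can be wrapped in a small function. This is only a sketch: the name row_exists is hypothetical, and it assumes the candidate is a one-row data frame with the same column names as the data frame being searched.

```r
# Hypothetical helper wrapping the rbind()/duplicated() idiom.
row_exists <- function(df, row) {
  stopifnot(is.data.frame(row), nrow(row) == 1,
            identical(names(df), names(row)))
  # Append the candidate as the last row; it is flagged as a duplicate
  # only if an equal row appears earlier, i.e. somewhere in df.
  tail(duplicated(rbind(df, row)), 1)
}

X <- data.frame(a = 1:2, b = c("cat", "dog"))
row_exists(X, data.frame(a = 1, b = "cat"))     # TRUE
row_exists(X, data.frame(a = 3, b = "horse"))   # FALSE
```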
I suggest Ben Bolker's solution, since the nrow(merge(row_to_find,X))>0 solution doesn't work for me (it always gives TRUE):
tail(duplicated(rbind(X,row_to_find)),1)>0
For fans of dplyr and the tidyverse, you can use dplyr::anti_join(). According to its documentation, dplyr::anti_join(x, y) "returns all rows from x where there are not matching values in y, keeping just columns from x." Hence, if the result of dplyr::anti_join(row, df) has zero rows, then row was indeed in df; if it has one row, then row was not in df.
library(dplyr)
df <- tribble(~a, ~b,
1, "cat",
2, "dog")
#> # A tibble: 2 x 2
#> a b
#> <dbl> <chr>
#> 1 1.00 cat
#> 2 2.00 dog
row <- tibble(a = 1, b = "cat")
#> # A tibble: 1 x 2
#> a b
#> <dbl> <chr>
#> 1 1.00 cat
nrow(anti_join(row, df)) == 0 # row is in df so should be TRUE
#> Joining, by = c("a", "b")
#> [1] TRUE
row <- tibble(a = 3, b = "horse")
#> # A tibble: 1 x 2
#> a b
#> <dbl> <chr>
#> 1 3.00 horse
nrow(anti_join(row, df)) == 0 # row is not in df so should be FALSE
#> Joining, by = c("a", "b")
#> [1] FALSE
For a vector, y, with the same number of elements as there are columns in the data frame, dfrm:
apply(dfrm, 1, function(x) all( x == y) )
This should return a vector of TRUE and FALSE values, which could in turn be used as a row index in [ , ]:
dfrm[ apply(dfrm, 1, function(x) all( x == y) ) , ]
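Note that apply() first coerces the data frame to a matrix, so with mixed column types every value ends up being compared as a character string. A column-wise comparison avoids that coercion. The following is a rough sketch, not the method from the answer above; the names dfrm, target, and hits are illustrative, and it assumes the values are supplied as a named list so each keeps its own type.

```r
dfrm <- data.frame(a = c(1, 10), b = c("cat", "dog"))
target <- list(a = 1, b = "cat")   # assumption: names match columns of dfrm

# Compare each column with its target value, then AND the per-column results.
hits <- Reduce(`&`, Map(function(col, val) col == val,
                        dfrm[names(target)], target))

hits           # logical vector with one entry per row
any(hits)      # TRUE if the row exists anywhere in dfrm
dfrm[hits, ]   # the matching row(s)
```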
The identical function is probably too stringent, since it will check attributes as well.
> y=c(1,2,3)
> x = data.frame(a=1:10, b=2:11, c=3:12)
> identical(x[1,] , y)
[1] FALSE
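To see where the strictness comes from, the comparison can be relaxed one step at a time. This sketch reuses the x and y from the session above; the intermediate steps are added for illustration only.

```r
x <- data.frame(a = 1:10, b = 2:11, c = 3:12)
y <- c(1, 2, 3)

identical(x[1, ], y)                       # FALSE: data frame vs. bare vector
identical(unlist(x[1, ]), y)               # FALSE: names and integer-vs-double storage differ
identical(as.numeric(unlist(x[1, ])), y)   # TRUE: names dropped, both are double
all(x[1, ] == y)                           # TRUE: compares values only
```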
Another approach, using base R:
df <- data.frame(a = c(1, 2), b = c("cat", "dog"))
any(df$a == 1 & df$b == "cat")
#> [1] TRUE