
Check whether a row with values belongs to a data frame in R [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

Existing function for seeing if a row exists in a data frame?

Suppose I have the following data frame in R.

df <- data.frame(a = 1:3, b = 4:6)

This data frame contains three rows: (1,4), (2,5) and (3,6). Suppose I did not know which rows df contains and wanted to check whether the row (1,4) belongs to it. How can I check that?

My actual case involves comparing 27 parameter values. Is there a solution in which I can do this without typing each and every parameter name? Thanks!

The reason I want to do this is that I have an R dataset called masterdata which contains simulation data. I want to update this data set with new data as I make additional simulation runs with different parameter combinations. However, I may forget that I have already run the simulation for a certain parameter combination and run it again, in which case masterdata would be expanded with duplicate rows. I could remove these duplicates later, but I would rather the update not go through at all if the values are duplicates. For this I need to check whether the data from a simulation run is already present in masterdata, which I can do if I know how to check whether a given row belongs to masterdata.

Thanks.


There may be more efficient ways, but I think

tail(duplicated(rbind(masterdata,newvals)),1)

will do it: in other words, attach the new row to the end of the data frame and see whether it is duplicated or not.
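A minimal sketch of this idea wrapped into a reusable check, assuming newvals is a one-row data frame with the same column names as masterdata (rbind() matches data frame columns by name and errors if they differ). The helper name row_exists is hypothetical. Because every column is compared, none of the 27 parameter names has to be typed. Note that the comparison is exact, so parameter values stored as doubles that differ only by rounding would count as different rows.

```r
# Hypothetical helper: TRUE if the one-row data frame `new_row` already
# appears in `master`. All columns are compared; no names are typed.
row_exists <- function(master, new_row) {
  # Append the candidate as the last row; duplicated() flags rows that
  # repeat an earlier row, so only the last flag matters here.
  # Assumes new_row has exactly one row (tail() only looks at the last).
  tail(duplicated(rbind(master, new_row)), 1)
}

df <- data.frame(a = 1:3, b = 4:6)
row_exists(df, data.frame(a = 1, b = 4))  # TRUE
row_exists(df, data.frame(a = 1, b = 5))  # FALSE

# Update workflow: only append a simulation run that is not already there.
masterdata <- df
newvals <- data.frame(a = 4, b = 7)       # hypothetical new run
if (!row_exists(masterdata, newvals)) {
  masterdata <- rbind(masterdata, newvals)
}
nrow(masterdata)                          # 4
```

Running the same `if` block a second time with the same newvals leaves masterdata at 4 rows.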


If you want to compare only two columns in the data.frame, then this does the trick:

> which(df$a+df$b*1i == 1+4i)
[1] 1

This may or may not be faster than the other vectorized solutions.
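The trick works because two complex numbers a + bi and c + di are equal exactly when a == c and b == d, so each (a, b) row collapses into a single comparable value. A small sketch that turns it into a yes/no answer with any(); the variable name key is just for illustration, and the encoding only covers two numeric columns.

```r
df <- data.frame(a = 1:3, b = 4:6)

# Encode each (a, b) row as the complex number a + b*i.
key <- df$a + df$b * 1i

any(key == 1 + 4i)   # TRUE: the row (1, 4) is present
any(key == 2 + 4i)   # FALSE: no row has a = 2 and b = 4
```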


There are quite a few ways to do this. You can use ifelse(), a vectorized function, to return 1 or 0 for each row of your data frame depending on whether it meets your conditions:

> with(df, ifelse(a == 1 & b == 4, 1, 0))
[1] 1 0 0

Since you are probably only interested in knowing whether your parameter combination has been run at all, you can wrap sum() around the previous command:

> sum(with(df, ifelse(a == 1 & b == 4, 1, 0)))
[1] 1

Another alternative is to use nrow() and subset(). We'll again use the & operator for our testing:

> nrow(subset(df, a == 1 & b == 4))
[1] 1
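Both versions above still spell out every column in the `a == 1 & b == 4` condition, which gets tedious with 27 parameters. A hedged sketch that builds the same condition from a named list instead; the helper row_matches and the list target are hypothetical names. A one-row data frame such as newvals should also work as target, since Map() iterates over its columns.

```r
# Hypothetical helper: build `a == 1 & b == 4 & ...` from a named list,
# so the parameter names never have to be typed by hand.
row_matches <- function(df, target) {
  # One logical vector per parameter: does this column equal its target?
  per_column <- Map(function(col, val) col == val, df[names(target)], target)
  # AND the per-column results together, row by row.
  Reduce(`&`, per_column)
}

df <- data.frame(a = 1:3, b = 4:6)
target <- list(a = 1, b = 4)      # e.g. the parameters of a new run

hits <- row_matches(df, target)
hits                              # TRUE FALSE FALSE
sum(hits)                         # 1, as with the ifelse()/sum() version
nrow(df[hits, , drop = FALSE])    # 1, as with the subset()/nrow() version
```

Since `&` already yields logical values, sum() can count the matches directly without the ifelse() step.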


You don't need any more than a single unique call:

Test <- data.frame(a = c(1, 2, 2, 2, 3), b = c(1, 2, 2, 3, 3), c = c(1, 2, 2, 2, 3))
Test
unique(Test) #Same with duplicated rows removed
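Applied to the masterdata workflow from the question, a minimal sketch might append the new run and let unique() drop it if it repeats an existing row; comparing row counts then tells you whether the run was already there. The data below are made up for illustration, and the row-count reading assumes masterdata contained no duplicates to begin with.

```r
masterdata <- data.frame(a = 1:3, b = 4:6)
newvals <- data.frame(a = 1, b = 4)   # hypothetical run that was already done

# Append the new run, then keep only the first copy of every row.
updated <- unique(rbind(masterdata, newvals))

# No row was added, so the run was already present in masterdata.
already_present <- nrow(updated) == nrow(masterdata)
already_present   # TRUE
```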
