R selecting duplicate rows
Okay, I'm fairly new to R and I've tried to search the documentation for what I need to do but here is the problem.
I have a data.frame called heeds.data in the following form (some columns omitted for simplicity) eval.num, eval.count, ... fitness, fitness.mean, green.h.0, green.v.0, offset.0, green.h.1, green.v.1,...green.h.7, green.v.7, offset.7...
And I have selected a row meeting the following criteria:
best.fitness <- min(heeds.data$fitness.mean[heeds.data$eval.count >= 10])
best.row <- heeds.data[heeds.data$fitness.mean == best.fitness, ]
Now, what I want are all of the other rows whose columns green.h.0 to offset.7 (a contiguous section of columns) are equal to those of best.row.
I was thinking this might work
heeds.best <- heeds.data$fitness[
heeds.data$green.h.0 == best.row$green.h.0 & ...
]
But with 24 columns it seems like a stupid method. Looking for something a bit simpler with less manual typing.
Here is a short data sample to show what I want
eval.num, eval.count, fitness, fitness.mean, green.h.0, green.v.0, offset.0
1 1 1500 1500 100 120 40
2 2 1000 1250 100 120 40
3 3 1250 1250 100 120 40
4 4 1000 1187.5 100 120 40
5 1 2000 2000 200 100 40
6 1 3000 3000 150 90 10
7 1 2000 2000 90 90 100
8 2 1800 1900 90 90 100
This should select row 4 as the "best". Then I want to grab the results as follows:
eval.num, eval.count, fitness, fitness.mean, green.h.0, green.v.0, offset.0
1 1 1500 1500 100 120 40
2 2 1000 1250 100 120 40
3 3 1250 1250 100 120 40
4 4 1000 1187.5 100 120 40
Data isn't actually sorted and there are many more columns but that is the concept
Thanks!
Your question is essentially just a complicated indexing question. I have a solution here, though there may be simpler ones. I loaded your example data into DF.
First, this gets us the index of the best row (easy using which.min()):
R> bind <- which.min(DF[,"fitness.mean"]) # index of best row
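The which.min() call above looks at every row, while the question also restricts the candidates to rows with eval.count >= 10. A minimal sketch of honoring such a filter follows; the names min.count and eligible are hypothetical, and the threshold is set to 2 only because the sample data has no counts of 10 or more:

```r
# Only the two columns needed for this step, taken from the question's sample
DF <- data.frame(
  eval.count   = c(1, 2, 3, 4, 1, 1, 1, 2),
  fitness.mean = c(1500, 1250, 1250, 1187.5, 2000, 3000, 2000, 1900)
)

min.count <- 2  # assumption: the question uses 10 on the real data

# Row numbers that satisfy the count filter
eligible <- which(DF$eval.count >= min.count)

# which.min() indexes into the filtered vector, so map it back to a row of DF
bind <- eligible[which.min(DF$fitness.mean[eligible])]
```

Mapping back through eligible matters because which.min() returns a position within the filtered vector, not within DF.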
Next, we apply() a row-wise comparison over the subset of columns we care about (here indexed simply by position, 5 to 7).
We use a comparison function cmpfun to compare the current row r to the best row (indexed by bind) and use all() to get the rows where all elements match. [ We need drop=FALSE here to make it comparable on both sides, else as.numeric() helps. ]
R> cmpfun <- function(r) all(r == DF[bind,5:7,drop=FALSE]) # compare to row bind
We then simply apply this function row-wise:
R> brows <- apply(DF[,5:7], 1, cmpfun)
And these are the rows we wanted:
R> DF[brows, ]
eval.num eval.count fitness fitness.mean green.h.0 green.v.0 offset.0
1 1 1 1500 1500 100 120 40
2 2 2 1000 1250 100 120 40
3 3 3 1250 1250 100 120 40
4 4 4 1000 1188 100 120 40
R>
It did not matter that we used three columns for the comparison; all that mattered is that we had an indexing expression (here 5:7) for the columns we wanted.
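As a possible variation, the column range can be looked up by name instead of by position, and the comparison can be done on a matrix without apply(). This is only a sketch on the sample data; the names cols, m, best, same and result are made up for illustration, and on the full data the last column would presumably be "offset.7" rather than "offset.0":

```r
# Sample data from the question (only the first block of green/offset columns)
DF <- data.frame(
  eval.num     = 1:8,
  eval.count   = c(1, 2, 3, 4, 1, 1, 1, 2),
  fitness      = c(1500, 1000, 1250, 1000, 2000, 3000, 2000, 1800),
  fitness.mean = c(1500, 1250, 1250, 1187.5, 2000, 3000, 2000, 1900),
  green.h.0    = c(100, 100, 100, 100, 200, 150, 90, 90),
  green.v.0    = c(120, 120, 120, 120, 100, 90, 90, 90),
  offset.0     = c(40, 40, 40, 40, 40, 10, 100, 100)
)

bind <- which.min(DF$fitness.mean)  # index of best row, as in the answer

# Contiguous column range, located by the names of its first and last column
cols <- match("green.h.0", names(DF)):match("offset.0", names(DF))

m    <- as.matrix(DF[, cols])       # numeric matrix of the compared columns
best <- m[bind, ]                   # values of the best row

# Repeat the best row down a matrix of the same shape, compare element-wise,
# and keep rows where every compared column matches
same   <- rowSums(m == matrix(best, nrow(m), ncol(m), byrow = TRUE)) == ncol(m)
result <- DF[same, ]
```

Here result holds rows 1 to 4 of the sample. Note that exact equality with == is only reliable when the compared columns hold integers or values that were copied rather than recomputed.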