How to remove duplicated rows by a column in an R matrix
I am trying to remove duplicated rows by one column (e.g the 1st column) in an R matrix. How can I extract the unique set by one column from a matrix? I've used
x_1 <- x[unique(x[,1]),]
While th开发者_如何学Ce size is correct, all of the values are NA
. So instead, I tried
x_1 <- x[-duplicated(x[,1]),]
But the dimensions were incorrect.
I think you're confused about how subsetting works in R. unique(x[,1])
will return the set of unique values in the first column. If you then try to subset using those values R thinks you're referring to rows of the matrix. So you're likely getting NAs because the values refer to rows that don't exist in the matrix.
Your other attempt runs afoul of the fact that duplicated
returns a boolean vector, not a vector of indices. So putting a minus sign in front of it converts it to a vector of 0's and -1's, which again R interprets as trying to refer to rows.
Try replacing the '-' with a '!' in front of duplicated
, which is the boolean negation operator. Something like this:
m <- matrix(runif(100),10,10)
m[c(2,5,9),1] <- 1
m[!duplicated(m[,1]),]
As you need the indeces of the unique rows, use duplicated
as you tried. The problem was using -
instead of !
, so try:
x[!duplicated(x[,1]),]
精彩评论