Select rows of a matrix that meet a condition
In R with a matrix:
     one two three four
[1,]   1   6    11   16
[2,]   2   7    12   17
[3,]   3   8    11   18
[4,]   4   9    11   19
[5,]   5  10    15   20
I want to extract the submatrix whose rows have column three = 11. That is:
     one two three four
[1,]   1   6    11   16
[3,]   3   8    11   18
[4,]   4   9    11   19
I want to do this without looping. I am new to R, so this is probably very obvious, but the documentation is often somewhat terse.
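This is easier to do if you convert your matrix to a data frame using as.data.frame(). In that case the other answers that refer to the column by bare name (subset(m, three == 11) or m$three) will work; on a plain matrix they will not, because $ is not defined for matrices and subset() cannot look up the column name.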
To perform the operation on a matrix, you can define a column by name:
m[m[, "three"] == 11,]
Or by number:
m[m[,3] == 11,]
Note that if only one row matches, the result is an integer vector, not a matrix.
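If you need a matrix back even when only one row matches, one option is to pass drop = FALSE to the indexing call. A minimal sketch, assuming the matrix from the question was built roughly as below (the construction of m is my assumption, not something the question shows):

```r
# Rebuild the matrix from the question (construction assumed for illustration)
m <- matrix(c(1:5, 6:10, c(11L, 12L, 11L, 11L, 15L), 16:20), ncol = 4)
colnames(m) <- c("one", "two", "three", "four")

# Three rows match, so the result is still a matrix
three_rows <- m[m[, "three"] == 11, ]

# Only one row matches: the default indexing collapses it to a vector
one_row_vec <- m[m[, "three"] == 12, ]

# drop = FALSE keeps the 1 x 4 matrix shape
one_row_mat <- m[m[, "three"] == 12, , drop = FALSE]

is.matrix(one_row_vec)
is.matrix(one_row_mat)
```

Using drop = FALSE everywhere is a reasonable default when later code expects to call nrow() or index by column on the result.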
A simple approach is to use the dplyr package.
Assuming the data frame is called data:
library(dplyr)
result <- filter(data, three == 11)
m <- matrix(1:20, ncol = 4)
colnames(m) <- letters[1:4]
The following command will select the first row of the matrix above.
subset(m, m[,4] == 16)
And this will select the last three.
subset(m, m[,4] > 17)
The result will be a matrix in both cases. If you want to use column names to select columns then you would be best off converting it to a dataframe with
mf <- data.frame(m)
Then you can select with
mf[mf$d == 16, ]
Or, you could use the subset command.
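As a rough sketch of that last option, once the matrix is a data frame the condition inside subset() can refer to columns by bare name. The result names first_row and last_three are mine, chosen only for illustration:

```r
# Same example matrix as above, converted to a data frame
m <- matrix(1:20, ncol = 4)
colnames(m) <- letters[1:4]
mf <- data.frame(m)

# Column names are looked up inside the data frame,
# so d refers to the fourth column (values 16 to 20)
first_row <- subset(mf, d == 16)
last_three <- subset(mf, d > 17)

first_row
last_three
```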
subset is a very slow function, and I personally find it useless.
I assume you have a data.frame, array, or matrix called Mat with A, B, and C as column names. Then all you need to do is the following.
For one condition on one column, let's say column A:
Mat[which(Mat[,'A'] == 10), ]
For multiple conditions on different columns, you can build up an auxiliary index vector step by step. Suppose the conditions are A == 10, B == 5, and C > 2; then we have:
aux = which(Mat[,'A'] == 10)
aux = aux[which(Mat[aux,'B'] == 5)]
aux = aux[which(Mat[aux,'C'] > 2)]
Mat[aux, ]
Timing both approaches with system.time, I found the which method to be about 10x faster than the subset method.
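For comparison, the same multi-condition selection can also be sketched as a single logical index combined with &. The contents of Mat below are made up purely for illustration, and I have not timed this variant against the others:

```r
# Hypothetical example data; the values in Mat are invented for this sketch
Mat <- cbind(A = c(10, 10, 3, 10),
             B = c(5, 5, 5, 7),
             C = c(1, 4, 9, 8))

# Combine the three conditions element-wise into one logical vector
keep <- Mat[, "A"] == 10 & Mat[, "B"] == 5 & Mat[, "C"] > 2
selected <- Mat[keep, , drop = FALSE]

# Step-wise which() version from above, for comparison
aux <- which(Mat[, "A"] == 10)
aux <- aux[which(Mat[aux, "B"] == 5)]
aux <- aux[which(Mat[aux, "C"] > 2)]

selected
Mat[aux, , drop = FALSE]
```

One difference to keep in mind: if a column contains NA, the logical index produces NA rows in the result, whereas which() silently drops them.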
If your data is a data frame called m (the $ operator does not work on a plain matrix), just use:
R> m[m$three == 11, ]
If the dataset is called data, then all the rows where the value of column 'pm2.5' is greater than 300 can be retrieved with:
data[data['pm2.5'] > 300, ]