
Filtering a data frame

I have read a CSV file into R as a table with m rows and n columns. I want to filter it according to a rule that, put into words, is:

Select all values from column x where the value of another column in the same row is equal to "blabla".

It is like a SELECT statement in a database: I am interested in the subset of the table that satisfies these constraints.

How can I do it in R? I have the data as a data frame and can access it by the headers. data["column_values" = "15"] does not give me back only the rows where the column named column_values has the value 15.


You said you just wanted the column x values where column_values was 15, right?

subset(dat, column_values==15, select=x)

This comes back as a data frame, so you may need to unlist() it and maybe even "unfactor" it with as.character().

> dat
  Subject Product
1       1   ProdA
2       1   ProdB
3       1   ProdC
4       2   ProdB
5       2   ProdC
6       2   ProdD
7       3   ProdA
8       3   ProdB
> subset(dat, Subject==2, Product)
  Product
4   ProdB
5   ProdC
6   ProdD
> unlist( subset(dat, Subject==2, Product) )
Product1 Product2 Product3 
   ProdB    ProdC    ProdD 
Levels: ProdA ProdB ProdC ProdD
> as.character( unlist( subset(dat, Subject==2, Product) ) )
[1] "ProdB" "ProdC" "ProdD"

If you want all of the columns you can drop the third argument (the select= argument):

subset(dat, Subject==2 )

  Subject Product
4       2   ProdB
5       2   ProdC
6       2   ProdD
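
For anyone who wants to run the session above, here is a minimal sketch that rebuilds dat. It assumes Product was read in as a factor (as the "Levels:" line suggests); the names sub_df and products are made up for illustration.

```r
# Rebuild the example data frame; Product is made a factor explicitly,
# because R >= 4.0 no longer converts strings to factors by default.
dat <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product = factor(c("ProdA", "ProdB", "ProdC", "ProdB",
                     "ProdC", "ProdD", "ProdA", "ProdB"))
)

# subset() with select= returns a one-column data frame
sub_df <- subset(dat, Subject == 2, select = Product)

# Pulling the column out with [[ ]] gives the factor itself, and
# as.character() turns it into plain strings
products <- as.character(sub_df[["Product"]])
print(products)
```

Using [[ ]] instead of unlist() is just an alternative way to get from the data frame to the column; either should work here.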

Assuming that dat is the data frame in question, col is the name of the column and "value" is the value that you want, you can do

dat[dat$col == "value", ]

That fetches all of the rows of dat for which dat$col=="value", and all of the columns.
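
As a small illustration of selecting rows where dat$col == "value", here is a sketch on a made-up data frame (the id column and the contents are assumptions, not from the question). It also shows one thing to watch for: if the column contains NA, plain logical indexing returns an extra row of NAs, and wrapping the condition in which() is one way to avoid that.

```r
# Hypothetical data; `col` and "value" are the placeholder names used above
dat <- data.frame(
  id  = 1:4,
  col = c("value", "other", NA, "value"),
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA in the condition yields a row of NAs
with_na <- dat[dat$col == "value", ]

# which() keeps only the positions where the condition is TRUE
without_na <- dat[which(dat$col == "value"), ]

print(with_na)
print(without_na)
```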

First, note that a matrix and a data.frame are different things in R. I imagine you have a data.frame (as that is what read.csv() returns). Data frames have named columns (if you don't supply names, generic ones are created for you).

You can subset a data.frame by indicating which rows you want, which columns you want, or both. The easiest way to specify the rows is with a logical vector, often built out of comparisons on specific columns of the data.frame. For example, data[["column values"]] == "15" makes a logical vector which is TRUE wherever the corresponding entry in the column named "column values" is the string "15" (since it is in quotes, it is a string, not a number). You can make the selection criteria as complicated as you like (combining logical vectors with & and |) to specify the rows you want. This vector becomes the first argument in the indexing.

A vector of column names or numbers can be the second argument. If either argument is missing, all rows (or columns) are assumed.
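
To make the "missing argument" rule concrete, here is a short sketch using the built-in mtcars data set; the variable names are only for illustration.

```r
# mtcars ships with base R (datasets package)
first_rows <- mtcars[1:3, ]              # rows 1 to 3, all columns
two_cols   <- mtcars[, c("mpg", "hp")]   # all rows, two columns by name
by_number  <- mtcars[1:3, c(1, 4)]       # columns picked by position

print(dim(first_rows))
print(dim(two_cols))
print(names(by_number))
```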

Putting this all together, you get examples like

data[data[["column values"]] == "15", ]

or using an actual data set (mtcars)

mtcars[mtcars$am == 1, ]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg"]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg", drop=FALSE]
mtcars[mtcars$hp > 100, c("mpg", "carb")]

Take a look at what each of the conditionals (the first arguments, e.g. mtcars$am == 1 & mtcars$hp > 100) returns to get a better sense of how indexing works.
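
One way to do that inspection, sketched with the same mtcars condition (cond, as_vector and as_df are names chosen here, not anything special):

```r
# Store the condition on its own so it can be examined
cond <- mtcars$am == 1 & mtcars$hp > 100

print(class(cond))    # a logical vector
print(length(cond))   # one entry per row of mtcars
print(sum(cond))      # how many rows the condition selects

# The same condition used as the row index, with and without drop = FALSE
as_vector <- mtcars[cond, "mpg"]                # plain numeric vector
as_df     <- mtcars[cond, "mpg", drop = FALSE]  # one-column data frame
```

Selecting a single column drops the result to a vector by default; drop = FALSE keeps it as a data frame, which is the difference between the second and third mtcars examples above.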






