Filtering a data frame
I have read in a CSV file in matrix form (m rows and n columns). I want to filter it with a condition that, in words, reads:
Select all values from column x where the value of another column in the same row is equal to "blabla".
It is like a SELECT statement in a database: I am interested in the subset of the matrix that satisfies these constraints.
How can I do this in R? I have the data as a data frame and can access it by its headers, but data["column_values" = "15"]
does not give me back only the rows where the column named column_values has the value 15.
Thanks
You said you just wanted the column x values where column_values was 15, right?
subset(dat, column_values==15, select=x)
This comes back as a one-column data frame, so it's possible you may need to unlist() it and maybe even "unfactor" it.
> dat
  Subject Product
1       1   ProdA
2       1   ProdB
3       1   ProdC
4       2   ProdB
5       2   ProdC
6       2   ProdD
7       3   ProdA
8       3   ProdB
> subset(dat, Subject==2, Product)
  Product
4   ProdB
5   ProdC
6   ProdD
> unlist( subset(dat, Subject==2, Product) )
Product1 Product2 Product3
   ProdB    ProdC    ProdD
Levels: ProdA ProdB ProdC ProdD
> as.character( unlist( subset(dat, Subject==2, Product) ) )
[1] "ProdB" "ProdC" "ProdD"
If you want all of the columns you can drop the third argument (the select= argument):
> subset(dat, Subject==2 )
  Subject Product
4       2   ProdB
5       2   ProdC
6       2   ProdD
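To make the example above reproducible, here is a minimal sketch that rebuilds dat from the printed output. The construction is an assumption (the original post does not show how dat was created), and stringsAsFactors = TRUE is set only so that Product is a factor, as the "Levels:" line suggests. It also shows a shorter route to the same character vector that indexes the column directly instead of going through subset() and unlist().

```r
# Rebuild the example data frame from the printed output (assumed construction).
dat <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product = c("ProdA", "ProdB", "ProdC", "ProdB",
              "ProdC", "ProdD", "ProdA", "ProdB"),
  stringsAsFactors = TRUE  # Product becomes a factor, matching the output above
)

# Route used in the answer: subset to a one-column data frame, then flatten it.
via_subset <- as.character(unlist(subset(dat, Subject == 2, Product)))

# Alternative: index the column itself with a logical vector.
via_index <- as.character(dat$Product[dat$Subject == 2])

print(via_subset)
print(via_index)
```

Both routes give the same plain character vector. Indexing the column directly avoids the intermediate data frame.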
Assuming that dat is the data frame in question, col is the name of the column and "value" is the value that you want, you can do
dat[dat$col=="value",]
That fetches all of the rows of dat for which dat$col=="value", and all of the columns.
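One caveat worth knowing with this form: if the column contains missing values, the comparison yields NA for those rows, and logical indexing then returns an all-NA row for each of them. The sketch below uses a made-up data frame (the names dat, col and other are placeholders, not from the question) to show the effect, along with two common ways around it.

```r
# Hypothetical data with one missing value in the column being filtered.
dat <- data.frame(
  col   = c("value", NA, "other", "value"),
  other = 1:4,
  stringsAsFactors = FALSE
)

# Plain logical indexing: the NA comparison produces an extra row full of NAs.
res_plain <- dat[dat$col == "value", ]

# which() keeps only the positions where the comparison is TRUE.
res_which <- dat[which(dat$col == "value"), ]

# subset() likewise drops rows where the condition evaluates to NA.
res_subset <- subset(dat, col == "value")

print(res_plain)
print(res_which)
print(res_subset)
```

If the column has no missing values, all three forms return the same rows.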
First, note that a matrix and a data.frame are different things in R. I imagine you have a data.frame (as that is what is returned by read.csv()). Data frames have named columns (if you don't give them names, generic ones are created for you).
You can subset a data.frame by indicating which rows you want, which columns you want, or both. The easiest way to specify the rows is with a logical vector, often built out of comparisons on specific columns of the data.frame. For example, data[["column values"]] == "15" makes a logical vector which is TRUE wherever the corresponding entry in the column named column values is the string "15" (since it is in quotes, it is a string, not a number). You can make the selection criteria as complicated as you like (combining logical vectors with & and |) to specify the rows you want. This vector becomes the first argument in the indexing.
A vector of column names or numbers can be the second argument. If either argument is missing, all rows (or all columns, respectively) are assumed.
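As a small illustration of the two indexing arguments described above, here is a sketch on a made-up data frame (df and its columns id, value and group are hypothetical names, not from the question). It builds the logical vectors separately so they can be inspected, then combines them with & and |.

```r
# Hypothetical data frame for illustration only.
df <- data.frame(
  id    = 1:5,
  value = c(15, 20, 15, 30, 15),
  group = c("a", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

# Each comparison yields one TRUE/FALSE entry per row.
is_15 <- df$value == 15
is_a  <- df$group == "a"

# First argument selects rows; an empty second argument keeps all columns.
both <- df[is_15 & is_a, ]

# Second argument selects columns; a single column comes back as a vector.
either_ids <- df[is_15 | is_a, "id"]

# An empty first argument keeps all rows.
all_rows <- df[, c("id", "group")]

print(both)
print(either_ids)
print(all_rows)
```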
Putting this all together, you get examples like
data[data[["column values"]] == "15", ]
or, using an actual data set (mtcars):
mtcars[mtcars$am == 1, ]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg"]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg", drop=FALSE]
mtcars[mtcars$hp > 100, c("mpg", "carb")]
Take a look at what each of the conditionals (the first arguments, e.g. mtcars$am == 1 & mtcars$hp > 100) returns to get a better sense of how indexing works.
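One way to take that look is sketched below. It stores the conditional in a variable (cond is just an illustrative name) and checks its type and length, then compares the results with and without drop=FALSE. It uses only the built-in mtcars data set.

```r
# The conditional on its own: one logical value per row of mtcars.
cond <- mtcars$am == 1 & mtcars$hp > 100

print(class(cond))    # type of the vector
print(length(cond))   # same as nrow(mtcars)
print(sum(cond))      # number of rows that satisfy the condition

# Selecting a single column drops to a plain vector by default...
mpg_vec <- mtcars[cond, "mpg"]

# ...while drop = FALSE keeps a one-column data frame with row names.
mpg_df <- mtcars[cond, "mpg", drop = FALSE]

print(mpg_vec)
print(mpg_df)
```

The drop=FALSE form is handy when later code expects a data frame regardless of how many columns were selected.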