Filtering a data frame

2023-04-08 14:43 Question author:

I have read a CSV file into R in tabular form (m rows and n columns). I want to filter it using a rule that, put into words, reads:

Select all values from column x where the value of another column in the same row is equal to "blabla".

It is like a SELECT statement in a database: I am interested in the subset of the table that satisfies these constraints.

How can I do this in R? I have the data as a data frame and can access it by its headers. data["column_values" = "15"] does not give me back only the rows where the column named column_values has the value 15.

Thanks

You said you just wanted the column x values where column_values was 15, right?

subset(dat, column_values==15, select=x)

This comes back as a one-column data frame, so it's possible you may need to unlist() it and maybe even "unfactor" it with as.character():

> dat
  Subject Product
1       1   ProdA
2       1   ProdB
3       1   ProdC
4       2   ProdB
5       2   ProdC
6       2   ProdD
7       3   ProdA
8       3   ProdB
> subset(dat, Subject==2, Product)
  Product
4   ProdB
5   ProdC
6   ProdD
> unlist( subset(dat, Subject==2, Product) )
Product1 Product2 Product3 
   ProdB    ProdC    ProdD 
Levels: ProdA ProdB ProdC ProdD
> as.character( unlist( subset(dat, Subject==2, Product) ) )
[1] "ProdB" "ProdC" "ProdD"

If you want all of the columns you can drop the third argument (the select= argument):

subset(dat, Subject==2 )

  Subject Product
4       2   ProdB
5       2   ProdC
6       2   ProdD
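
To make the transcript above reproducible, here is a minimal sketch that rebuilds dat. The use of stringsAsFactors = TRUE is an assumption, chosen only because the output above shows Product as a factor; the names prod_df and prod_chr are illustrative. It also shows that indexing the column directly gives a plain vector without the unlist() step.

```r
# Rebuild the example data frame. stringsAsFactors = TRUE is an assumption
# made so that Product is a factor, as the "Levels:" line above suggests.
dat <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2, 3, 3),
  Product = c("ProdA", "ProdB", "ProdC", "ProdB", "ProdC", "ProdD", "ProdA", "ProdB"),
  stringsAsFactors = TRUE
)

# One-column data frame, same call as in the transcript
prod_df <- subset(dat, Subject == 2, select = Product)

# Plain character vector: index the column itself, then drop the factor
prod_chr <- as.character(dat$Product[dat$Subject == 2])
print(prod_chr)
```

The second form may be handier when the result is fed straight into another function that expects a vector.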

Assuming that dat is the data frame in question, col is the name of the column and "value" is the value that you want, you can do

dat[dat$col=="value",]

That fetches all of the rows of dat for which dat$col=="value", and all of the columns.
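
One caveat worth knowing: if the column contains NA, the comparison is NA for those rows, and logical indexing then returns a row filled with NA for each of them. A small sketch with a made-up data frame (the values and the names with_na, by_which and by_subset are illustrative only) shows two common ways around that:

```r
# Hypothetical data with a missing value in the filter column
dat <- data.frame(col = c("value", NA, "other", "value"), n = 1:4)

# Plain logical indexing: the NA comparison produces an extra all-NA row
with_na <- dat[dat$col == "value", ]

# which() keeps only the positions where the condition is TRUE
by_which <- dat[which(dat$col == "value"), ]

# subset() also treats NA in the condition as "do not keep"
by_subset <- subset(dat, col == "value")

print(nrow(with_na))
print(nrow(by_which))
print(nrow(by_subset))
```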

First, note that a matrix and a data.frame are different things in R. I imagine you have a data.frame (as that is what is returned by read.csv()). Data frames have named columns (if you don't give them names, generic ones are created for you).

You can subset a data.frame by indicating which rows you want, which columns you want, or both. The easiest way to specify the rows is with a logical vector, often built out of comparisons on specific columns of the data.frame. For example, data[["column values"]] == "15" makes a logical vector which is TRUE wherever the corresponding entry in the column named "column values" is the string "15" (since it is in quotes, it is a string, not a number). You can make the selection criteria as complicated as you like (combining logical vectors with & and |) to specify the rows you want. This vector becomes the first argument in the indexing.

A vector of column names or numbers can be the second argument. If either argument is missing, all rows (or all columns) are assumed.

Putting this all together, you get examples like

data[data[["column values"]] == "15", ]

or using an actual data set (mtcars)

mtcars[mtcars$am == 1, ]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg"]
mtcars[mtcars$am == 1 & mtcars$hp > 100, "mpg", drop=FALSE]
mtcars[mtcars$hp > 100, c("mpg", "carb")]

Take a look at what each of the conditionals (the first arguments, e.g. mtcars$am == 1 & mtcars$hp > 100) returns to get a better sense of how indexing works.
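
As a rough sketch of that inspection, using the built-in mtcars data set (the variable names cond, as_vector and as_frame are illustrative only):

```r
# The row selector is just a logical vector, one element per row of mtcars
cond <- mtcars$am == 1 & mtcars$hp > 100
print(class(cond))
print(length(cond) == nrow(mtcars))
print(head(cond))

# With a single column, the default is to drop down to a plain vector
as_vector <- mtcars[cond, "mpg"]
print(class(as_vector))

# drop = FALSE keeps the result as a one-column data frame
as_frame <- mtcars[cond, "mpg", drop = FALSE]
print(class(as_frame))
```

The number of TRUE entries in cond, sum(cond), matches both the length of as_vector and the number of rows in as_frame.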
