
subselection dataframe

I have a simple question, I think. In my data frame I would like to make a subset where the column Quality_score is equal to one of: Perfect, Perfect***, Perfect****, Good, Good*** and Good****

This is my solution so far:

Quality_scoreComplete <- subset(completefile, Quality_score == "Perfect" | Quality_score == "Perfect***" | Quality_score == "Perfect****" | Quality_score == "Good" | Quality_score == "Good***" | Quality_score == "Good****")

Is there a way to simplify this method? Like:

methods <- c('Perfect', 'Perfect***', 'Perfect****', 'Good', 'Good***', 'Good****')
Quality_scoreComplete <- subset(completefile,Quality_score==methods)

Thank you all,


You do not even need subset; plain indexing works, see ?"["

Quality_scoreComplete <- completefile[completefile$Quality_score %in% methods,]

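EDITED: based on a kind comment from @Sacha Epskamp: == in the expression gives wrong results, because it compares element by element and recycles the shorter vector instead of testing membership. I corrected it above to %in%. Thanks!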

Example of the problem:

> x <- c(17, 19)
> cars[cars$speed==x,]
   speed dist
29    17   32
31    17   50
36    19   36
38    19   68
> cars[cars$speed %in% x,]
   speed dist
29    17   32
30    17   40
31    17   50
36    19   36
37    19   46
38    19   68
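
The same contrast can be sketched on string values like the ones in the question. This is only an illustration: the toy completefile below is hypothetical and stands in for the asker's real data, and the subset() call is one way to keep the original style while using %in%.

```r
# Hypothetical toy data standing in for the asker's completefile
completefile <- data.frame(
  Quality_score = c("Good", "Perfect", "Bad", "Good****", "Perfect***", "Poor"),
  foo = 1:6,
  stringsAsFactors = FALSE
)
methods <- c("Perfect", "Perfect***", "Perfect****", "Good", "Good***", "Good****")

# == compares position by position (recycling methods if needed),
# so a row only matches if it happens to line up with the same value
by_equals <- completefile[completefile$Quality_score == methods, ]

# %in% tests every value for membership in the whole set
by_in <- completefile[completefile$Quality_score %in% methods, ]

# The same filter written with subset(), as in the question
by_subset <- subset(completefile, Quality_score %in% methods)

nrow(by_equals)
nrow(by_in)
nrow(by_subset)
```

Here none of the wanted values line up positionally with methods, so the == version returns no rows at all, while both %in% versions keep the four matching rows.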

One thing that works is grepl, which searches for a pattern in strings and returns a logical vector indicating where it matches. You can use the | operator inside the pattern to mean OR, and ignore.case = TRUE to make the match case-insensitive:

methods<-c('Perfect', 'Perfect*', 'Perfect*', 'Good', 'Good','Good*')

completefile <- data.frame( Quality_score = c( methods, "bad", "terrible", "abbysmal"), foo = 1)

Keeping only the rows where grepl finds a match leaves:

  Quality_score foo
1       Perfect   1
2      Perfect*   1
3      Perfect*   1
4          Good   1
5          Good   1
6         Good*   1
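
A minimal sketch of such a grepl subset, using the data defined above. The pattern "perfect|good" and the names keep and matched are assumptions made for illustration; they do not appear in the answer itself.

```r
methods <- c("Perfect", "Perfect*", "Perfect*", "Good", "Good", "Good*")
completefile <- data.frame(
  Quality_score = c(methods, "bad", "terrible", "abbysmal"),
  foo = 1
)

# TRUE wherever the score contains "perfect" or "good", in any letter case
keep <- grepl("perfect|good", completefile$Quality_score, ignore.case = TRUE)

# Logical indexing keeps only the matching rows
matched <- completefile[keep, ]
matched
```

Because the pattern contains no asterisks, it matches the starred and unstarred variants alike.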

EDIT: I see now that case sensitivity was not an issue, thanks dyslexia! You could simplify then to:
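
A sketch of what the case-sensitive version might look like, assuming the simplification is just dropping ignore.case and capitalising the pattern. The extra value "Not Good" and the anchored pattern are additions for illustration, to show that an unanchored pattern matches anywhere in the string.

```r
scores <- c("Perfect", "Perfect*", "Good", "Good*", "bad", "Not Good")

# Case-sensitive match anywhere in the string
loose <- grepl("Perfect|Good", scores)

# Anchored with ^ so only values that start with Perfect or Good match
strict <- grepl("^(Perfect|Good)", scores)

scores[loose]
scores[strict]
```

If the set of allowed values is fixed and known, %in% with an explicit vector avoids accidental partial matches altogether.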






