
How to paste text and variables into a logical expression in R?

I want to paste variables into the logical expression that I am using to subset data, but the subset function does not see them as column names when pasted (either with or without quotes).

I have a dataframe with columns named col1, col2 etc. I want to subset for the rows in which colx < 0.05

This DOES work:

subsetdata<-subset(dataframe, col1<0.05)

subsetdata<-subset(dataframe, col2<0.05)

This does NOT work:

for (k in 1:2){
subsetdata<-subset(dataframe, paste("col",k,sep="")<0.05)
}

for (k in 1:2){
subsetdata<-subset(dataframe, noquote(paste("col",k,sep=""))<0.05)
}

I can't find the answer; any suggestions?


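You're making this a lot harder than it needs to be by trying to use subset. paste() returns a character string, so your condition compares the string "col1" itself with 0.05 rather than the column it names. Note that ?subset says the second argument (also named subset) must be an expression, and you're not giving it an expression: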

> is.expression(paste("col",1:2,sep="")<0.05)
[1] FALSE

You could construct an unevaluated expression then evaluate it as you pass it to subset, but there are much easier ways. For example: just take advantage of the vectorized nature of the < operator.

# sample data
set.seed(21)
dataframe <- data.frame(col1=rnorm(10),col2=rnorm(10),col3=1)

logicalCols <- dataframe[,paste("col",1:2,sep="")] < 0.05
#        col1  col2
#  [1,] FALSE  TRUE
#  [2,] FALSE FALSE
#  [3,] FALSE  TRUE
#  [4,]  TRUE FALSE
#  [5,] FALSE FALSE
#  [6,] FALSE FALSE
#  [7,]  TRUE FALSE
#  [8,]  TRUE FALSE
#  [9,] FALSE  TRUE
# [10,]  TRUE  TRUE
ANY <- apply(logicalCols, 1, any)  # any colx < 0.05
ALL <- apply(logicalCols, 1, all)  # all colx < 0.05
dataframe[ANY,]
dataframe[ALL,]
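
For completeness, the unevaluated-expression route mentioned above could look roughly like the sketch below. It is only one way to do it, using base R's call(), as.name(), bquote() and eval(); the names cond and results are made up for illustration and are not from the question.

```r
# sample data (same as above)
set.seed(21)
dataframe <- data.frame(col1 = rnorm(10), col2 = rnorm(10), col3 = 1)

results <- list()  # hypothetical container, one subset per column
for (k in 1:2) {
  colname <- paste("col", k, sep = "")
  # build the unevaluated call  colk < 0.05  from the pasted name
  cond <- call("<", as.name(colname), 0.05)
  # splice it into a subset() call, then evaluate that call
  results[[colname]] <- eval(bquote(subset(dataframe, .(cond))))
}
```

Here results$col1 should match subset(dataframe, col1 < 0.05), and likewise for col2. The vectorized indexing shown above is still simpler when you only need the rows.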


Here are a couple of options that are closer to Jasper's approach. First, you could define the column name as a separate variable and then use it to select the variable from the data.frame as if it were a list (since a data.frame is basically a list):

for(k in 1:2){
  colname <- paste("col",k,sep="")
  subsetdata <- dataframe[dataframe[[colname]] < 0.05, ]
}

Or you could index the column by name:

  subsetdata <- dataframe[dataframe[, colname] < 0.05, ]

Finally, you could use subset, although you need to provide a logical expression (as pointed out by Joshua Ulrich):

  subsetdata <- subset(dataframe, eval(substitute(x < 0.05, list(x = as.name(colname)))))
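
Note that the loop overwrites subsetdata on every pass, so only the last subset survives. If you want to keep one subset per column, a minimal sketch (assuming the same sample data as in the answer above; cols and subsets are hypothetical names) is to collect them in a named list:

```r
# sample data, as in the earlier answer
set.seed(21)
dataframe <- data.frame(col1 = rnorm(10), col2 = rnorm(10), col3 = 1)

cols <- paste("col", 1:2, sep = "")  # "col1" "col2"
# one subset per column name, using [[ ]] to look the column up by string
subsets <- lapply(cols, function(colname) {
  dataframe[dataframe[[colname]] < 0.05, ]
})
names(subsets) <- cols
```

subsets[["col1"]] then holds the rows where col1 < 0.05, and subsets[["col2"]] the rows where col2 < 0.05.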


It's not quite clear to me what you're trying to do, but perhaps seeing & and | used in a subset operation would be helpful.

Both col1 and col2 less than 0.05:

subsetdata<-subset(dataframe, col1 < 0.05 & col2 < 0.05)

Either col1 or col2 less than 0.05:

subsetdata<-subset(dataframe, col1 < 0.05 | col2 < 0.05)

Joshua's answer is a great way of doing this more easily over many columns.
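
If the column names come from paste, the same & and | conditions can also be built without typing each column out. This is only a sketch using base R's lapply() and Reduce(); the names cols, below, both and either are made up for illustration.

```r
# sample data, as in Joshua's answer
set.seed(21)
dataframe <- data.frame(col1 = rnorm(10), col2 = rnorm(10), col3 = 1)

cols <- paste("col", 1:2, sep = "")
# list of logical vectors, one per column: colx < 0.05
below <- lapply(dataframe[cols], function(v) v < 0.05)

both   <- Reduce(`&`, below)  # col1 < 0.05 & col2 < 0.05
either <- Reduce(`|`, below)  # col1 < 0.05 | col2 < 0.05

dataframe[both, ]
dataframe[either, ]
```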
