R: Select cells in data.frame based on other attribute in the same instance

Ok, the title may not be the most descriptive. It's easier to explain with an example.

I have a data.frame like this:

A B 1 2
L M 3 0
P Q 5 6

I want to output an array of the cell in column 1 if col3 > col4, or the cell in col2 if col3 <= col4. The output vector from this data.frame would be B, L, Q.

I'm aware that I still haven't explained my problem very well, so here is what it would look like in an imperative language:

vector = []
for each rows as row
  if row[3] > row[4]
    vector.add(row[1])
  else
    vector.add(row[2])
return vector

I apologise if this problem has already been answered, but unfortunately Google is not much help when it comes to R questions.

Thanks, Andreas


Your test case is not sufficiently complex to expose some of the lurking crocodiles in R relating to objects of class == factor, default options for data.frame(), and the use of functions like apply and ifelse. I could apologize for the length of the answer, but it is really just a small subset of what you can read in The R Inferno. Let's say you create a data.frame, dfrm:

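# stringsAsFactors=TRUE was the default before R 4.0.0; it is set explicitly
# here so that the factor behaviour described below still reproduces.
dfrm <- read.table(textConnection("A  B  2  12
L  M  3  0
P  Q  5  6"), header=FALSE, stringsAsFactors=TRUE)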

NOTICE: I modified your first case a bit. Now run the first solution offered: you get

 apply(dfrm, 1, function(x){ifelse(x[3] > x[4], x[1], x[2])})
[1] "A" "L" "P"

Clearly 2 is NOT greater than 12, so what happened? The apply function works on matrices, so it converted the data.frame to a character matrix before calling the function, which then tested "2" > "12". As a string comparison, that is TRUE. So crocodile #1 is the default behavior of apply().
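The coercion can be checked directly. The following is a minimal sketch, not part of the original answer: `d` is a hypothetical copy of the data with character text columns, and `fixed` is an illustrative name. It shows the character matrix that apply() works on, and one way to restore the numeric comparison inside the function.

```r
# Hypothetical copy of dfrm with character (not factor) text columns.
d <- data.frame(V1 = c("A", "L", "P"), V2 = c("B", "M", "Q"),
                V3 = c(2, 3, 5), V4 = c(12, 0, 6),
                stringsAsFactors = FALSE)

# apply() calls as.matrix() first; mixed column types give a character matrix.
m <- as.matrix(d)
is.character(m)        # TRUE

# So the test inside apply() compares strings, not numbers.
"2" > "12"             # TRUE
2 > 12                 # FALSE

# Converting back to numeric inside the function restores the intended test.
fixed <- apply(d, 1, function(x) {
  if (as.numeric(x[3]) > as.numeric(x[4])) x[1] else x[2]
})
unname(fixed)          # "B" "L" "Q"
```

Converting inside the function works, but it still round-trips every number through a string, so a vectorised approach that never builds the matrix is usually the safer choice.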

Errors or warnings also result from what might seem at first and second glance to be perfectly sensible R code:

vector <- dfrm$V2
# Assigning values from V1 into a factor whose levels come from V2
vector[dfrm$V3 > dfrm$V4] <- dfrm$V1[dfrm$V3 > dfrm$V4]

(It wasn't a particularly informative message, to me anyway, ... something about NAs ... and it was due to the fact that I was trying to assign a value to a factor object for which there was no existing level.) That's the second crocodile: the default class for character values given to the data.frame function is "factor" rather than "character" (this was the default before R 4.0.0).
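A small illustration of the factor-level problem, as a sketch on made-up vectors (`f`, `g` and `s` are hypothetical names, not from the question): assigning a value that is not an existing level produces NA together with a warning, while a character vector accepts it.

```r
# A factor only accepts values that are already among its levels.
f <- factor(c("B", "M", "Q"))

# "L" is not a level of f, so this assignment emits a warning
# ("invalid factor level, NA generated") and stores NA instead.
g <- f
g[2] <- "L"
is.na(g[2])            # TRUE

# Converting to character first avoids the problem.
s <- as.character(f)
s[2] <- "L"
s                      # "B" "L" "Q"
```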

The third crocodile is the behavior of ifelse:

 with(dfrm, ifelse(V3 > V4, V1, V2) )
[1] 1 2 3

WTF? The ifelse function is returning the internal integer codes of the factors V1 and V2 instead of their labels. It does so because the result takes its shape and attributes from the test argument, so the factor class and levels of the yes and no values are dropped. Not the way I would have designed such a function, but these things were worked out decades ago, so changing them is nigh unto impossible.

So here are a couple of "right", or at least safer, ways to do the work you asked for. Method1:

with(dfrm, ifelse(V3 > V4, as.character(V1), as.character(V2) ) )  
[1] "B" "L" "Q"
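Why the as.character() calls matter can be seen on two small factors. This is only a sketch with hypothetical vectors (`f1`, `f2`, `test`, `codes`, `labels`) that mirror the columns of dfrm:

```r
# Hypothetical factors mirroring V1 and V2, and the result of V3 > V4.
f1 <- factor(c("A", "L", "P"))
f2 <- factor(c("B", "M", "Q"))
test <- c(FALSE, TRUE, FALSE)

# Passing factors directly: only the integer codes survive.
codes <- ifelse(test, f1, f2)
codes                  # 1 2 3

# Passing character vectors: the labels are kept.
labels <- ifelse(test, as.character(f1), as.character(f2))
labels                 # "B" "L" "Q"
```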

Method2:

vector <- as.character(dfrm$V2)  
vector[which(dfrm$V3 > dfrm$V4)] <- as.character(dfrm$V1[which(dfrm$V3 > dfrm$V4)])  
vector  
[1] "B" "L" "Q"
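Both methods work around the factor columns. Another option, sketched here under the assumption that you control how the data is read, is to keep the text columns as character from the start (`dfrm2` and `result` are hypothetical names). In that case the plain ifelse() call returns the labels without any conversion.

```r
# Assumption: the data is read with stringsAsFactors = FALSE, so V1 and V2
# are character vectors (this is the default from R 4.0.0 onward).
dfrm2 <- read.table(text = "A  B  2  12
L  M  3  0
P  Q  5  6", header = FALSE, stringsAsFactors = FALSE)

# With character columns, ifelse() needs no as.character() calls.
result <- with(dfrm2, ifelse(V3 > V4, V1, V2))
result                 # "B" "L" "Q"
```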


This should work (assuming df is your data frame):

apply(df, 1, function(x){ifelse(x[3] > x[4], x[1], x[2])})