Improving R coding with sapply help

I'm struggling with a bit of code. I can get it to work very inefficiently, but thought that there must be a better way to fix it. I am trying to compile a variable from several different variables. In the coded variables, a "skip" is coded as a specific number (for the example below "99"). I am trying to create a total cost variable based on 10 of these variables.

So far, I have inefficient code that works:

var1 <- ifelse(data$v1<99, data$v1, 0)  
var2 <- ifelse(data$v2<99, data$v2, 0) 
... 
var10 <- ifelse(data$v10<99, data$v10, 0) 
sumvar <- var1 + var2 + var3 + var4 + var5 + var6 + var7 + var8 + var9 + var10

I have tried to use the sapply command to make this a bit more elegant and it hasn't worked. I was just trying to see if someone could give me some hints or help on why my code is failing. I put it into a list environment (which I think is correct after trying others like cbind) and try to do a specific call, but get an error. As sample code, I set up the following:

set.seed(1234)
data <- data.frame(x=rnorm(30), y=rnorm(30), z=rnorm(30))
data$x <- ifelse(data$x > 1, 99, data$x)
data$y <- ifelse(data$y > 1, 99, data$y)
data$z <- ifelse(data$z > 1, 99, data$z)

t.list <- list(data$x, data$y, data$z)

sumvar1 <- sapply(1:length(t.list), function(i){
    tempvar <- ifelse(t.list[i] !=99, t.list[i], 0)
    sumvar1 <- sumvar1 + tempvar
})

The problem is that when I try my actual code (or this code), I get:

Error in storage.mode(test) <- "logical" : 
  (list) object cannot be coerced to type 'double'
Calls: sapply -> lapply -> FUN -> ifelse

Obviously I am doing something wrong, but I am not sure what it is. I've looked at the help file for ifelse, but I don't understand the error message that is output. I've gotten the code to run in the inefficient way, but I'd really like to get some feedback and knowledge on how to improve my future coding in R.

Thanks!


If I understand your problem correctly, I think all you need to do is:

## Set any skip values to be equal to zero
data[data == 99] = 0
## Work out the row sums
apply(data, 1, sum)

One comment: you might think about using R's missing value object NA instead of setting 99 to 0.
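
A minimal sketch of the NA idea, reusing the sample data from the question. The only assumption is that 99 never occurs as a genuine value; the name sumvar is just illustrative.

```r
set.seed(1234)
data <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))
# Same recoding as in the question: values above 1 become the skip code 99
data[] <- lapply(data, function(col) ifelse(col > 1, 99, col))

# Recode the skip value as missing rather than zero
data[data == 99] <- NA

# na.rm = TRUE drops missing entries, so each row sums only the non-skipped values
sumvar <- rowSums(data, na.rm = TRUE)
```

Keeping skips as NA also means that functions such as mean() will not silently treat a skipped item as a real zero.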


If it's the same value (99) for all the variables in your data.frame, just operate on the entire data.frame at once.

> sum(data*(data < 99))
[1] -39.68282

If you want row sums

rowSums(data*(data < 99))  # faster than apply(data*(data < 99), 1, sum)

If you want column sums

colSums(data*(data < 99))  # faster than apply(data*(data < 99), 2, sum)
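
As for why the original sapply call fails: t.list[i] with single brackets returns a one-element list, not the numeric vector, and ifelse cannot work with that; t.list[[i]] would extract the vector. The function also tries to update sumvar1 before it exists. If you still want the sapply route, a rough sketch (assuming the same sample data; cleaned is just an illustrative name) is to let sapply build a matrix of cleaned columns and sum afterwards:

```r
set.seed(1234)
data <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))
data[] <- lapply(data, function(col) ifelse(col > 1, 99, col))

# sapply iterates over the columns; each call returns a numeric vector,
# so the result simplifies to a matrix with one column per variable
cleaned <- sapply(data, function(col) ifelse(col != 99, col, 0))

# Sum across the cleaned columns to get one total per row
sumvar1 <- rowSums(cleaned)
```

This should give the same totals as rowSums(data*(data < 99)), which is shorter and avoids the loop entirely.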