Improving R coding with sapply help
I'm struggling with a bit of code. I can get it to work very inefficiently, but thought that there must be a better way to fix it. I am trying to compile a variable from several different variables. In the coded variables, a "skip" is coded as a specific number (for the example below "99"). I am trying to create a total cost variable based on 10 of these variables.
So far, I have inefficient code that works:
var1 <- ifelse(data$v1<99, data$v1, 0)
var2 <- ifelse(data$v2<99, data$v2, 0)
...
var10 <- ifelse(data$v10<99, data$v10, 0)
sumvar <- var1 + var2 + var3 + var4 + var5 + var6 + var7 + var8 + var9 + var10
I have tried to use the sapply command to make this a bit more elegant and it hasn't worked. I was just trying to see if someone could give me some hints or help on why my code is failing. I put it into a list environment (which I think is correct after trying others like cbind) and try to do a specific call, but get an error. As sample code, I set up the following:
set.seed(1234)
data <- data.frame(x=rnorm(30), y=rnorm(30), z=rnorm(30))
data$x <- ifelse(data$x > 1, 99, data$x)
data$y <- ifelse(data$y > 1, 99, data$y)
data$z <- ifelse(data$z > 1, 99, data$z)
t.list <- list(data$x, data$y, data$z)
sumvar1 <- sapply(1:length(t.list), function(i){
tempvar <- ifelse(t.list[i] !=99, t.list[i], 0)
sumvar1 <- sumvar1 + tempvar
})
The problem is that when I try my actual code (or this code), I get:
Error in storage.mode(test) <- "logical" :
(list) object cannot be coerced to type 'double'
Calls: sapply -> lapply -> FUN -> ifelse
Obviously I am doing something wrong, but I am not sure what it is. I've looked at the help file for ifelse, but I don't understand the error message that is output. I've gotten the code to run in the inefficient way, but I'd really like to get some feedback and knowledge on how to improve my future coding in R.
Thanks!
If I understand your problem correctly, I think all you need to do is:
## Set any skip values to be equal to zero
data[data == 99] = 0
## Work out the row sums
apply(data, 1, sum)
One comment: you might think about using R's missing value object, NA, instead of setting 99 to 0.
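A minimal sketch of the NA approach, assuming the skip code is 99 in every column as in the question's sample data. The names data_na and row_totals are made up for this illustration.

```r
# Rebuild the sample data from the question: values above 1 become the skip code 99
set.seed(1234)
data <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))
data[data > 1] <- 99

# Work on a copy (hypothetical name) so the original coding is preserved
data_na <- data
data_na[data_na == 99] <- NA

# na.rm = TRUE drops the skipped values from each row's total
row_totals <- rowSums(data_na, na.rm = TRUE)
head(row_totals)
```

Unlike recoding to 0, this keeps a skip distinguishable from a genuine zero, which matters if you later compute means or counts.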
If it's the same value (99) for all the variables in your data.frame, just operate on the entire data.frame at once.
> sum(data*(data < 99))
[1] -39.68282
If you want row sums
rowSums(data*(data < 99)) # faster than apply(data*(data < 99), 1, sum)
If you want column sums
colSums(data*(data < 99)) # faster than apply(data*(data < 99), 2, sum)
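As for why the original sapply attempt fails: as far as I can tell, t.list[i] with single brackets returns a one-element list rather than the numeric vector, and ifelse cannot work with that. The code also tries to accumulate into sumvar1 from inside the function, before sumvar1 exists. A rough sketch of a working version, where cleaned is a name invented for this example:

```r
# Rebuild the sample data and list from the question
set.seed(1234)
data <- data.frame(x = rnorm(30), y = rnorm(30), z = rnorm(30))
data[data > 1] <- 99
t.list <- list(data$x, data$y, data$z)

# Iterate over the list elements directly, so each v is a numeric vector
# (the equivalent of t.list[[i]], not t.list[i])
cleaned <- sapply(t.list, function(v) ifelse(v != 99, v, 0))

# sapply returns a 30 x 3 matrix here; sum across columns for each row
sumvar1 <- rowSums(cleaned)
head(sumvar1)
```

This gives the same totals as the rowSums one-liner above, which is still the simpler choice. For the real data, assuming the columns really are named v1 to v10, you could restrict the calculation to them with something like data[, paste0("v", 1:10)].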