Setting levels inside an lapply loop in R

I'm trying to strip trailing spaces from the factor variables in a data frame. However, the levels assignment doesn't work inside my lapply function.

rm.space<-function(x){
    a<-gsub(" ","",x)
    return(a)}


lapply(names(barn),function(x){
    levels(barn[,x])<-rm.space(levels(barn[,x]))
    })

Any ideas how I can assign levels inside a lapply function?

//M


R is vectorised, you do not need apply():

> f <- as.factor(sample(c("  a", " b", "c", "  d"), 10, replace=TRUE))
> levels(f)
[1] "  a" " b"  "c"   "  d"
> levels(f) <- gsub(" +", "", levels(f), perl=TRUE)
> levels(f)
[1] "a" "b" "c" "d"
> f
 [1] d a c b c d d a a a
Levels: a b c d
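Note that the pattern above removes every space, including spaces inside a level. If only leading and trailing whitespace should go, base R's trimws() is one option. A minimal sketch, with made-up example levels:

```r
# Hypothetical factor whose levels carry leading/trailing spaces;
# "New York" also contains an internal space that should survive.
f <- factor(c(" a", "b ", "New York ", " a"))

# trimws() strips whitespace at both ends only.
levels(f) <- trimws(levels(f))

levels(f)
```

If two levels become identical after trimming, the assignment merges them into a single level.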


From your code I gather that the lapply is used to loop over different variables, not over the levels of a factor. In that case you do need some kind of looping structure, but lapply is a bad choice:

  • you are looping over a vector, names(barn), so sapply would be the more natural choice
  • the apply family returns the result of each iteration, which you don't want here, so you're using memory for no purpose.

That said, if you do need to assign to a variable in your global environment from inside lapply, you need the <<- operator. Say you have selected a number of variables from which the spaces have to be removed:

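f <- paste("",letters[1:5])

Df <- data.frame(
    X1 = sample(f, 10, replace = TRUE),
    X2 = sample(f, 10, replace = TRUE),
    X3 = sample(f, 10, replace = TRUE),
    # needed in R >= 4.0.0, where strings are no longer converted to factors by default
    stringsAsFactors = TRUE
    )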

# Bad example :   
lapply(c("X1","X3"),function(x){
    levels(Df[,x])<<-gsub(" +","",levels(Df[,x]))
    })

gives

> str(Df)
'data.frame':   10 obs. of  3 variables:
 $ X1: Factor w/ 3 levels "a","b","c": 2 3 1 1 1 2 3 2 2 2
 $ X2: Factor w/ 5 levels " a"," b"," c",..: 4 5 4 2 5 5 1 2 5 3
 $ X3: Factor w/ 5 levels "a","b","c","d",..: 2 3 4 1 4 1 3 3 5 4

A for loop is the better choice:

for( i in c("X1","X3")){
    levels(Df[,i])<-gsub(" +","",levels(Df[,i]))
}

This does what you need without the hassle of the <<- operator and without holding memory unnecessarily.
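To see why a plain assignment inside lapply has no visible effect while the for loop does, here is a small check. The data frame Df2, its columns, and the helper variables after_lapply and after_for are made up for illustration:

```r
# Hypothetical data: two factor columns with a leading space in each level.
Df2 <- data.frame(
    X1 = factor(c(" a", " b", " a")),
    X2 = factor(c(" c", " c", " d"))
)

# The assignment inside the anonymous function modifies a local copy of Df2,
# which is discarded when the function returns.
invisible(lapply(names(Df2), function(x) {
    levels(Df2[, x]) <- gsub(" +", "", levels(Df2[, x]))
}))
after_lapply <- levels(Df2$X1)   # still has the leading spaces

# The for loop runs in the calling environment, so Df2 itself changes.
for (i in names(Df2)) {
    levels(Df2[, i]) <- gsub(" +", "", levels(Df2[, i]))
}
after_for <- levels(Df2$X1)      # spaces removed

print(after_lapply)
print(after_for)
```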


As Joris states, lapply works on a local copy of the data.frame, so it won't modify your original data. But you can use its return value to replace your data:

barn[] <- lapply(barn, function(x) {
    levels(x) <- rm.space(levels(x))
    x
    })

This is useful when your data contains columns of different types and you want to modify only the factors, e.g.:

factors <- sapply(barn, is.factor)
barn[factors] <- lapply(barn[factors], function(x) {
                    levels(x) <- rm.space(levels(x))
                    x
                 })
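For a version that can be run end to end, here is a sketch with a made-up stand-in for barn (the column names and values are assumptions, not the asker's data). It uses vapply() instead of sapply() so that the selector is guaranteed to be a logical vector:

```r
# Hypothetical stand-in for barn: one factor, one numeric, one character column.
barn <- data.frame(
    name = factor(c("anna ", " ben", "anna ")),
    age  = c(3, 5, 4),
    note = c("x ", " y", "z"),
    stringsAsFactors = FALSE
)

# Same helper as in the question: removes all spaces.
rm.space <- function(x) gsub(" ", "", x)

# One TRUE/FALSE per column, marking the factor columns.
factors <- vapply(barn, is.factor, logical(1))

# Replace only the factor columns with cleaned copies.
barn[factors] <- lapply(barn[factors], function(x) {
    levels(x) <- rm.space(levels(x))
    x
})

str(barn)
```

The character column note is left untouched, since only columns for which is.factor() returns TRUE are replaced.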