
Assign weights based on frequency of occurrence of values

I would like to ask for help with my data frame. Each column is a phase, and each phase lists the names of the variables in some order. Let's say

vec<-data.frame(phase1= c("var1","var2","var3","var4","var5","var6"),
                 phase2= c("var1","var3","var4","var2","var6","var5"),    
                 phase3= c("var4","var3","var2","var1","var6","var5"))

 vec
  phase1 phase2 phase3
1   var1   var1   var4
2   var2   var3   var3
3   var3   var4   var2
4   var4   var2   var1
5   var5   var6   var6
6   var6   var5   var5

Now, let's say we are interested in the first 3 rows, so the weight of a variable that appears in one of them is 1/3, and zero otherwise. My function would ideally output something like this:

        phase1  phase2  phase3
var1    0.33    0.33    0
var2    0.33    0       0.33
var3    0.33    0.33    0.33
var4    0       0.33    0.33
var5    0       0       0
var6    0       0       0

The function should also work for the first 4, 5 or all 6 rows (i.e. the weights change accordingly). Regards, Alex
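Stated as a formula (the notation here is introduced only for clarity), with x_{i,p} the entry in row i of phase p, the weight of variable v in phase p when the first n rows are used is:

```latex
w_{v,p}(n) =
\begin{cases}
  \dfrac{1}{n} & \text{if } v \in \{x_{1,p}, \dots, x_{n,p}\},\\[4pt]
  0            & \text{otherwise.}
\end{cases}
```

For n = 3 this gives 1/3 (about 0.33), matching the table above. Since every variable appears exactly once per phase, each column then sums to n * (1/n) = 1.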


I believe you are looking for this:

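n<-3
l<-nrow(vec)
wghts<-c(rep(1/n, n), rep(0, l-n))
# For every phase, find where each phase1 variable sits in that column and
# give it the weight of that position (1/n in the first n rows, 0 otherwise)
result<-do.call(cbind, lapply(vec, function(curcol){
        wghts[match(vec$phase1, curcol)]
    }))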

If need be, you could add:

rownames(result)<-vec$phase1
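The match() lookup can be wrapped in a function so that n becomes an argument, which covers the first 4, 5 or all 6 rows with one call. A minimal sketch: match(ref, col) returns the row position of each reference variable within a phase, and that position selects its weight. The name match_weights and the choice of the first column as the reference list of variables are assumptions for illustration:

```r
# Example data from the question
vec <- data.frame(phase1 = c("var1","var2","var3","var4","var5","var6"),
                  phase2 = c("var1","var3","var4","var2","var6","var5"),
                  phase3 = c("var4","var3","var2","var1","var6","var5"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: weight 1/n for a variable found in the first n rows
# of a phase, 0 otherwise. Result rows follow the order of `ref`.
match_weights <- function(df, n, ref = as.character(df[[1]])) {
  wghts <- c(rep(1 / n, n), rep(0, nrow(df) - n))
  # match(ref, col) gives the row position of each reference variable in col
  result <- sapply(df, function(col) wghts[match(ref, col)])
  rownames(result) <- ref
  result
}

round(match_weights(vec, 3), 2)
round(match_weights(vec, 4), 2)
```

Note the argument order of match(): it looks up the reference variables inside each phase, so the rows of the result correspond to variables rather than to row positions.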


You can use %in% to find matches and ifelse to set weights:

# 1/n if a variable of v is among the first n entries of column x, 0 otherwise
set_weight <- function(x, v, n) ifelse(v %in% x[1:n], 1/n, 0)
as.data.frame(lapply(vec, set_weight, v=vec$phase1, n=3))
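Because the weight depends only on n, the %in% idea can be evaluated for several row counts in one go. A sketch, assuming the example vec from the question: for each phase it tests whether each phase1 variable is among the first n entries of that phase. The names in_weights and all_weights are hypothetical:

```r
# Example data from the question
vec <- data.frame(phase1 = c("var1","var2","var3","var4","var5","var6"),
                  phase2 = c("var1","var3","var4","var2","var6","var5"),
                  phase3 = c("var4","var3","var2","var1","var6","var5"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: 1/n if a variable of v is among the first n entries of x
in_weights <- function(x, v, n) ifelse(v %in% x[seq_len(n)], 1 / n, 0)

# One weight table per row count n = 3, ..., 6, named "n3", ..., "n6"
all_weights <- lapply(setNames(3:6, paste0("n", 3:6)), function(n) {
  res <- as.data.frame(lapply(vec, in_weights, v = vec$phase1, n = n))
  rownames(res) <- vec$phase1
  res
})

all_weights$n4
```

Setting the row names to vec$phase1 makes it explicit that each row is a variable, not a row position of the original data frame.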


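You are essentially setting the weight of var_i in phase_j to the fraction of the selected rows in which var_i occurs in phase_j. The simplest way is to use the table() function: given a vector of discrete values, it produces a frequency count of the distinct values. Converting each column to a factor whose levels are all the variables makes table() report zero counts as well, so every phase yields one count per variable. If you want your desired weights based on the first 3 rows of the data frame vec, you simply do:

> sapply(vec[1:3,], function(x) table(factor(x, levels = vec$phase1)))/3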

        phase1    phase2    phase3
var1 0.3333333 0.3333333 0.0000000
var2 0.3333333 0.0000000 0.3333333
var3 0.3333333 0.3333333 0.3333333
var4 0.0000000 0.3333333 0.3333333
var5 0.0000000 0.0000000 0.0000000
var6 0.0000000 0.0000000 0.0000000

Similarly, if you want to use the first 4 rows, you do:

> sapply(vec[1:4,], function(x) table(factor(x, levels = vec$phase1)))/4
     phase1 phase2 phase3
var1   0.25   0.25   0.25
var2   0.25   0.25   0.25
var3   0.25   0.25   0.25
var4   0.25   0.25   0.25
var5   0.00   0.00   0.00
var6   0.00   0.00   0.00
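A minimal sketch wrapping this in a function of n, so the first 4, 5 or all 6 rows can be handled with one call. The explicit levels matter because since R 4.0.0 data.frame() keeps strings as character by default, and table() on a character subset counts only the values that actually occur; the per-phase tables can then differ in length or names, and sapply() does not align them by variable. The name table_weights and the use of the first column as the list of variables are assumptions:

```r
# Example data from the question
vec <- data.frame(phase1 = c("var1","var2","var3","var4","var5","var6"),
                  phase2 = c("var1","var3","var4","var2","var6","var5"),
                  phase3 = c("var4","var3","var2","var1","var6","var5"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: fraction of the first n rows in which each variable
# occurs, per phase (columns) and variable (rows)
table_weights <- function(df, n, levels = as.character(df[[1]])) {
  top <- df[seq_len(n), , drop = FALSE]
  # Fixed levels keep zero counts, so every phase yields one count per variable
  counts <- sapply(top, function(col) table(factor(col, levels = levels)))
  counts / n
}

table_weights(vec, 4)
colSums(table_weights(vec, 5))   # each phase sums to 1
```

The same call works whether the columns are character or factor, since factor() re-levels either one.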