Assign weights based on frequency of occurrence of values
I would like to ask for help with my data frame. It has one column per phase, and each column holds the names of the variables for that phase. Let's say:
vec<-data.frame(phase1= c("var1","var2","var3","var4","var5","var6"),
phase2= c("var1","var3","var4","var2","var6","var5"),
phase3= c("var4","var3","var2","var1","var6","var5"))
vec
  phase1 phase2 phase3
1   var1   var1   var4
2   var2   var3   var3
3   var3   var4   var2
4   var4   var2   var1
5   var5   var6   var6
6   var6   var5   var5
Now, let's say we are interested in the first 3 rows: a variable that appears among them in a given phase gets weight 1/3 for that phase, and zero otherwise. My function would ideally output something like this:
     phase1 phase2 phase3
var1   0.33   0.33      0
var2   0.33      0   0.33
var3   0.33   0.33   0.33
var4      0   0.33   0.33
var5      0      0      0
var6      0      0      0
The function should also work for the first 4, 5 or all 6 rows (i.e. the weights will change accordingly). Regards, Alex
I believe you are looking for this:
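n <- 3                                   # number of leading rows that get a weight
l <- nrow(vec)
wghts <- c(rep(1/n, n), rep(0, l - n))   # weight attached to each row position
result <- do.call(cbind, lapply(vec, function(curcol) {
  # row position of each variable (in phase1 order) within this phase
  wghts[match(vec$phase1, curcol)]
}))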
If need be, you could add:
rownames(result)<-vec$phase1
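To cover the first 4, 5 or all 6 rows as the question asks, the same match() idea can be wrapped in a function of n. This is only a sketch: weights_first_n is a hypothetical helper name, and it assumes every phase contains every variable exactly once, as in the example data.

```r
# Example data from the question
vec <- data.frame(
  phase1 = c("var1", "var2", "var3", "var4", "var5", "var6"),
  phase2 = c("var1", "var3", "var4", "var2", "var6", "var5"),
  phase3 = c("var4", "var3", "var2", "var1", "var6", "var5"),
  stringsAsFactors = FALSE
)

# weights_first_n() is a hypothetical helper, not part of any package.
# vars defaults to the first column, which fixes the row order of the result.
weights_first_n <- function(df, n, vars = df[[1]]) {
  stopifnot(n >= 1, n <= nrow(df))
  # Weight attached to each row position: 1/n for the first n rows, 0 after
  wghts <- c(rep(1 / n, n), rep(0, nrow(df) - n))
  # For every phase, look up the row where each variable sits
  # and take the weight of that row
  res <- sapply(df, function(curcol) wghts[match(vars, curcol)])
  rownames(res) <- vars
  res
}

res3 <- weights_first_n(vec, 3)
# One weight matrix per choice of n = 3, 4, 5, 6
res_all <- lapply(3:6, function(n) weights_first_n(vec, n))
```

Calling round(res3, 2) should give a matrix laid out like the desired output in the question.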
You can use %in% to find matches and ifelse to set weights:
set_weight <- function(x, v, n) ifelse(v %in% x[1:n], 1/n, 0)
as.data.frame(lapply(vec, set_weight, v=vec$phase1, n=3), row.names=as.character(vec$phase1))
You are essentially setting the weight of var_j in phase_i to the fraction of the selected rows of phase_i in which var_j occurs. The simplest way is to use the table() function: given a vector of discrete values, it produces a frequency count of the different values. If the columns of vec are factors (so that table() also reports the levels with a zero count), you get your desired weights from the first 3 rows of the data frame vec with:
> sapply(vec[1:3,],table)/3
        phase1    phase2    phase3
var1 0.3333333 0.3333333 0.0000000
var2 0.3333333 0.0000000 0.3333333
var3 0.3333333 0.3333333 0.3333333
var4 0.0000000 0.3333333 0.3333333
var5 0.0000000 0.0000000 0.0000000
var6 0.0000000 0.0000000 0.0000000
Similarly if you want to use the first 4 rows you do:
> sapply(vec[1:4,],table)/4
     phase1 phase2 phase3
var1   0.25   0.25   0.25
var2   0.25   0.25   0.25
var3   0.25   0.25   0.25
var4   0.25   0.25   0.25
var5   0.00   0.00   0.00
var6   0.00   0.00   0.00
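One caveat with the table() one-liner: since R 4.0.0, data.frame() keeps strings as character columns by default, and table() on a character vector only reports the values that are actually present, so the zero rows would be lost. A possible workaround, sketched here under the assumption that the full set of variable names can be collected from the data frame itself, is to fix the factor levels explicitly (the names vars, freq and weights are illustrative):

```r
# Example data from the question, kept as character columns
vec <- data.frame(
  phase1 = c("var1", "var2", "var3", "var4", "var5", "var6"),
  phase2 = c("var1", "var3", "var4", "var2", "var6", "var5"),
  phase3 = c("var4", "var3", "var2", "var1", "var6", "var5"),
  stringsAsFactors = FALSE
)

n <- 3  # number of leading rows to use

# All variable names occurring anywhere in the data frame, in sorted order
vars <- sort(unique(unlist(vec, use.names = FALSE)))

# Count each variable in the first n rows of every phase; giving the factor
# the full set of levels makes table() report absent variables with count 0
freq <- sapply(vec[seq_len(n), ], function(col) table(factor(col, levels = vars)))

# Turn counts into weights
weights <- freq / n
```

Changing n to 4, 5 or 6 gives the weights for the other cases without touching the rest of the code.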