Calculate efficiently the minimum over each group and sub-group

2023-02-25 02:53 Q&A author:

Imagine that we have drawn a random sample y1, y2, ..., yn from some population, so double y[] and int n are known. The population contains groups, but we do not know exactly which group each observation belongs to. So for each yi we introduce an allocation variable zi that tells us which group yi has been drawn from. We assume that there are int k groups, so zi in {0, ..., k-1} for all i. To make inferences about the groups I need to run my algorithm for many iterations, say 50,000 or 100,000. At each iteration each observation is allocated probabilistically to some group, so my array of allocations int z[] keeps changing. In this case, counting the number of observations in each group and finding each group's minimum is very easy:

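int i, j;
int nj[k];        /* number of observations in each group */
double yj_min[k]; /* minimum observation in each group */

/* reset the counts and minima at each iteration */
for(j=0; j<k; j++){
    nj[j]=0;
    yj_min[j]=y[n-1]; /* y[] is sorted ascending, so y[n-1] is the maximum */
}

for(i=0; i<n; i++){
    nj[z[i]] = nj[z[i]] + 1;
    /* compare observation i itself against the minimum of its group */
    if(y[i] < yj_min[z[i]]){
        yj_min[z[i]] = y[i];
    }
}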

But suppose we introduce a further allocation variable di for each observation yi, indicating the sub-group from which yi has been sampled (also allocated probabilistically). There are int m sub-groups, so di in {0, ..., m-1}. Then (zi=j, di=s) indicates that the observation yi has been drawn from group j and sub-group s.

How could I calculate EFFICIENTLY, as I have to do this at each iteration, the minimum yjs_min over {i: zi=j, di=s}? That is, the minimum over the yi such that zi=j and di=s, for j=0, ..., k-1 and s=0, ..., m-1.

It would be great to do something like

for(i=0; i<n; i++){
    njs[z[i]][d[i]] = njs[z[i]][d[i]] + 1;
    if(yjs_min[z[i]][d[i]]) < y[z[i]][d[i]]){
        yjs_min[z[i]][d[i]] = y[z[i]][d[i]];  
    }
}

but obviously this is impossible!!! So please any ideas?

Cheers, Carlos

It looks like you're trying to do something like Fisher's exact test or a permutation test. If so, you might try a statistics package like R, which is designed for this kind of work and is likely to have efficient algorithms built in already.

That aside, as I understand it, you are stratifying the sample of n observations (y) into k groups, and then each of those groups into m sub-groups. You want to find the minimum element of each sub-group within each group.

One reasonably efficient solution is: create k*m unique identifiers, and a map that indicates which (group, sub-group) pair each of them corresponds to, for example id = j*m + s. Then randomly allocate these identifiers (using the same distribution) to your sample observations, as you were doing before. Use an efficient in-place sort (like quicksort with a properly selected pivot) to sort the sample by identifier, so that all elements with the same identifier are stored in a contiguous block of memory. This takes log-linear time, O(n log n), so it should be very quick.

Then you just need to walk through the array in order and find the minimum element for each unique identifier. This takes time linear in n and k*m extra space for the results.
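A minimal sketch of the steps above in C, using the standard qsort. The type tagged_obs and the function names cell_min_sorted and cell_min_direct are made up for illustration, as is the row-major layout in which the pair (j, s) lives at index j*m + s. The second function is a variant not described above: it uses the same identifier to index the result arrays directly, which skips the sort and needs only one pass over the data. Both assume every z[i] is in 0..k-1 and every d[i] is in 0..m-1, and both leave DBL_MAX in cells that receive no observation.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical helper type: one observation tagged with its identifier. */
typedef struct {
    int    id;  /* z[i] * m + d[i], a value in 0 .. k*m-1 */
    double y;
} tagged_obs;

/* qsort comparator: order tagged observations by identifier. */
static int cmp_by_id(const void *a, const void *b)
{
    const tagged_obs *p = a, *q = b;
    return (p->id > q->id) - (p->id < q->id);
}

/* Sort-then-walk version. cnt and mn hold k*m entries each;
   buf is caller-provided scratch space with n entries. */
void cell_min_sorted(const double *y, const int *z, const int *d,
                     int n, int k, int m,
                     tagged_obs *buf, int *cnt, double *mn)
{
    int i, c;
    for (c = 0; c < k * m; c++) { cnt[c] = 0; mn[c] = DBL_MAX; }
    for (i = 0; i < n; i++) {
        assert(z[i] >= 0 && z[i] < k && d[i] >= 0 && d[i] < m);
        buf[i].id = z[i] * m + d[i];
        buf[i].y  = y[i];
    }
    /* After sorting, equal identifiers sit in contiguous blocks. */
    qsort(buf, (size_t)n, sizeof buf[0], cmp_by_id);
    for (i = 0; i < n; i++) {
        c = buf[i].id;
        cnt[c]++;
        if (buf[i].y < mn[c]) mn[c] = buf[i].y;
    }
}

/* Single-pass variant (an assumption, not part of the answer above):
   index the result arrays by the identifier and skip the sort. */
void cell_min_direct(const double *y, const int *z, const int *d,
                     int n, int k, int m, int *cnt, double *mn)
{
    int i, c;
    for (c = 0; c < k * m; c++) { cnt[c] = 0; mn[c] = DBL_MAX; }
    for (i = 0; i < n; i++) {
        assert(z[i] >= 0 && z[i] < k && d[i] >= 0 && d[i] < m);
        c = z[i] * m + d[i];
        cnt[c]++;
        if (y[i] < mn[c]) mn[c] = y[i];
    }
}
```

Either function would be called once per iteration, after z[] and d[] have been resampled; the count and minimum for group j and sub-group s are then read from cnt[j*m + s] and mn[j*m + s]. Allocating buf, cnt and mn once outside the iteration loop avoids repeated allocation across the 50,000 or 100,000 iterations.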

Hope that helps.
