
Select item from list according to weighting

If I have a list such as this:

  • White British, 85.67
  • White (other), 5.27
  • White Irish, 1.2
  • Mixed race, 1.2
  • Indian, 1.8
  • Pakistani, 1.3
  • Bangladeshi, 0.5
  • Other Asian (non-Chinese), 0.4
  • Black Caribbean, 1
  • Black African, 0.8
  • Black (others), 0.2
  • Chinese, 0.4
  • Other, 0.4

Say I want to select 10,000 values from this list, but I want the selected values to match the weighting associated with them, so ~85% of the selected values should be 'White British'.

I've been attempting this with LINQ but have had no luck.

    var items = from dataItem in listOfItems
                where (dataItem.uses / listOfItems.Count) <= dataItem.weighting
                select dataItem;

Here, uses is how many times that value has been selected, and listOfItems.Count is how many values have been selected overall so far.
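The query above has a few problems. If uses and listOfItems.Count are both integers, the division truncates to 0 until one item makes up the whole selection. The ratio is a fraction, but it is compared against a percentage such as 85.67. And a LINQ filter examines every item at once instead of choosing one value per step. The underlying idea (only pick an item while its share so far is below its weighting) does work if you choose, at each step, the item that is furthest below its target count. The following is a minimal sketch of that "largest deficit" rule, assuming the weightings are percentages. WeightedItem and QuotaSelector.Select are hypothetical names, not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type: a name, a weighting in percent (e.g. 85.67),
// and how many times it has been selected so far.
class WeightedItem
{
    public string Name { get; }
    public double Weighting { get; }
    public int Uses { get; set; }

    public WeightedItem(string name, double weighting)
    {
        Name = name;
        Weighting = weighting;
    }
}

static class QuotaSelector
{
    // Deterministically selects `count` names. At each step, the item whose
    // target count (its share of the selections including this one) exceeds
    // its actual uses by the most is chosen. This keeps the running
    // proportions close to the weightings.
    public static List<string> Select(IList<WeightedItem> items, int count)
    {
        // Normalise by the real total, because the weightings need not sum to exactly 100.
        double total = items.Sum(i => i.Weighting);
        var result = new List<string>(count);
        for (int selected = 0; selected < count; selected++)
        {
            WeightedItem best = items[0];
            double bestDeficit = double.NegativeInfinity;
            foreach (var item in items)
            {
                // Floating-point arithmetic throughout: int / int would truncate.
                double target = item.Weighting / total * (selected + 1);
                double deficit = target - item.Uses;
                if (deficit > bestDeficit)
                {
                    bestDeficit = deficit;
                    best = item;
                }
            }
            best.Uses++;
            result.Add(best.Name);
        }
        return result;
    }
}
```

With the question's weightings, QuotaSelector.Select(items, 10000) returns 10,000 names in interleaved order. This approach gives up randomness in exchange for determinism: every prefix of the result already has close to the requested proportions, which is what the uses / Count <= weighting condition was aiming for. If the order should look random, you can shuffle the list afterwards without changing the counts.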

Thanks


I take it you are trying to create 10,000 values from "White British", "White", ... such that the resulting set has a distribution close to (ideally equal to) the percentages you have given.

Here is my attempt at a solution:


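    using System;
    using System.Collections.Generic;
    using System.Linq;

    struct Info
    {
        public string Name { get; set; }
        public float Percent { get; set; }
    }

    class Statistics
    {
        public IEnumerable<string> CreateSampleSet(int sampleSize, params Info[] infos)
        {
            var rnd = new Random();
            var result = new List<string>();
            infos = infos.OrderByDescending(x => x.Percent).ToArray();

            // Divide by the actual total rather than 100, so the floored counts
            // cannot exceed sampleSize even if the percentages sum past 100.
            var total = infos.Sum(x => (double)x.Percent);

            // Step 1: add floor(share * sampleSize) copies of each name.
            foreach (var info in infos)
            {
                var count = (int)(info.Percent / total * sampleSize);
                for (var i = 0; i < count; i++)
                    result.Add(info.Name);
            }

            // Step 2: fill the remainder by weighted random choice. Walk the
            // cumulative weights until the running total exceeds a random
            // point in [0, total).
            while (result.Count < sampleSize)
            {
                var p = rnd.NextDouble() * total;
                var cumulative = 0.0;
                var chosen = infos[infos.Length - 1]; // guards against rounding at the top end
                foreach (var info in infos)
                {
                    cumulative += info.Percent;
                    if (p < cumulative)
                    {
                        chosen = info;
                        break;
                    }
                }
                result.Add(chosen.Name);
            }

            return result;
        }
    }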

This simply uses the given percentages to add the desired number of each value (or rather the floor of it) to the result, and then adds random values until the desired sample size is reached.
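Dividing each percentage by 100 before flooring only stays within the sample size when the percentages add up to at most 100. The figures in the question add up to 100.14. Writing p_i for the percentages and N = 10,000 for the sample size, the first step in exact arithmetic gives more values than were asked for. The float rounding in the code changes a few of these counts by one, but not the conclusion. Normalising by the actual total instead keeps the floored counts n_i within N:

```latex
\[
\sum_{i} \left\lfloor \frac{p_i}{100}\, N \right\rfloor
  = 8567 + 527 + 120 + 120 + 180 + 130 + 50 + 40 + 100 + 80 + 20 + 40 + 40
  = 10\,014 > N = 10\,000
\]
\[
n_i = \left\lfloor \frac{p_i}{\sum_j p_j}\, N \right\rfloor
  \quad\Longrightarrow\quad
  \sum_i n_i \le \sum_i \frac{p_i}{\sum_j p_j}\, N = N
\]
```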

Note: the remaining random values are added according to the given distribution.
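If each value should instead be an independent random draw, the usual technique is inverse-CDF (roulette-wheel) sampling. You build the running totals of the weights once, pick a uniform point in [0, total), and binary-search for the first running total above it. The counts then only approximate the weightings: for 10,000 draws, a 1.2% item varies by roughly ±11 (one standard deviation). This is a minimal sketch, not either poster's code. WeightedSampler and its members are hypothetical names, and the sketch assumes non-negative weights with at least one positive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: draws names independently, each with probability
// proportional to its weight (inverse-CDF sampling over cumulative weights).
class WeightedSampler
{
    private readonly string[] names;
    private readonly double[] cumulative; // cumulative[i] = weight[0] + ... + weight[i]
    private readonly Random rnd;

    public WeightedSampler(IEnumerable<KeyValuePair<string, double>> weights, Random rnd)
    {
        // Zero weights can never be drawn; dropping them keeps cumulative strictly increasing.
        var list = weights.Where(w => w.Value > 0).ToList();
        names = list.Select(w => w.Key).ToArray();
        cumulative = new double[list.Count];
        double running = 0;
        for (int i = 0; i < list.Count; i++)
        {
            running += list[i].Value;
            cumulative[i] = running;
        }
        this.rnd = rnd;
    }

    public string Next()
    {
        // Uniform point in [0, total); item i owns [cumulative[i-1], cumulative[i]).
        double p = rnd.NextDouble() * cumulative[cumulative.Length - 1];
        int index = Array.BinarySearch(cumulative, p);
        // Negative result: bitwise complement of the first larger element.
        // Exact match: p sits on a boundary, which belongs to the next item.
        index = index < 0 ? ~index : index + 1;
        return names[Math.Min(index, names.Length - 1)];
    }

    public List<string> Sample(int count)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++) result.Add(Next());
        return result;
    }
}
```

For example, new WeightedSampler(weights, new Random()).Sample(10000), where weights is a Dictionary<string, double> holding the question's figures. The binary search makes each draw O(log n) in the number of categories. That only matters for long lists; for 13 categories, a linear scan like the one in the answer works just as well.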
