
Bell Curve Gaussian Algorithm (Python and/or C#)

Here's a somewhat simplified example of what I am trying to do. Suppose I have a formula that computes credit points, but the formula has no constraints (for example, the score might be 1 to 5000). And a score is assigned to 100 people.

Now, I want to assign a "normalized" score between 200 and 800 to each person, based on a bell curve. So for example, if one guy has 5000 points, he might get an 800 on the new scale. People in the middle of my point range will get a score near 500. In other words, 500 is the median?

A similar example might be the old scenario of "grading on the curve", where the bulk of the students perhaps get a C or C+.

I'm not asking for the code; a pointer to a library, an algorithm book, or a website would be enough. I'll probably be writing this in Python (but C# is of some interest as well). There is NO need to graph the bell curve. My data will probably be in a database, and I may have as many as a million people to assign this score to, so scalability is an issue.

Thanks.


The important property of the bell curve is that it describes the normal distribution, which is a simple model for many natural phenomena. I am not sure what kind of "normalization" you intend to do, but it seems to me that the current score already complies with a normal distribution; you just need to determine its properties (mean and variance) and scale each result accordingly.


References: https://en.wikipedia.org/wiki/Grading_on_a_curve https://en.wikipedia.org/wiki/Percentile (see also: gaussian function)

I think the approach that I would try would be to compute the mean (average) and standard deviation (average distance from the average). I would then choose parameters to fit to my target range. Specifically, I would choose that the mean of the input values map to the value 500, and I would choose that 6 standard deviations consume 99.7% of my target range. Or, a single standard deviation will occupy about 16.6% of my target range.

Since your target range spans 600 units (from 200 to 800), six standard deviations map onto 99.7% of it, so a single standard deviation covers 600 × 0.997 / 6 ≈ 99.7 units. A person whose input credit score is one standard deviation above the input mean would therefore get a normalized credit score of about 599.7.

So now:

import statistics

# Compute the mean and standard deviation of the input values.
mean = statistics.mean(input_scores)
stddev = statistics.stdev(input_scores)

normalized = []
for score in input_scores:
    distance_from_mean = score - mean
    deviations_from_mean = distance_from_mean / stddev
    target = 500 + deviations_from_mean * 99.7
    # Clamp scores that fall outside the target range.
    target = max(200, min(800, target))
    normalized.append(target)

This won't necessarily map the median of your input scores to 500. This approach assumes that your input is more-or-less normally distributed and simply translates the mean and stretches the input bell curve to fit in your range. For inputs that are significantly not bell curve shaped, this may distort the input curve rather badly.

A second approach is to simply map your input range to your output range:

for score in input_scores:
    # fraction of the way through the input range 1..5000
    value = (score - 1.0) / (5000 - 1)
    # scale that fraction into the 200..800 target range
    target = value * (800 - 200) + 200

This will preserve the shape of your input, but in your new range.
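If the theoretical bounds 1 and 5000 aren't guaranteed, a variation is to take the endpoints from the data itself, so the observed minimum and maximum land exactly on 200 and 800. A minimal sketch (`rescale` is an illustrative helper name, not from the original answer):

```python
def rescale(scores, lo=200.0, hi=800.0):
    """Linearly map scores so that min(scores) -> lo and max(scores) -> hi."""
    smin, smax = min(scores), max(scores)
    span = smax - smin
    if span == 0:
        # every score is identical: put everyone at the midpoint
        return [(lo + hi) / 2 for _ in scores]
    return [lo + (s - smin) / span * (hi - lo) for s in scores]

print(rescale([1, 5000]))  # [200.0, 800.0]
```

Like the fixed-range version, this preserves the shape of the input distribution; it just guarantees the full output range is used.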

A third approach is to have your target range represent percentiles instead of trying to represent a normal distribution. With a 600-unit range, each percentile spans 6 units: 1% of people would score between 200 and 206, and 1% would score between 794 and 800. Here you would rank your input scores and convert the ranks into a value in the range 200..800. This makes full use of your target range and gives it an easy-to-understand interpretation.
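A minimal sketch of that rank-based mapping (assuming tied scores should share a rank; `percentile_scores` is an illustrative name). Sorting is O(n log n), so a million rows is manageable in memory, and the same idea can be pushed into the database with a window function such as RANK():

```python
import bisect

def percentile_scores(scores, lo=200.0, hi=800.0):
    """Map each score's rank among all scores to the lo..hi range."""
    order = sorted(scores)          # sort once: O(n log n)
    n = len(scores)
    if n == 1:
        return [(lo + hi) / 2]      # single score: midpoint
    # bisect_left finds a score's rank in the sorted list;
    # tied scores get the same rank, hence the same output.
    return [lo + bisect.bisect_left(order, s) / (n - 1) * (hi - lo)
            for s in scores]

print(percentile_scores([30, 10, 20]))  # [800.0, 200.0, 500.0]
```

Note that this deliberately discards the shape of the input distribution: the outputs are uniformly spread over 200..800 regardless of how the raw scores were distributed.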
