Algorithm for estimating how many students in a class did better than a particular student, given this information
I want to make a simple application that will take in:
- number of students
- class average (score/100)
- median grade (score/100)
- class standard deviation
- the current grade of a student (score/100)
The output would be how many students did better than that student.
I'm interested in the best estimate possible with this information.
I'm just not sure how to go about calculating this.
The grades in my data set have the same mean as median, so please explain simply how to do it under that assumption.
The commenters above are correct that without more information you can't nail this down precisely. However, as Steve Jobs likes to say, real artists ship, so here is what I would do if you need a ballpark estimate.
The two most straightforward approaches are to assume that the data are either normally distributed or beta distributed (because the scores are bounded between 0 and 100). Because you said the mean and median are close in your data, I will give code to calculate the quantity assuming a normal distribution.
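The beta alternative is not implemented in this answer, so here is a hedged sketch of what it could look like. It is an assumption-laden illustration, not the answer's method: it treats score/100 as beta distributed, fits the two shape parameters a and b by the method of moments from the class average and standard deviation, and integrates the density numerically with Simpson's rule, because the C++ standard library does not provide the regularized incomplete beta function. The names `NumberBetterThanBeta`, `SimpsonBeta` and `BetaKernel` are hypothetical, and the sketch assumes the fit gives a > 1 and b > 1 (a unimodal class that is not extremely spread out).

```cpp
#include <cassert>
#include <cmath>

// Unnormalized beta density x^(a-1) * (1-x)^(b-1) on [0, 1].
static double BetaKernel(double x, double a, double b)
{
    return std::pow(x, a - 1.0) * std::pow(1.0 - x, b - 1.0);
}

// Composite Simpson's rule for the beta kernel on [lo, hi]; n must be even.
static double SimpsonBeta(double lo, double hi, double a, double b, int n = 2000)
{
    double h = (hi - lo) / n;
    double sum = BetaKernel(lo, a, b) + BetaKernel(hi, a, b);
    for (int i = 1; i < n; ++i) {
        // Interior weights alternate 4, 2, 4, ..., 4.
        sum += (i % 2 == 1 ? 4.0 : 2.0) * BetaKernel(lo + i * h, a, b);
    }
    return sum * h / 3.0;
}

// Hypothetical helper: estimated number of students scoring above `score`,
// assuming score/100 follows a beta distribution fitted by the method of moments.
double NumberBetterThanBeta(double score, double mean, double sd, int numStudents)
{
    double m = mean / 100.0;                  // mean on the [0, 1] scale
    double v = (sd / 100.0) * (sd / 100.0);   // variance on the [0, 1] scale
    double common = m * (1.0 - m) / v - 1.0;  // equals a + b under the moment fit
    double a = m * common;
    double b = (1.0 - m) * common;
    // The sketch assumes a, b > 1 so the density stays finite at 0 and 1.
    assert(a > 1.0 && b > 1.0 && "beta fit needs a > 1 and b > 1");
    // Clamp the score to the valid [0, 100] range before rescaling.
    double x = std::fmin(std::fmax(score / 100.0, 0.0), 1.0);
    // Fraction of the fitted distribution above x; taking a ratio of two
    // integrals means the beta-function normalizing constant cancels.
    double above = SimpsonBeta(x, 1.0, a, b) / SimpsonBeta(0.0, 1.0, a, b);
    return numStudents * above;
}
```

For a hypothetical class with average 70 and standard deviation 10, the moment fit gives a ≈ 14 and b ≈ 6. Unlike the normal model, this one assigns no probability to scores below 0 or above 100.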
A normal distribution has two parameters: a mean and a variance. The best estimate of the mean you are going to get is the sample mean from the data, and the best estimate of the variance is the square of the standard deviation. If you want to know how many students did worse than a particular score, what you need is the cumulative distribution function (CDF); the number who did better is then the class size times one minus the CDF at that score.
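Written out, the normal CDF with mean μ and standard deviation σ, and the resulting estimate for a class of N students and a score x, are:

```latex
\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],
\qquad
N_{\text{better}} \approx N\bigl(1 - \Phi(x)\bigr)
```

As a hypothetical example with N = 30, μ = 70, σ = 10 and x = 80: (x − μ)/(σ√2) ≈ 0.707, erf(0.707) ≈ 0.683, so Φ(80) ≈ 0.841 and N_better ≈ 30 × 0.159 ≈ 4.8 students.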
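```cpp
#include <cmath>

// mu = class average, sigma = class standard deviation, numStudents = class size.
// Returns the estimated number of students who scored above `score`,
// assuming the grades are normally distributed.
int NumberBetterThan(double score, double mu, double sigma, int numStudents)
{
    // Normal CDF: fraction of the class expected to score at or below `score`.
    double temp = (score - mu) / (sigma * std::sqrt(2.0));
    temp = 0.5 * (1.0 + std::erf(temp));
    // Truncates to int, but you can return a double if you are OK with a fractional number of students.
    int result = static_cast<int>(numStudents * (1.0 - temp));
    return result;
}
```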
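If a fractional count is acceptable, one possible variant (a sketch; the function names are hypothetical) returns the expected count as a double and computes the upper tail directly with `std::erfc` from `<cmath>` (C++11), using the identity 1 − Φ(x) = ½·erfc((x − μ)/(σ√2)). For scores well above the mean this avoids subtracting two nearly equal numbers. A second helper rounds to the nearest student instead of truncating.

```cpp
#include <cassert>
#include <cmath>

// Expected (fractional) number of students scoring above `score`, assuming
// normally distributed grades with mean mu and standard deviation sigma.
// Uses 1 - Phi(x) = 0.5 * erfc((x - mu) / (sigma * sqrt(2))).
double ExpectedNumberBetterThan(double score, double mu, double sigma, int numStudents)
{
    assert(sigma > 0.0 && "standard deviation must be positive");
    double z = (score - mu) / (sigma * std::sqrt(2.0));
    return numStudents * 0.5 * std::erfc(z);
}

// Rounds the estimate to the nearest whole student instead of truncating.
long RoundedNumberBetterThan(double score, double mu, double sigma, int numStudents)
{
    return std::lround(ExpectedNumberBetterThan(score, mu, sigma, numStudents));
}
```

With a hypothetical class of 30, mean 70 and standard deviation 10, a score of 80 gives an expected count of about 4.76, which rounds to 5, whereas the truncating version above returns 4.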
erf is the Gauss error function. Since C++11 it is available as std::erf in <cmath>; for older compilers you can find C++ code to implement it in many places on the web. One such place is here.
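For a compiler without `std::erf`, one widely used fallback is the rational approximation from Abramowitz and Stegun (formula 7.1.26), whose absolute error is below about 1.5e-7, which is far more precise than this estimate needs. This is a sketch; `ApproxErf` is a hypothetical name, and with C++11 or later `std::erf` is the better choice.

```cpp
#include <cassert>
#include <cmath>

// Fallback erf using Abramowitz & Stegun 7.1.26 (absolute error below ~1.5e-7).
double ApproxErf(double x)
{
    assert(!std::isnan(x) && "ApproxErf: argument is NaN");
    // erf is odd: erf(-x) = -erf(x), so compute for |x| and restore the sign.
    double sign = (x < 0.0) ? -1.0 : 1.0;
    x = std::fabs(x);

    const double p  = 0.3275911;
    const double a1 = 0.254829592;
    const double a2 = -0.284496736;
    const double a3 = 1.421413741;
    const double a4 = -1.453152027;
    const double a5 = 1.061405429;

    double t = 1.0 / (1.0 + p * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    double poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
    return sign * (1.0 - poly * std::exp(-x * x));
}
```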
You need to know more than average, median, and standard deviation to have a probability distribution of the scores, and you need that distribution to figure out how many students did better.
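To see why, here is a small constructed illustration (the two classes are hypothetical data, not from the question): both have 8 students, mean 50, median 50 and standard deviation 10, yet a different number of students scored above 55.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Summary statistics the question provides (population standard deviation).
struct Summary { double mean, median, sd; };

// Takes the scores by value so they can be sorted for the median.
Summary Summarize(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    auto n = v.size();
    double mean = 0.0;
    for (double x : v) mean += x;
    mean /= n;
    double median = (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
    double ss = 0.0;
    for (double x : v) ss += (x - mean) * (x - mean);
    return {mean, median, std::sqrt(ss / n)};
}

// Number of scores strictly greater than `score`.
int CountAbove(const std::vector<double>& v, double score)
{
    return static_cast<int>(std::count_if(v.begin(), v.end(),
                                          [score](double x) { return x > score; }));
}

// Two hypothetical classes with identical mean (50), median (50) and SD (10).
const std::vector<double> classA = {30, 50, 50, 50, 50, 50, 50, 70};
const std::vector<double> classB = {40, 40, 40, 40, 60, 60, 60, 60};
// CountAbove(classA, 55) is 1, but CountAbove(classB, 55) is 4.
```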
If you assume a probability distribution (or know the distribution because the teacher graded on that curve), the number of students who did better would be (cdf(maximum possible score) - cdf(student's score)) * number of students, where cdf is the cumulative distribution function for that distribution.
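The formula can be sketched generically by passing the assumed CDF in as a function. The names and the two example CDFs (uniform on 0 to 100, and normal with chosen μ and σ) are hypothetical illustrations.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// (cdf(maximum possible score) - cdf(student's score)) * number of students,
// where `cdf` is whatever cumulative distribution function you assume.
double NumberBetterThanCdf(const std::function<double(double)>& cdf,
                           double score, double maxScore, int numStudents)
{
    assert(score <= maxScore);
    return (cdf(maxScore) - cdf(score)) * numStudents;
}

// Hypothetical example CDF: scores uniformly distributed on [0, 100].
double UniformCdf(double x) { return std::fmin(std::fmax(x / 100.0, 0.0), 1.0); }

// Hypothetical example CDF: normal with mean mu and standard deviation sigma.
std::function<double(double)> NormalCdf(double mu, double sigma)
{
    return [mu, sigma](double x) {
        return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
    };
}
```

For a distribution confined to 0 to 100, cdf(100) is 1 and this matches the 1 − CDF form. For an unbounded normal, cdf(100) is slightly below 1, so this version leaves out the probability mass above 100: with a hypothetical mean 70, SD 10 and 30 students, a score of 80 gives about 4.72 here versus about 4.76 from 1 − CDF.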