How to calculate a probability with given data? [closed]
This is a mathematical question for programming needs. What is the way to calculate a probability if you have some data like this:
40000
32423432
3423423
4543535
354545
The lowest number is, let's say, 40000 and the biggest is 32423432. The numbers are given in a txt file as an input parameter, and I need to generate an XML file in this format:
<number="40000" probability="0.0">
<number="32423432" probability="1.0">
<number="354545" probability="0.4532">
I wrote the program with the input parameters and I use TinyXML to generate the XML file, but I'm having problems with the formula. If anyone can help, thanks!
The best you can do here is compute the histogram.
If you want a linear scaling (mapping) then this will work (using doubles):
double newsmallno = (number - smallest) / (largest - smallest);
Note this gives a value of 0.0097 for a number of 354545, so maybe you don't want it to be linear, in which case you need to give more details.
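As a minimal sketch of the linear mapping above applied to a whole list of values: the function name scaleLinear and the choice to return 0.0 when all values are equal are assumptions made for illustration, not part of the original answer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: maps each value linearly so that the smallest
// input becomes 0.0 and the largest becomes 1.0.
std::vector<double> scaleLinear(const std::vector<double>& values) {
    assert(!values.empty());
    const auto mm = std::minmax_element(values.begin(), values.end());
    const double smallest = *mm.first;
    const double range = *mm.second - smallest;

    std::vector<double> scaled;
    scaled.reserve(values.size());
    for (double v : values) {
        // Assumption: if all values are equal the range is zero,
        // so return 0.0 instead of dividing by zero.
        scaled.push_back(range > 0.0 ? (v - smallest) / range : 0.0);
    }
    return scaled;
}
```

Calling scaleLinear on the five numbers from the question gives 0.0 for 40000, 1.0 for 32423432, and roughly 0.0097 for 354545, matching the note above.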
Sounds like you're looking for:
double max = 32423432;
double min = 40000;
double val = 354545;
double prob = (val - min) / (max - min);
This isn't exactly a probability; it's more like mapping a number from [min, max] onto [0, 1].
I think you may be looking for a Cumulative distribution function. Wikipedia says:
The cumulative distribution function (CDF) describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far" function of the probability distribution.
You will have to determine whether your numbers are normally distributed (bell-curve) or uniformly distributed.
If that is what you are looking for, then you might need to pick up a statistics book or cross post on stats.stackexchange.com.
If you have a model of the underlying random distribution you can take advantage of this knowledge to deduce model parameters. For example, you might know the data are supposed to have a normal distribution, but the mean and standard deviation are unknown. The data at hand give an imperfect picture of the parameters for that distribution. (Note well: The example data almost certainly are not normally distributed.)
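To make the parameter-estimation idea concrete, here is a rough sketch that assumes a normal model purely for illustration (as noted, the example data almost certainly do not follow one). The names NormalFit, fitNormal and normalCdf are hypothetical; the CDF is evaluated with std::erfc from the standard library.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for the estimated model parameters.
struct NormalFit {
    double mean;
    double stddev;
};

// Estimate the mean and the sample standard deviation (n - 1 divisor).
NormalFit fitNormal(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());

    double squares = 0.0;
    for (double s : samples) squares += (s - mean) * (s - mean);
    const double variance = squares / static_cast<double>(samples.size() - 1);
    return {mean, std::sqrt(variance)};
}

// P(X <= x) under the fitted normal model; requires stddev > 0.
double normalCdf(const NormalFit& fit, double x) {
    assert(fit.stddev > 0.0);
    return 0.5 * std::erfc(-(x - fit.mean) / (fit.stddev * std::sqrt(2.0)));
}
```

With a different model the same two steps apply: estimate the parameters from the data, then evaluate that model's CDF at each number.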
If you do not have such a model, about the best you can do is to construct an estimate of the cumulative distribution function. A cumulative histogram (the running sum of the normalized bin counts) can serve as an estimator of the CDF. Note that if you do this right, you will not have CDF(40000)=0 and CDF(32423432)=1. Think about it this way: Collect more data and you might well get a sample that is less than 40000 or one that is greater than 32423432.
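One simple way to sketch such an estimate is a rank-based empirical CDF. The function name empiricalCdf and the rank / (n + 1) convention are assumptions chosen here so that the smallest sample does not map to 0 and the largest does not map to 1; other conventions exist.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimates P(X <= x) from samples sorted in
// ascending order, using the rank / (n + 1) convention (an assumption).
double empiricalCdf(const std::vector<double>& sortedSamples, double x) {
    assert(!sortedSamples.empty());
    assert(std::is_sorted(sortedSamples.begin(), sortedSamples.end()));

    // upper_bound points just past the last sample <= x, so its
    // distance from begin() is the count of samples <= x.
    const auto it = std::upper_bound(sortedSamples.begin(), sortedSamples.end(), x);
    const std::size_t rank = static_cast<std::size_t>(it - sortedSamples.begin());
    return static_cast<double>(rank) / static_cast<double>(sortedSamples.size() + 1);
}
```

For the five numbers in the question, sorted first, this gives 1/6 for 40000 and 5/6 for 32423432, which leaves room for future samples outside the observed range.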