Birthday Paradox: How to programmatically estimate the probability of 3, and N, people sharing a birthday
There are extensive resources on the internet discussing the famous Birthday Paradox. It is clear to me how you calculate the probability of two people sharing a birthday i.e. P(same) = 1 - P(different)
. However if I ask myself something apparently more simple I stall: firstly, let's say I generate two random birthdays. Getting the same birthday is like tossing a coin. Either the two persons share a birthday (Heads) or they don't share a birthday (Tail). Run this 500 times and the end result (#Heads/500) will somehow be close to 0.5
Q1) But how do I think about this if I generate three random birthdays? How can I estimate the probability then? Obviously my coin analogy won't be applicable.
Q2) once I have figured out the above I will need to scale it up and generate 30 or 50 birthdays. Is there a recommended technique or algorithm to isolate identical birthdays from a large set? Should I put them into arrays and loop through them?
Here's what I think I need:
Q1)
r = 25 i.e. each trial run generates 25 birthdays
Trial 1 >
3 duplicates: 0
Trial 2 >
3 duplicates: 0
Trial 3 >
开发者_开发问答3 duplicates: 2
Trial 4 >
3 duplicates: 1
...
T100 >
3 duplicates: 2
estimated probability of 3 persons sharing a birthday in a room of 25 = (0+0+2+1+...+2)/100
Q2)
- Create an array for 2 duplicates, an array for 3 duplicates and one for more than 3 duplicates
- add each generated birthday one by one into the first array. But before doing so, loop through the array to see if it's in there already. If so, add it to the second array, but before doing so repeat the above process and so on
- It doesn't seem to be a very efficient algorithm though :) suggestions to improve the Big O here?
Create an integer array of length 365, initialized to 0. Then generate N (in your case 25) random numbers between 1-365 and increase that number in the array (ie. bdays[random_value]++). Since you are only interested in a collision happening, right after increasing the number in the array check if it is greater than 2 (If it is then there is a second collision, which means there are 3 people with the same birthday). Keep track of collisions and execute this as many times as you wish (1000).
In the end, the ratio of collisions/1000 will be your requested value.
and, no tossing coins analogy is wrong.
Check this similar question and its answers on CrossValidated, but I think it is really worth thinking about the classic Birthday problem again to get the basics.
To the second part of your question: depends on the language you use. I definitely suggest using R to solve a problem like that, as checking identical birthdays in a list/vector/data frame can easily done with a simple unique
call. To run a such simple MC simulation R is again really handy, check the second answer on the link above.
Sounds like your first task will be to create a method that will generate random birthdays. To keep things simple, you can use the numbers 1-365 to denote unique birthdays.
Store however many random birthdays (2 in the first case more later) in an ArrayList as Strings. You will want to use a loop to call the random number function and store the value in your list.
Then make a function to search the ArrayList for duplicates. If there are any duplicates (no matter how many) then that's a Heads result. If there are no matches then it's a Tails.
Your probabilities will be far different from 50/50 until you get to 20 or so.
精彩评论