Any good surname databases?
I'm looking to generate some database test data, specifically table columns containing people's names. To get a good indication of how well indexing works for name-based searches, I want to get as close as possible to real-world names and their true frequency distribution, i.e. lots of different names whose frequencies follow some power law distribution.
Ideally I'm looking for a freely available data file with names followed by a single frequency value (or equivalently a probability) per name.
Anglo-Saxon names would be fine, although names from other cultures would also be useful.
I found some US census data which fits the requirement. The only caveat is that it lists only names that occur at least 100 times...
- Genealogy Data: Frequently Occurring Surnames from Census 2000
- names.zip
Found via this blog entry, which also shows the power law distribution curve:
- Power law curve in surnames (blog entry)
Further to this you can sample from the list using Roulette Wheel Selection, e.g. (not tested)
using System;

struct NameEntry
{
    public string _name;
    public int _frequency;
}

long _frequencyTotal; // Precalculate this: the sum of all _frequency values.

public string SampleName(NameEntry[] nameEntryArr, Random rng)
{
    // Throw the roulette ball: a value in the range [0, _frequencyTotal).
    long throwValue = (long)(rng.NextDouble() * _frequencyTotal);
    long accumulator = 0;
    for (int i = 0; i < nameEntryArr.Length; i++)
    {
        accumulator += nameEntryArr[i]._frequency;
        // Strict comparison, so zero-frequency entries are never selected.
        if (throwValue < accumulator)
        {
            return nameEntryArr[i]._name;
        }
    }
    // If we get here then we have an array of zero frequencies.
    throw new ApplicationException("Invalid operation. No non-zero frequencies to select.");
}
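If you need to draw a large number of names, the linear scan above can be swapped for a binary search over precomputed cumulative frequencies. The following is only a rough Java sketch of that variant, not part of the original answer: the class name NameSampler and its methods are made up for illustration, and it assumes the names and counts have already been loaded into arrays.

```java
import java.util.Random;

// Hypothetical helper (not from the census data or any library):
// roulette wheel selection using cumulative frequencies and binary search.
class NameSampler {
    private final String[] names;
    private final long[] cumulative; // cumulative[i] = sum of frequencies[0..i]
    private final long total;

    public NameSampler(String[] names, long[] frequencies) {
        if (names.length == 0 || names.length != frequencies.length) {
            throw new IllegalArgumentException("names and frequencies must be non-empty and the same length");
        }
        this.names = names.clone();
        this.cumulative = new long[frequencies.length];
        long sum = 0;
        for (int i = 0; i < frequencies.length; i++) {
            if (frequencies[i] < 0) {
                throw new IllegalArgumentException("negative frequency at index " + i);
            }
            sum += frequencies[i];
            cumulative[i] = sum;
        }
        if (sum == 0) {
            throw new IllegalArgumentException("no non-zero frequencies to select");
        }
        this.total = sum;
    }

    // Throw the roulette ball and return the name it lands on.
    public String sample(Random rng) {
        long ball = (long) (rng.nextDouble() * total);
        return nameAt(Math.min(ball, total - 1)); // guard against rounding up to total
    }

    // Returns the name at the first index whose cumulative frequency exceeds ball.
    // Expects 0 <= ball < total; zero-frequency entries are never returned.
    public String nameAt(long ball) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > ball) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return names[lo];
    }
}
```

Building the sampler costs one pass over the list, after which each draw takes O(log n) rather than O(n), which may matter when generating millions of rows from a long surname list.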
Oxford University provides word lists on their public FTP site as compressed .gz files at ftp://ftp.ox.ac.uk/pub/wordlists/names/.
You can also check out the jFairy project. It's written in Java and produces fake data (names, for example). http://codearte.github.io/jfairy/
Fairy fairy = Fairy.create();
Person person = fairy.person();
System.out.println(person.firstName()); // Chloe
System.out.println(person.lastName()); // Barker
System.out.println(person.fullName()); // Chloe Barker