Any good surname databases?

I'm looking to generate some database test data, specifically table columns containing people's names. To get a good indication of how well indexing works for name-based searches, I want to get as close as possible to real-world names and their true frequency distribution, e.g. lots of different names whose frequencies follow some power law distribution.

Ideally I'm looking for a freely available data file with names followed by a single frequency value (or equivalently a probability) per name.

Anglo-Saxon names would be fine, although names from other cultures would also be useful.


I found some US census data which fits the requirement. The only caveat is that it lists only names that occur at least 100 times...

  • Genealogy Data: Frequently Occurring Surnames from Census 2000
  • names.zip

Found via this blog entry, which also shows the power law distribution curve:

  • Power law curve in surnames (blog entry)
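
Before sampling from a list like this, it has to be reduced to name/count pairs. The following is a minimal Java sketch, assuming a comma-separated file with one header row. The class name `NameFrequencyLoader`, the `parse` helper and the column positions are illustrative assumptions, not something the census files guarantee, so check them against the file you actually download. The rows in `main` are made-up placeholders, not census figures.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns CSV lines into an ordered name -> count map.
final class NameFrequencyLoader {

    // nameCol and countCol are zero-based column indices (an assumption about the file layout).
    // The first line is treated as a header; short rows and non-numeric counts are skipped.
    static Map<String, Long> parse(List<String> lines, int nameCol, int countCol) {
        Map<String, Long> counts = new LinkedHashMap<>();
        int needed = Math.max(nameCol, countCol) + 1;
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            String[] cols = line.split(",");
            if (cols.length < needed) {
                continue; // row too short to hold both columns
            }
            try {
                long count = Long.parseLong(cols[countCol].trim());
                counts.merge(cols[nameCol].trim(), count, Long::sum);
            } catch (NumberFormatException e) {
                // Count column is not a whole number: ignore this row.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder rows for illustration only.
        List<String> lines = Arrays.asList("name,rank,count", "ALPHA,1,300", "BETA,2,200");
        Map<String, Long> counts = parse(lines, 0, 2);
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        // Print each name with its count and its probability (count / total).
        counts.forEach((name, count) ->
                System.out.println(name + " " + count + " " + (double) count / total));
    }
}
```

For a real file, `java.nio.file.Files.readAllLines` can supply the `lines` argument. Dividing each count by the total gives the per-name probability the question asks for.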

Further to this you can sample from the list using Roulette Wheel Selection, e.g. (not tested)

using System;

struct NameEntry
{
    public string _name;
    public int _frequency;
}

class NameSampler
{
    long _frequencyTotal; // Sum of all _frequency values. Precalculate this.

    public string SampleName(NameEntry[] nameEntryArr, Random rng)
    {
        // Throw the roulette ball: a uniform value in [0, _frequencyTotal).
        double throwValue = rng.NextDouble() * _frequencyTotal;
        long accumulator = 0;

        for (int i = 0; i < nameEntryArr.Length; i++)
        {
            accumulator += nameEntryArr[i]._frequency;
            // Strict comparison, so zero-frequency entries are never selected.
            if (throwValue < accumulator)
            {
                return nameEntryArr[i]._name;
            }
        }

        // If we get here then all frequencies are zero (or _frequencyTotal is wrong).
        throw new InvalidOperationException("Invalid operation. No non-zero frequencies to select.");
    }
}
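
The linear scan above costs O(n) per sample, which adds up when generating millions of rows from a list with many thousands of names. One common variation (not part of the answer above) is to precompute the cumulative totals once and binary-search them for each sample. Here is a sketch in Java, with a hypothetical class name and made-up placeholder data:

```java
import java.util.Random;

// Hypothetical helper: roulette wheel selection over precomputed cumulative totals.
final class CumulativeNameSampler {
    private final String[] names;
    private final long[] cumulative; // cumulative[i] = frequencies[0] + ... + frequencies[i]
    private final long total;

    CumulativeNameSampler(String[] names, long[] frequencies) {
        if (names.length == 0 || names.length != frequencies.length) {
            throw new IllegalArgumentException("names and frequencies must be non-empty and the same length");
        }
        this.names = names.clone();
        this.cumulative = new long[frequencies.length];
        long running = 0;
        for (int i = 0; i < frequencies.length; i++) {
            if (frequencies[i] < 0) {
                throw new IllegalArgumentException("negative frequency at index " + i);
            }
            running += frequencies[i];
            cumulative[i] = running;
        }
        if (running == 0) {
            throw new IllegalArgumentException("no non-zero frequencies to select");
        }
        this.total = running;
    }

    String sample(Random rng) {
        // Throw the ball: an integer in [0, total), clamped against floating-point rounding.
        long ball = Math.min(total - 1, (long) (rng.nextDouble() * total));
        // Binary search for the first index whose cumulative total exceeds the ball.
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > ball) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return names[lo];
    }

    public static void main(String[] args) {
        // Placeholder names and frequencies, for illustration only.
        CumulativeNameSampler sampler = new CumulativeNameSampler(
                new String[] {"ALPHA", "BETA", "GAMMA"}, new long[] {3, 0, 2});
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(sampler.sample(rng));
        }
    }
}
```

The search is written by hand rather than with `Arrays.binarySearch` because zero-frequency entries produce repeated cumulative values, and the library method does not promise which of several equal elements it returns.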


Oxford University provides word lists on their public FTP site as compressed .gz files at ftp://ftp.ox.ac.uk/pub/wordlists/names/.


You can also check out the jFairy project. It's written in Java and produces fake data such as names. http://codearte.github.io/jfairy/

Fairy fairy = Fairy.create(); 
Person person = fairy.person();
System.out.println(person.firstName());           // Chloe
System.out.println(person.lastName());            // Barker
System.out.println(person.fullName());            // Chloe Barker