Find the gender from a name [closed]
Closed 12 months ago.
I was recently confronted with a weird yet interesting question. The question is as follows: write a program that outputs the gender based on a name. Example: INPUT --> John Michael Britney OUTPUT --> male male female
So this is the output I expect. I tried a lot to solve it, but I really was not able to crack it. I am really thankful to this site for giving me an opportunity to share this question.
Actually, this was asked in a programming contest as a flyer problem, so I thought it could be programmed.
You can't do it algorithmically: you need a database to do it statistically. This SO question points to many such available resources. Do realize you'll have many, MANY misguesses: either the Korean Kims (males) or the Northern European ones (females) may get pretty peeved at that kind of thing, for example ;-).
I have been spending time on solving this as well. My first approach was to use lists of approved names (we have those in Denmark, where I'm from), but I quickly realized that only a few countries have them. Besides that, I was getting feedback that a probabilistic guess would be much more functional, and also that one should be able to filter by a country or language ID. I then rebuilt it using datasets of users from social networks instead, which actually works quite well.
You can check it out at http://genderize.io
Simple example:
http://api.genderize.io?name=kim
{"name":"kim","gender":"female","probability":"0.91","count":687}
http://api.genderize.io?name=kim&country_id=dk
{"name":"kim","gender":"male","probability":"1.00","count":17,"country_id":"dk"}
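For anyone wanting to consume responses like the two above, here is a minimal Python sketch. It assumes only the URL shape and JSON fields shown in the examples; the helper names build_url and parse_response are made up for illustration, and the HTTP request itself is left out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode

# Base URL copied from the examples above.
API_BASE = "http://api.genderize.io"


def build_url(name, country_id=None):
    """Build a query URL in the same shape as the examples above."""
    params = {"name": name}
    if country_id is not None:
        params["country_id"] = country_id
    return API_BASE + "?" + urlencode(params)


def parse_response(body):
    """Return (gender, probability) from a JSON response body.

    The sample responses give probability as a string,
    so it is converted to a float here.
    """
    data = json.loads(body)
    return data["gender"], float(data["probability"])
```

The probability is what makes the answer usable: a caller can treat anything below a chosen threshold as "unknown" instead of trusting every guess.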
Don't give up.
I would take a statistical approach... you need to get your hands on a massive names database that actually has gender info... then teach your program to learn from that dataset.
The thing is, you need a third variable for correlation. Something like country of origin, ethnicity, etc. will improve your odds even further. You really need that 3rd "clue"...
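A rough sketch of that idea in Python, assuming you already have (name, country, gender) records from somewhere. The function names train and guess and the toy SAMPLE data are invented for illustration; "learning" here is nothing more than counting, with the country used as the extra clue when it is available.

```python
from collections import Counter, defaultdict


def train(records):
    """Count genders per (name, country) and per name alone.

    records: iterable of (name, country, gender) tuples.
    """
    by_name_country = defaultdict(Counter)
    by_name = defaultdict(Counter)
    for name, country, gender in records:
        key = name.lower()
        by_name_country[(key, country)][gender] += 1
        by_name[key][gender] += 1
    return by_name_country, by_name


def guess(model, name, country=None):
    """Return (gender, probability), preferring country-specific counts."""
    by_name_country, by_name = model
    key = name.lower()
    # Fall back to the global counts when the country is missing or unseen.
    counts = by_name_country.get((key, country)) or by_name.get(key)
    if not counts:
        return "unknown", 0.0
    gender, hits = counts.most_common(1)[0]
    return gender, hits / sum(counts.values())


# Made-up toy data, only to show the effect of the country clue.
SAMPLE = [
    ("Kim", "us", "female"), ("Kim", "us", "female"), ("Kim", "us", "female"),
    ("Kim", "dk", "male"), ("Kim", "dk", "male"),
]
model = train(SAMPLE)
```

With this toy data, guess(model, "Kim") leans female, while guess(model, "Kim", "dk") comes back male, which is exactly the kind of flip the third clue is meant to catch.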
What about Human Computer Interaction as the 3rd clue?
You could have a click map such as http://css-tricks.com/tracking-clicks-building-a-clickmap-with-php-and-jquery/
Based on where the user clicks, you could derive a reasonable statistic of male vs. female. This would be used when the name is marked unknown in the database.
Here's a Wikipedia quote on "Gender_HCI":
"Larger displays helped reduce the gender gap in navigating virtual environments. With smaller displays, males’ performance was better than females’. With larger displays, females’ performance improved and males’ performance was not negatively affected."
So have a small box and time the amount of time required to click it. ...?
The statistical approach works really well: depending on the country, the precision is 95% or 99%+, with few exceptions (Chinese names, Korean names).
Check out the GendRE API http://namsor.com/api
It automatically recognizes the culture behind a name in order to apply the appropriate dictionary (e.g. Andrea Rossini is male, Andrea Parker is female, etc.).
I have done this before - it is easy and works well 90% of the time when applied to the correct scenario.
You need to obtain a database of names and the usual gender from somewhere. It is then trivial to search the database.
Some names (for example Andy) are commonly associated with either gender. So you will need at least three gender values - male/female/unknown.
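A minimal sketch of that lookup in Python. The NAME_GENDER table is a hypothetical stand-in for a real names database, and the function names are invented for illustration; names missing from the table fall into the same "unknown" bucket as ambiguous ones.

```python
# Hypothetical lookup table; a real one would be loaded from a names database.
NAME_GENDER = {
    "john": "male",
    "michael": "male",
    "britney": "female",
    "andy": "unknown",  # commonly used for either gender
}


def lookup_gender(name):
    """Return 'male', 'female', or 'unknown' for a first name."""
    return NAME_GENDER.get(name.strip().lower(), "unknown")


def genders_for(line):
    """Map a whitespace-separated line of names to a list of genders."""
    return [lookup_gender(n) for n in line.split()]
```

Calling genders_for("John Michael Britney") gives the male, male, female output the question asks for.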
Names ending in a, e, i, o, or u are usually feminine. This rule may not be as accurate as APIs that use statistics, but it is easy to implement.
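The rule fits in a few lines of Python. This is only a sketch of the heuristic as stated; the function name guess_by_last_letter is made up for illustration.

```python
def guess_by_last_letter(name):
    """Guess 'female' if the name ends in a vowel, else 'male'."""
    cleaned = name.strip().lower()
    if not cleaned:
        return "unknown"
    return "female" if cleaned[-1] in "aeiou" else "male"
```

Note that it already misfires on the question's own example: Britney ends in "y", so it comes out as male.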