开发者

How do I parse the people's first and last name in Python?

So basically I need to parse a name and find the following info:

  • First Name

  • First Initial (if employee has initials for a first name like D.J., use both initials)

  • Last Name (include if employee has a suffix such as Jr. or III.)


So here's the interface I'm working with:

Input:

names = ["D.J. Richies III", "John Doe", "A.J. Hardie Jr."]
for name in names:
   print parse_name(name)

Expected Output:

{'FirstName': 'D.J.', 'FirstInitial': 'D.J.', 'LastName': 'Richies III' }
{'FirstName': 'John', 'FirstInitial': 'J.', 'LastName': 'Doe' }
{'FirstName': 'A.J.', 'FirstInitial': 'A.J.', 'LastName': 'Hardie Jr.' }

Not really good at Regex, and actually that's probably overkill for this. I'm just guessing:

if name[1] == ".":  # we have a name like开发者_JS百科 D.J.?


I found this library quite useful for parsing names. https://code.google.com/p/python-nameparser/

It can also deal with names that are formatted Lastname, Firstname.


There is no general solution and solution will depend on the constraints you put. For the specs you have given here is a simple solution which gives exactly what you want

def parse_name(name):
   fl = name.split()
   first_name = fl[0]
   last_name = ' '.join(fl[1:])
   if "." in first_name:
      first_initial = first_name
   else:
      first_initial = first_name[0]+"."

   return {'FirstName':first_name, 'FirstInitial':first_initial, 'LastName':last_name}

names = ["D.J. Richies III", "John Doe", "A.J. Hardie Jr."]
for name in names:
   print parse_name(name)

output:

{'LastName': 'Richies III', 'FirstInitial': 'D.J.', 'FirstName': 'D.J.'}
{'LastName': 'Doe', 'FirstInitial': 'J.', 'FirstName': 'John'}
{'LastName': 'Hardie Jr.', 'FirstInitial': 'A.J.', 'FirstName': 'A.J.'}


Well, for your simple example names, you can do something like this.

# This separates the first and last names
name = name.partition(" ")
firstName = name[0]
# now figure out the first initial
# we're assuming that if it has a dot it's an initialized name,
# but this may not hold in general
if "." in firstName:
    firstInitial = firstName
else:
    firstInitial = firstName[0] + "."
lastName = name[2]
return {"FirstName":firstName, "FirstInitial":firstInitial, "LastName": lastName}

I haven't tested it, but a function like that should do the job on the input example you provided.


This is basically the same solution as the one Anurag Uniyal provided, only a little more compact:

import re

def parse_name(name):
    first_name, last_name = name.split(' ', 1)
    first_initial = re.search("^[A-Z.]+", first_name).group()
    if not first_initial.endswith("."):
        first_initial += "."
    return {"FirstName": first_name,
            "FirstInitial": first_initial,
            "LastName": last_name}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜