How can I extract email addresses from between '<' and '>'?
I've got a list of emails and names from Outlook, semicolon delimited, like this:
fname lname <email>; fname2 lname2 <email2>; ... ; fnameN lnameN <emailN>
And I'd like to extract the emails and semicolon delimit them like this:
email1; email2; ... ; emailN
How can I do this in Python?
Using regex:
import re
# matches everything which is between < and > (excluding them)
ptrn = re.compile(r"<([^>]+)>")
# findall returns ['email','email2']. Join concats them.
print('; '.join(ptrn.findall("fname lname <email>; fname2 lname2 <email2>;")))
# email; email2
Using list comprehension:
em = "fname lname <email>; fname2 lname2 <email2>; fnameN lnameN <emailN>"
email_list = [entry.split()[-1][1:-1] for entry in em.split(';')]
# email_list:
# ['email', 'email2', 'emailN']
Breakdown:
for entry in em.split(';')
First it splits up the original string by the semi-colon.
entry.split()
Next it takes each entry, splits it again, this time by space.
entry.split()[-1]
Next it selects the last entry from the split, which is your email.
entry.split()[-1][1:-1]
This takes your email, which is in the form "<email@addr.com>", and selects the string inside the angle brackets ([1:-1] selects from the second character through the second-to-last).
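The breakdown above can be sketched as an explicit loop. This is only an illustration, not the answerer's code: the function name extract_emails is made up here, and the trailing semicolon in the sample input is added on purpose. With a trailing semicolon, em.split(';') yields an empty last entry, and entry.split()[-1] would raise IndexError on it, so the sketch skips empty entries.

```python
# Sample input from the question, with a trailing semicolon added (assumption)
em = "fname lname <email>; fname2 lname2 <email2>; fnameN lnameN <emailN>;"

def extract_emails(text):
    """Hypothetical helper: return the addresses found in 'name <addr>; ...' text."""
    emails = []
    for entry in text.split(';'):      # step 1: split on semicolons
        parts = entry.split()          # step 2: split each entry on whitespace
        if not parts:                  # skip empty entries (e.g. after a trailing ';')
            continue
        token = parts[-1]              # step 3: last piece, e.g. '<email>'
        if token.startswith('<') and token.endswith('>'):
            emails.append(token[1:-1]) # step 4: drop the angle brackets
    return emails

result = '; '.join(extract_emails(em))
print(result)
```

Like the one-liner, this assumes the address is the last whitespace-separated token of each entry and contains no spaces.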
Variations on the same theme:
s = 'fname lname <email>; fname2 lname2 <email2>; ... ; fnameN lnameN <emailN>'
print([s[i+1 : i+s[i:].find('>')] for i, c in enumerate(s) if c == '<'])
# OR
gen = ( i for i,c in enumerate(s) if c in '<>' )
print([s[a+1:next(gen)] for a in gen])
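Both variations pair each '<' with the '>' that follows it. A more explicit sketch of the same idea, using str.find with a start offset, might look like the following. The function name between_brackets is hypothetical, and stopping at an unmatched '<' is a choice made here, not something the original code does.

```python
def between_brackets(s):
    """Hypothetical helper: return every substring enclosed in '<' ... '>'."""
    found = []
    start = s.find('<')
    while start != -1:
        end = s.find('>', start + 1)   # first '>' after this '<'
        if end == -1:                  # unmatched '<': stop instead of slicing wrongly
            break
        found.append(s[start + 1:end])
        start = s.find('<', end + 1)   # continue after the closing bracket
    return found

s = 'fname lname <email>; fname2 lname2 <email2>; ... ; fnameN lnameN <emailN>'
print('; '.join(between_brackets(s)))
```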