How to remove the name formatting from a received email's sender address?
I am practicing sending emails with Google App Engine with Python. This code checks to see if message.sender is in the database:
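from google.appengine.ext.webapp.mail_handlers import InboundMailHandler

class ReceiveEmail(InboundMailHandler):
    def receive(self, message):
        # User is a db.Model with a userEmail property (defined elsewhere)
        querySender = User.all()
        querySender.filter("userEmail =", message.sender)
        senderInDatabase = None
        for match in querySender:
            senderInDatabase = match.userEmail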
This works on the development server because I send the email as "az@example.com", so message.sender="az@example.com".
But I realized that on the production server emails arrive formatted as "az <az@example.com>", and my code fails because message.sender is now "az <az@example.com>" while the email in the database is simply "az@example.com".
I searched for how to do this with a regex, and it is possible, but I was wondering if I can do this with Python lists. Or, what do you think is the best way to achieve this? I need to take just the email address from message.sender.
The App Engine documentation acknowledges the formatting, but I could not find a specific way to select only the email address.
Thanks!
EDIT2 (re: Forest's answer)
@Forest: parseaddr() appears to be simple enough:
>>> from email.utils import parseaddr
>>> e = "az <az@example.com>"
>>> parsed = parseaddr(e)
>>> parsed
('az', 'az@example.com')
>>> parsed[1]
'az@example.com'
But this still does not cover the other type of formatting that you mention: user@example.com (Full Name)
>>> e2 = "<az@example.com> az"
>>> parsed2 = parseaddr(e2)
>>> parsed2
('', 'az@example.com')
Is there really a formatting where full name comes after the email?
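For reference, the older RFC 822 style puts the full name in a parenthesized comment after the address ("az@example.com (az)"), which is a different shape from the "<az@example.com> az" string tested above. A minimal check of that form with the standard library's email.utils.parseaddr (the sample strings are illustrative):

```python
from email.utils import parseaddr

# Three common shapes of a From:/To: value; the second is the RFC 822
# comment style, where the full name follows the address in parentheses.
samples = ["az <az@example.com>", "az@example.com (az)", "az@example.com"]

for header in samples:
    name, address = parseaddr(header)  # returns a (realname, email) tuple
    print(repr(header), "->", repr(name), repr(address))
```

In each case the address part comes back as "az@example.com"; only the name part differs (empty for the bare address).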
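EDIT (re: Adam Bernier's answer)
My attempt at explaining how the regex r'<([^>]+)>' works:
r # raw string prefix: backslashes are not treated as escapes
< # literal "<" that opens the address
( # start of a capturing group; findall() returns what it matches
[ # start of a character class
^ # as the first character inside [], negates the class
> # the character the class excludes, so [^>] means "any character except >"
] # end of the character class
+ # one or more repetitions of the class
) # end of the capturing group
> # literal ">" that closes the address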
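One way to check an annotation like this is to write the same pattern with Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments, so each token can be documented in place. A sketch (the name ADDRESS_IN_BRACKETS is just illustrative):

```python
import re

# The same pattern as r'<([^>]+)>', written with re.VERBOSE so that
# whitespace is ignored and each piece can carry its own comment.
ADDRESS_IN_BRACKETS = re.compile(r"""
    <        # literal opening angle bracket
    (        # start of capturing group: this is what findall returns
      [^>]+  # one or more characters, none of which is a closing bracket
    )        # end of capturing group
    >        # literal closing angle bracket
""", re.VERBOSE)

print(ADDRESS_IN_BRACKETS.findall("az <az@example.com>, bz <bz@example.com>"))
# ['az@example.com', 'bz@example.com']
```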
Rather than storing the entire contents of a To: or From: header field as an opaque string, why don't you parse incoming email and store the email address separately from the full name? See email.utils.parseaddr(). This way you don't have to use complicated, slow pattern matching when you want to look up an address. You can always reassemble the fields using formataddr().
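A minimal sketch of that approach using only email.utils. The helper name sender_address is hypothetical, not an App Engine API. In the question's handler, the lookup would then filter on the bare address, e.g. querySender.filter("userEmail =", sender_address(message.sender)).

```python
from email.utils import formataddr, parseaddr


def sender_address(sender):
    """Hypothetical helper: return only the address part of a From: value."""
    return parseaddr(sender)[1]


# Production delivers "Name <address>"; the development server, per the
# question, delivers the bare address. Both reduce to the same lookup key.
for raw in ["az <az@example.com>", "az@example.com"]:
    print(sender_address(raw))

# If the name and address are stored separately, formataddr() rebuilds the
# display form from a (name, address) pair.
print(formataddr(("az", "az@example.com")))  # az <az@example.com>
```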
If you want to use a regex, try something like this:
>>> import re
>>> email_string = "az <az@example.com>"
>>> re.findall(r'<([^>]+)>', email_string)
['az@example.com']
Note that the above regex handles multiple addresses...
>>> email_string2 = "az <az@example.com>, bz <bz@example.com>"
>>> re.findall(r'<([^>]+)>', email_string2)
['az@example.com', 'bz@example.com']
but this simpler regex doesn't:
>>> re.findall(r'<(.*)>', email_string2)
['az@example.com>, bz <bz@example.com'] # matches too much
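For comparison, the standard library can also split a field that holds several addresses without a hand-written pattern: email.utils.getaddresses() takes a list of header values and returns (name, address) pairs. A short sketch:

```python
from email.utils import getaddresses

email_string2 = "az <az@example.com>, bz <bz@example.com>"

# getaddresses() expects a list of field values and handles the comma
# splitting itself, returning one (realname, address) pair per address.
pairs = getaddresses([email_string2])
print(pairs)  # [('az', 'az@example.com'), ('bz', 'bz@example.com')]
print([address for _name, address in pairs])
```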
Using slices (which I think is what you meant instead of "lists") seems more convoluted, e.g.:
>>> email_string[email_string.find('<')+1:-1]
'az@example.com'
and if there are multiple addresses:
>>> email_strings = email_string2.split(',')
>>> for s in email_strings:
... s[s.find('<')+1:-1]
...
'az@example.com'
'bz@example.com'
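One caveat with the slice version: it assumes every field contains "<" and ends with ">". For a bare address such as "az@example.com", find() returns -1, so the slice becomes [0:-1] and silently drops the last character ("az@example.co"). A slightly more defensive slice-based sketch (extract_address is a hypothetical name):

```python
def extract_address(field):
    """Hypothetical helper: return the text between < and >, or the whole
    field (stripped) when there are no angle brackets."""
    start = field.find("<")
    end = field.find(">", start + 1)
    if start == -1 or end == -1:
        return field.strip()
    return field[start + 1:end]


for s in "az <az@example.com>, bz <bz@example.com>".split(","):
    print(extract_address(s))

print(extract_address("az@example.com"))  # no brackets: returned unchanged
```

Like the comma split above, this still assumes no display name contains a comma; parseaddr()/getaddresses() handle quoted names for you.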