How to eliminate email formatting in received email?

I am practicing sending emails with Google App Engine with Python. This code checks to see if message.sender is in the database:

class ReceiveEmail(InboundMailHandler):
    def receive(self, message):
        querySender = User.all()
        querySender.filter("userEmail =", message.sender)
        senderInDatabase = None
        for match in querySender:
            senderInDatabase = match.userEmail

This works in the development server because I send the email as "az@example.com" and message.sender="az@example.com"

But I realized that on the production server emails arrive formatted as "az <az@example.com>", so my code fails: now message.sender="az <az@example.com>", but the email in the database is simply "az@example.com".

I searched for how to do this with regex and it is possible but I was wondering if I can do this with Python lists? Or, what do you think is the best way to achieve this result? I need to take just the email address from the message.sender.

App Engine documentation acknowledges the formatting but I could not find a specific way to select the email address only.

Thanks!

EDIT2 (re: Forest answer)

@Forest: parseaddr() appears to be simple enough:

>>> from email.utils import parseaddr
>>> e = "az <az@example.com>"
>>> parsed = parseaddr(e)
>>> parsed
('az', 'az@example.com')
>>> parsed[1]
'az@example.com'
>>>

But this still does not cover the other type of formatting that you mention: user@example.com (Full Name)

>>> e2 = "<az@example.com> az"
>>> parsed2 = parseaddr(e2)
>>> parsed2
('', 'az@example.com')
>>>

Is there really a format where the full name comes after the email address?

EDIT (re: Adam Bernier answer)

My attempt at explaining how the regex works (probably not correct):

r    # raw string
<     # first limit character
(     # what is inside () is matched     
[       # indicates a set of characters
^         # start of string
>         # start with this and go backward?
]       # end set of characters
+       # repeat the match
)     # end group
>    # end limit character


Rather than storing the entire contents of a To: or From: header field as an opaque string, why don't you parse incoming email and store email address separately from full name? See email.utils.parseaddr(). This way you don't have to use complicated, slow pattern matching when you want to look up an address. You can always reassemble the fields using formataddr().
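
A minimal sketch of that suggestion, assuming the standard email.utils module is usable inside the handler. The helper name sender_address is hypothetical, and the lowercasing step is an extra assumption rather than something the question asks for. The third call also shows that parseaddr() accepts the older "address (Full Name)" form, where the name follows the address as a parenthesized comment.

```python
from email.utils import formataddr, parseaddr


def sender_address(raw_sender):
    """Return only the address part of a From:-style string.

    Hypothetical helper, not part of App Engine. Lowercasing is an
    assumption so that lookups do not depend on letter case.
    """
    _name, address = parseaddr(raw_sender)
    return address.lower()


# Name before the address, as seen on the production server.
print(sender_address("az <az@example.com>"))
# Bare address, as seen on the development server.
print(sender_address("az@example.com"))
# Name after the address, written as a comment in parentheses.
print(parseaddr("az@example.com (Full Name)"))
# Rebuild a display string from the separately stored parts.
print(formataddr(("az", "az@example.com")))
```

In the handler from the question, the filter would then compare against sender_address(message.sender) instead of message.sender itself.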


If you want to use regex try something like this:

>>> import re
>>> email_string = "az <az@example.com>"
>>> re.findall(r'<([^>]+)>', email_string)
['az@example.com']

Note that the above regex handles multiple addresses...

>>> email_string2 = "az <az@example.com>, bz <bz@example.com>"
>>> re.findall(r'<([^>]+)>', email_string2)
['az@example.com', 'bz@example.com']

but this simpler regex doesn't:

>>> re.findall(r'<(.*)>', email_string2)
['az@example.com>, bz <bz@example.com'] # matches too much
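
To make the first pattern easier to read, here is the same regex written with re.VERBOSE so each piece can carry a comment. This is only an annotated restatement of the pattern above; the variable name address_pattern is made up for the example.

```python
import re

# Same pattern as r'<([^>]+)>', spread out with re.VERBOSE.
address_pattern = re.compile(r"""
    <        # a literal opening angle bracket
    (        # start of the group whose contents findall() returns
    [^>]+    # one or more characters that are not a closing bracket
    )        # end of the group
    >        # a literal closing angle bracket
""", re.VERBOSE)

email_string2 = "az <az@example.com>, bz <bz@example.com>"
matches = address_pattern.findall(email_string2)
print(matches)
```

Inside square brackets a leading ^ negates the set, so [^>]+ stops at the first closing bracket. That is why this pattern does not run on into the second address the way .* does.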

Using slices (which I think you meant to say instead of "lists") seems more convoluted, and it only works when the string ends with the closing ">", e.g.:

>>> email_string[email_string.find('<')+1:-1]
'az@example.com'

and if multiple:

>>> email_strings = email_string2.split(',')
>>> for s in email_strings:
...   s[s.find('<')+1:-1]
...
'az@example.com'
'bz@example.com'
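
For a header with several addresses, the standard library also offers email.utils.getaddresses(), which avoids both the regex and the manual slicing. A short sketch reusing the example string from above:

```python
from email.utils import getaddresses

email_string2 = "az <az@example.com>, bz <bz@example.com>"

# getaddresses() takes a list of header values and returns
# (display name, address) pairs.
pairs = getaddresses([email_string2])
addresses = [address for _name, address in pairs]
print(addresses)
```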