Python Email Header parsing get_all()
I'm parsing mailbox files with Python and stumbled upon some strange behavior when trying to get all "To:" headers with get_all():
from email.utils import getaddresses

tos = message.get_all('to', [])
if tos:
    tos = getaddresses(tos)
    for to in tos:
        receiver = EmailInformant()
        receiver_email = to[1]
get_all() gets all "To:" values, which are separated by commas, afaik. getaddresses then splits the individual receivers into a name and an email value.
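To illustrate the two steps on well-formed input, here is a minimal sketch. The message text and addresses are made up for the example: get_all() returns one string per "To:" header line, and getaddresses then splits those strings into (name, email) pairs.

```python
import email
from email.utils import getaddresses

# Made-up message with two "To:" header lines (illustrative only).
raw = (
    "To: Alice <alice@example.com>, bob@example.com\n"
    "To: carol@example.com\n"
    "\n"
    "body\n"
)
message = email.message_from_string(raw)

# One string per "To:" header line; commas inside a line are not split yet.
values = message.get_all('to', [])

# getaddresses splits every value into (name, email) tuples.
pairs = getaddresses(values)

for name, address in pairs:
    print(repr(name), address)
```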
For the following "To:" header, it does not work as I would expect:
To: example@test.de <example@test.de>
Here, the email address is provided as name and email value, but the parser treats this as two separate "To:" entries, running the for-loop twice. Is this a bug?
Parsing emails is very hard, as there are several different specifications, many behaviors that are or were poorly defined, and implementations that don't follow the specifications. Many of them conflict in some ways.
I know the email module in the standard library is currently being rewritten for Python 3.3; see http://www.bitdance.com/blog/. The rewrite should solve problems like this; it is currently available on PyPI for Python 3.2 if you have that option: http://pypi.python.org/pypi/email.
Meanwhile, try tos = set(getaddresses(tos)) to eliminate duplicates.
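Note that a set of (name, email) tuples only drops entries where both the name and the address match, and it does not keep the original order. If that matters, a rough sketch along these lines may help. unique_recipients is a hypothetical helper, not part of the email module, and comparing addresses case-insensitively is an assumption that may not suit every mailbox:

```python
from email.utils import getaddresses

def unique_recipients(header_values):
    """Return (name, address) pairs, keeping the first occurrence of each address."""
    seen = set()
    unique = []
    for name, address in getaddresses(header_values):
        # Assumption: treat addresses as equal regardless of case.
        key = address.lower()
        # Skip entries without an address and addresses already seen.
        if not address or key in seen:
            continue
        seen.add(key)
        unique.append((name, address))
    return unique

# Made-up header values: the second one repeats Alice's address without a name.
recipients = unique_recipients([
    'Alice <alice@example.com>, bob@example.com',
    'alice@example.com',
])
```

In the question's loop, this would replace the getaddresses(tos) call, so each address is handled once.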