Python IMAP: how to parse multipart mail content
A mail can contain different blocks like:
--0016e68deb06b58acf04897c624e
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
content_1
...
--0016e68deb06b58acf04897c624e
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
content_2
... and so on
How can I get the content of each block with Python?
And how can I get the properties of each block (content-type, etc.)?

For parsing emails I have used the Message.walk() method like this:
if msg.is_multipart():
    for part in msg.walk():
        ...
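For the content you can try part.get_payload(). Pass decode=True to undo the quoted-printable or base64 transfer encoding; the result is then bytes. For the content type there is part.get_content_type().
You will find the documentation here: http://docs.python.org/library/email.message.html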
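As a minimal sketch of the walk() approach, the following parses a small two-part message and collects the content type, charset, and decoded text of each leaf part. The sample message and the describe_parts helper name are made up for illustration.

```python
import email

# Hypothetical sample message; a real one would come from the IMAP server.
RAW = (
    "MIME-Version: 1.0\r\n"
    "Subject: demo\r\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    "\r\n"
    "--XYZ\r\n"
    "Content-Type: text/plain; charset=UTF-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "caf=C3=A9\r\n"
    "--XYZ\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "<p>caf=C3=A9</p>\r\n"
    "--XYZ--\r\n"
)


def describe_parts(msg):
    """Return (content_type, charset, text) for every non-container part."""
    result = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # the container itself has no text payload
        raw_bytes = part.get_payload(decode=True)  # undoes quoted-printable/base64
        charset = part.get_content_charset() or "utf-8"
        text = raw_bytes.decode(charset, errors="replace")
        result.append((part.get_content_type(), charset, text))
    return result


parts = describe_parts(email.message_from_string(RAW))
for content_type, charset, text in parts:
    print(content_type, charset, repr(text))
```

Skipping container parts matters because walk() yields the multipart wrapper as well as its children, and the wrapper has no decodable payload.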
You can also try the email module with its iterators.
http://docs.python.org/library/email.html
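The iterators mentioned above live in email.iterators. A small sketch, assuming a hypothetical message built in memory: typed_subpart_iterator yields only the subparts whose MIME type matches, which saves the manual content-type check.

```python
from email.iterators import typed_subpart_iterator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message built in memory for illustration.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("plain body", "plain", "utf-8"))
msg.attach(MIMEText("<p>html body</p>", "html", "utf-8"))

# Only text/plain subparts are yielded; the HTML alternative is skipped.
plain_texts = [
    part.get_payload(decode=True).decode(part.get_content_charset() or "utf-8")
    for part in typed_subpart_iterator(msg, "text", "plain")
]
print(plain_texts)
```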
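A very simple example (msg_as_str contains the raw message you got from the IMAP server, as a string; if you have bytes, which is what imaplib returns in Python 3, use email.message_from_bytes instead):
import email
msg = email.message_from_string(msg_as_str)
print(msg["Subject"])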
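In Python 3 the data fetched through imaplib is bytes, so email.message_from_bytes is the matching parser. The sketch below uses a hard-coded byte string as a stand-in for a real FETCH result (an assumption for illustration) and also decodes an RFC 2047 encoded Subject header for display.

```python
import email
from email.header import decode_header, make_header

# Stand-in for the bytes an IMAP FETCH would return (hypothetical sample).
raw_bytes = (
    b"Subject: =?utf-8?q?caf=C3=A9_menu?=\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"hello\r\n"
)

msg = email.message_from_bytes(raw_bytes)

# Headers with non-ASCII text arrive RFC 2047 encoded; decode them for display.
subject = str(make_header(decode_header(msg["Subject"])))
print(subject)
print(msg.get_content_type())
```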
I wrote this code. You can use it if you like for parsing multipart content:
body = ''
# walk() yields the message itself and every nested part, at any depth,
# so no manual recursion is needed; it also covers non-multipart messages.
for part in mime_msg.walk():
    if part.is_multipart():
        continue  # container parts carry no payload of their own
    payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
    if payload is None:
        continue
    # Decode with the part's declared charset instead of str(bytes)
    charset = part.get_content_charset() or 'utf-8'
    body = body + payload.decode(charset, errors='replace') + '\n'
If you want the result as plain text, convert the HTML body with the third-party html2text package, for example html2text.HTML2Text().handle(body).
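On Python 3.6 and later the email package also offers a higher-level route than the answers in this thread use: parsing with policy=email.policy.default returns an EmailMessage, whose get_body() picks the preferred text alternative and whose get_content() returns it already decoded to str. A sketch with a hypothetical sample message:

```python
import email
from email import policy

# Hypothetical sample standing in for bytes fetched from the server.
raw_bytes = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"plain version\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<p>html version</p>\r\n"
    b"--XYZ--\r\n"
)

msg = email.message_from_bytes(raw_bytes, policy=policy.default)

# Prefer the plain-text alternative and fall back to HTML if it is missing.
text_part = msg.get_body(preferencelist=("plain", "html"))
body = text_part.get_content() if text_part is not None else ""
print(body)
```

If only an HTML part exists, get_body() returns that part instead, and its content could then be passed to an HTML-to-text converter.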