How to extract e-mail data into R?

How could I export my e-mail database from Gmail (or Thunderbird) into R?

Just as there are the rgoogledocs and twitteR packages, is there a gmailR package, or a standard format for exporting e-mails into statistical packages?

Tal


Install the edeR package first (it is on GitHub) and load it with library(edeR). On Windows 8 you may need to install 64-bit Java manually, and you may need to enable IMAP access in your Gmail settings.

library(edeR)

# Download 5 e-mails that contain the keyword "adsense"
dat3 <- extractKeyword(username = "YOURLOGIN@gmail.com",
                       password = "YouRPaSS",
                       kw = "adsense",
                       nmail = 5)

This downloads 5 e-mails containing the keyword 'adsense'.


Standard e-mail storage on a Unix system is either an mbox file (containing several messages) or a maildir setup, where each mail is a separate file in a directory.

Either way, it is plain ASCII text. This is what keeps the MUA (mail user agent, i.e. your mail reader) independent of the MTA (mail transport agent, i.e. mail server software such as exim, qmail or postfix). The mail server may use a network protocol such as POP3 or IMAP to serve the messages to the client, in which case the client (which may be Gmail or Thunderbird) never sees the underlying files. So you may need to find out how to export your mail from whichever backend you use, and then read the exported files.
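If you end up with an mbox file, the format is simple enough to split with base R alone. The sketch below is a minimal illustration, not a robust parser: it assumes the classic mbox convention that every message starts with a separator line beginning with "From " (with a trailing space), and split_mbox is a hypothetical helper name. A maildir needs no splitting, since list.files() on its cur/ and new/ subdirectories already gives you one file per message.

```r
# Minimal sketch: split an mbox file into messages with base R only.
# Assumption: classic mbox layout, where each message starts with a separator
# line beginning with "From " (trailing space, no colon). Body lines that begin
# with "From " are expected to have been escaped as ">From " by the mail client.
split_mbox <- function(path) {
  lines <- readLines(path, warn = FALSE)
  starts <- grep("^From ", lines)            # positions of the separator lines
  if (length(starts) == 0) return(list())
  ends <- c(starts[-1] - 1, length(lines))   # each message runs up to the next separator
  # Drop the separator line itself and keep the headers and body that follow it
  Map(function(s, e) if (e > s) lines[(s + 1):e] else character(0), starts, ends)
}

# Small demo mbox with two messages
mbox_file <- tempfile(fileext = ".mbox")
writeLines(c(
  "From alice@example.com Mon Jan  1 10:00:00 2024",
  "From: alice@example.com",
  "Subject: First",
  "",
  "Hello.",
  "From bob@example.com Tue Jan  2 11:00:00 2024",
  "From: bob@example.com",
  "Subject: Second",
  "",
  ">From the desk of Bob.",
  "Bye."
), mbox_file)

msgs <- split_mbox(mbox_file)
length(msgs)   # 2 messages
```

Real mailboxes add complications this sketch ignores, such as MIME multipart bodies, encoded headers and other mbox escaping variants; a dedicated parser handles those better.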

So far, none of this has anything to do with R or programming, unless you now feel you must extend R with POP3 or IMAP facilities to connect to a (remote) mail server.
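If you do want R to talk IMAP directly, one option is the curl package, which wraps libcurl and can therefore fetch IMAP URLs when libcurl was built with IMAP support. This is a minimal sketch under those assumptions: imap_url and fetch_imap_message are hypothetical helpers, the host, mailbox and UID are placeholders, and Gmail must have IMAP enabled and will usually require an app password rather than your normal password.

```r
# Sketch only: fetch one raw message over IMAPS using the 'curl' package.
# Assumptions: libcurl supports IMAPS, IMAP is enabled on the account, and the
# server accepts username/password login (for Gmail, typically an app password).

# Hypothetical helper: build an RFC 5092-style URL selecting one message by UID.
# Mailbox names containing spaces or other special characters must be URL-encoded.
imap_url <- function(host, mailbox, uid) {
  sprintf("imaps://%s/%s/;UID=%d", host, mailbox, as.integer(uid))
}

# Hypothetical helper: download the message and split it into lines.
fetch_imap_message <- function(user, password, uid,
                               host = "imap.gmail.com", mailbox = "INBOX") {
  h <- curl::new_handle(username = user, password = password)
  res <- curl::curl_fetch_memory(imap_url(host, mailbox, uid), handle = h)
  # The response body is the raw message: headers, a blank line, then the body
  strsplit(rawToChar(res$content), "\r?\n")[[1]]
}

# Usage (needs real credentials and network access, so it is commented out):
# msg <- fetch_imap_message("YOURLOGIN@gmail.com", "your-app-password", uid = 1)
# head(msg)
```

The returned character vector has the same shape as a message read from an exported file, so the same parsing code can serve both routes.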


There is now an R package for extracting e-mail data, called edeR. It is still in a testing phase, but anyone can install it from GitHub. At present it can extract e-mail data from Gmail accounts that have IMAP enabled.


Gmail and Thunderbird are not the same thing... You can add your Gmail account to Thunderbird, export each e-mail to an ASCII file, then write an R batch script that takes each file and imports it into R as an object, and so on... you get the point. =)
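The "R batch script" step could look something like this, assuming you have saved each message as a plain-text .eml file in one directory. read_eml_dir and get_header are hypothetical helper names, and the header parsing is deliberately naive (no folded header lines, no MIME decoding), so treat it as a starting point rather than a finished importer.

```r
# Hypothetical helper: return the first value of a header such as "Subject",
# or NA if the header is missing. Folded (multi-line) headers are not handled.
get_header <- function(header_lines, name) {
  pattern <- paste0("^", name, ":[[:space:]]*")
  hit <- grep(pattern, header_lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) NA_character_ else sub(pattern, "", hit[1], ignore.case = TRUE)
}

# Hypothetical helper: read every .eml file in a directory into one data frame.
# Returns NULL if the directory contains no .eml files.
read_eml_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.eml$", full.names = TRUE)
  rows <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    blank <- which(lines == "")[1]             # headers end at the first blank line
    if (is.na(blank)) blank <- length(lines) + 1
    header <- lines[seq_len(blank - 1)]
    body <- if (blank < length(lines)) lines[(blank + 1):length(lines)] else character(0)
    data.frame(file    = basename(f),
               from    = get_header(header, "From"),
               subject = get_header(header, "Subject"),
               date    = get_header(header, "Date"),
               body    = paste(body, collapse = "\n"),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)                         # one row per e-mail
}

# Small demo directory with two exported messages
eml_dir <- file.path(tempdir(), "eml_demo")
dir.create(eml_dir, showWarnings = FALSE)
writeLines(c("From: alice@example.com", "Subject: Test one",
             "Date: Mon, 1 Jan 2024 10:00:00 +0000", "", "Hello.", "Some text."),
           file.path(eml_dir, "msg1.eml"))
writeLines(c("From: bob@example.com", "Subject: Re: Test one", "", "Thanks!"),
           file.path(eml_dir, "msg2.eml"))

mails_df <- read_eml_dir(eml_dir)
mails_df[, c("from", "subject")]
```

One row per e-mail is a convenient shape for passing the bodies on to a text-mining workflow afterwards.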

I usually try to avoid "the pedestrian approach"... but I get the impression that you're prone to using R as a "general purpose" programming language... Python or Java, on the other hand, can be quite efficient here, so you can write (or ask someone to write for you) a script that "brings" you the data in the format you want, and then crunch it in R. R has matured a lot and is no longer solely a tool for statistical analysis, but it's always a good idea to use a widely known general-purpose programming language to fetch and prepare your data.

So there... Roll up your sleeves and dive into Python (Java, C... whatever you feel like diving into)!

P.S. I reckon this has something to do with your previous post about a word cloud...


Once you have exported your e-mails in mbox format to your PC, you can use the tm and tm.plugin.mail packages in R. The latter makes it possible to import your e-mails into R.

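library(tm)             # text-mining framework: VCorpus, DirSource, inspect
library(tm.plugin.mail) # mail reader (readMail) and mbox conversion for tm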

Then convert your e-mails from mbox format (several mails in a single file) to eml format (one mail per file) with convert_mbox_eml(mbox, dir). In the example below, mbox is the path to the mbox file, "yourmails.mbox", and dir is the output directory, "your_mails".

convert_mbox_eml("yourmails.mbox", "your_mails")

You can then read the e-mails into a corpus and inspect them with the following R commands.

mails <- VCorpus(DirSource("your_mails/"),
                 readerControl = list(reader = readMail))

inspect(mails)
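To get from the corpus to something tabular, you can pull per-message metadata out with tm's meta(). This sketch assumes that readMail stores the From, Subject and Date headers under the tags "author", "heading" and "datetimestamp"; check meta() on one of your own documents before relying on that. It builds a throwaway demo directory in place of "your_mails/", and meta_field is a hypothetical helper.

```r
library(tm)
library(tm.plugin.mail)

# Throwaway demo directory standing in for "your_mails/"
demo_dir <- file.path(tempdir(), "tm_mail_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("From: alice@example.com",
             "Subject: Test one",
             "Date: Mon, 1 Jan 2024 10:00:00 +0000",
             "Message-ID: <1@example.com>",
             "",
             "Some body text."),
           file.path(demo_dir, "1.eml"))

demo_mails <- VCorpus(DirSource(demo_dir),
                      readerControl = list(reader = readMail))

# Hypothetical helper: one metadata tag as a single string, NA if missing
meta_field <- function(doc, tag) {
  value <- meta(doc, tag)
  if (length(value) == 0) NA_character_ else as.character(value)[1]
}

# One row per e-mail; the tag names are assumptions (see the note above)
mail_table <- data.frame(
  from    = vapply(demo_mails, meta_field, character(1), tag = "author"),
  subject = vapply(demo_mails, meta_field, character(1), tag = "heading"),
  date    = vapply(demo_mails, meta_field, character(1), tag = "datetimestamp"),
  stringsAsFactors = FALSE
)
mail_table
```

Depending on how readMail parses the Date header, the date column may come back as NA for dates in formats it does not expect.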