
How to extract an email address from multiple text files

I have approximately 96K emails stored as text files, and I want to extract the sender's address from each one. I believe I can use domdoc for this, but I need someone to get me started. Can anyone advise whether there is a better way of doing this?

Thanks, Jim


I see no reason to do this in PHP. Provided the files are some form of flat text, copy them to a directory (for example, emails/), then run this from inside that directory:

cat * | grep "From: " | egrep -oi '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}' | sort | uniq > mail.list

Of course, if you have to do this in PHP, then:

  1. Copy the files/mails to a directory.
  2. Get a list of the files with readdir().
  3. Read each file.
  4. Split the headers (in particular the From line) off into a separate string.
  5. Run preg_match() on that string to find the email address, and append it to $email_arr.
  6. When finished, call array_unique() on $email_arr.
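
For illustration, here is a rough sketch of the same six steps, written in Python rather than PHP. The function names and the emails/ directory are made up for the example, and the pattern is a pragmatic one in the spirit of the egrep line above, not a full RFC 5322 matcher. It only looks at the first From: line in the header block and does not handle folded (multi-line) headers.

```python
import re
from pathlib import Path

# Pragmatic address pattern (an assumption, not a complete RFC 5322 matcher).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sender_from_text(text):
    """Return the address on the first From: header line, or None."""
    # The header block ends at the first blank line; normalise CRLF first.
    header = text.replace("\r\n", "\n").split("\n\n", 1)[0]
    for line in header.split("\n"):
        if line.lower().startswith("from:"):
            match = EMAIL_RE.search(line)
            # Lower-case so that Jim@X.com and jim@x.com count as one sender.
            return match.group(0).lower() if match else None
    return None


def collect_senders(directory):
    """Read every regular file in `directory`; return sorted unique senders."""
    senders = set()
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        # errors="replace" keeps going if a mail is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        sender = sender_from_text(text)
        if sender:
            senders.add(sender)
    return sorted(senders)
```

Calling collect_senders("emails") would then return the de-duplicated list, which plays the role of array_unique() on $email_arr in step 6.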


Using a regular expression in some form would be the best way to do it. If you can save your emails as text files, you can use an editor such as TextPad to search them for email addresses with a regular expression.

Regular expressions for matching email addresses are easy to find online.
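
As a small illustration of what such an expression looks like, here is one commonly seen pattern tried out in Python. It is an assumption chosen for the example: it catches everyday addresses but is deliberately looser than the full address grammar, so check it against a sample of your own mails before trusting it.

```python
import re

# local part @ domain . top-level domain of at least two letters
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_addresses(text):
    """Return every address-like substring, in order of appearance."""
    return EMAIL_RE.findall(text)


sample = "From: Jim <jim@example.com>\nTo: support@example.org"
print(find_addresses(sample))  # ['jim@example.com', 'support@example.org']
```

Note that this finds every address in the text, not only the sender, so it still needs to be restricted to the From: line if only senders are wanted.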
