How to extract an email address from multiple text files
I have approximately 96K text emails from which I want to extract the sender's address. I believe that I can use domdoc for this but need someone to start me off. Can someone please advise whether there is a better way of doing this?
Thanks, Jim
I see no reason to do this in PHP... Provided the files are in some form of flat text, copy the file(s) to (for example) the emails/ directory, then run this from inside that directory:
cat * | grep "From: " | egrep -oi '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' | sort | uniq > mail.list
Of course, if you have to do this in PHP, then:
- Copy the files/mails to a directory
- Get a list of the files with readdir()
- Read the file(s)
- Split the header off into a separate string
- Do a preg_match() on this string to find an email address and append it to $email_arr
- When finished, do array_unique() on the $email_arr.
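The PHP steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function name extract_senders and the emails directory are made up for illustration, the address pattern is deliberately simple rather than a full RFC 5322 matcher, and a From: header folded across several lines is not handled.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread):
// returns the unique sender addresses found in the files of $dir.
function extract_senders(string $dir): array
{
    // Simple address pattern; it will miss some valid but unusual addresses.
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
    $email_arr = [];

    $handle = opendir($dir);
    if ($handle === false) {
        return [];
    }

    // Get the list of files with readdir().
    while (($name = readdir($handle)) !== false) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }

        // Read the file.
        $raw = file_get_contents($path);
        if ($raw === false) {
            continue;
        }

        // Split the header off: it ends at the first blank line.
        $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
        $header = $parts[0];

        // Look only at the From: line, then pull the address out of it.
        if (preg_match('/^From:(.*)$/mi', $header, $from)
            && preg_match($pattern, $from[1], $m)) {
            $email_arr[] = strtolower($m[0]);
        }
    }
    closedir($handle);

    // Remove duplicates, then sort (sort() also reindexes the array).
    $email_arr = array_unique($email_arr);
    sort($email_arr);
    return $email_arr;
}
```

Something like file_put_contents('mail.list', implode(PHP_EOL, extract_senders('emails')) . PHP_EOL); would then write one address per line. Lower-casing the addresses before array_unique() is a design choice so that Jim@Example.com and jim@example.com count as one sender; drop the strtolower() call if case matters to you.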
Using a regular expression in some form would be the best way to do it. If you can save your text emails to files, you can use something like Textpad to search for email addresses based on the regular expression.
You should be able to find regular expressions for email addresses online.
 