Best way to store emails for historical/review purposes
I have a service which processes emails in a mailbox and, once an email is processed, stores some information from it in the database. At the minute the schema looks something like:
- ID
- Sender
- Subject
- Body (result of being parsed/stripped to plain text)
- DateReceived
I am building a web front-end for the database and the main purpose of storing the emails is to provide the facility for users to look back and see what they have sent. However, another reason is for auditing purposes on my end.
The emails at the moment are being moved to specific mailbox folders. So what I plan to start doing is once the email is processed, record it in the database and delete the email from the mailbox instead of just moving it.
So a couple of questions...
1) Is it a good idea to delete the actual email from Exchange? Is it better to hold onto it just in case?
2) To keep the size of the fields down I was stripping the HTML out of the emails. Is this a bad idea? Should I just store the email as it is received?
Any other advice/suggestions would be great.
In both cases I think you should hold onto the original emails. Storage is cheap, but if disk space is really an issue look to compression rather than excision to solve it.
Both of your use cases (historical record and audit) will be better served by storing the complete unabridged email in the database. Once you start tampering with the data, albeit "just" removing formatting, it becomes difficult to prove that you haven't edited it in other, more significant ways, especially if you have deleted the original email instead of archiving it.
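As a rough illustration of "compress, don't excise", here is a minimal sketch in Python. It assumes you can get the raw message bytes out of the mailbox (how you do that is not shown), and it uses SQLite purely as a stand-in for whatever database you have. The table name `email_archive`, its columns, and the function names are all made up for the example. The stripped plain-text body is kept only as a convenience for display; the compressed original plus a SHA-256 digest is what you would point to in an audit.

```python
import hashlib
import sqlite3
import zlib
from email import message_from_bytes, policy


def _text(value):
    # Header values may be missing; store NULL rather than the string "None".
    return None if value is None else str(value)


def open_archive(path=":memory:"):
    # Hypothetical schema: the question's fields plus the untouched original.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS email_archive (
               id INTEGER PRIMARY KEY,
               sender TEXT,
               subject TEXT,
               date_received TEXT,
               body_text TEXT,
               raw_compressed BLOB NOT NULL,
               raw_sha256 TEXT NOT NULL
           )"""
    )
    return conn


def archive_email(conn, raw_bytes):
    # Parse only to fill the convenience columns; raw_bytes is never modified.
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    plain_part = msg.get_body(preferencelist=("plain",))
    body_text = plain_part.get_content() if plain_part is not None else None
    cur = conn.execute(
        "INSERT INTO email_archive "
        "(sender, subject, date_received, body_text, raw_compressed, raw_sha256) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            _text(msg["From"]),
            _text(msg["Subject"]),
            _text(msg["Date"]),
            body_text,
            zlib.compress(raw_bytes),
            hashlib.sha256(raw_bytes).hexdigest(),
        ),
    )
    conn.commit()
    return cur.lastrowid


def load_original(conn, email_id):
    # Decompress and check the digest before handing the original back.
    row = conn.execute(
        "SELECT raw_compressed, raw_sha256 FROM email_archive WHERE id = ?",
        (email_id,),
    ).fetchone()
    if row is None:
        raise KeyError(email_id)
    raw = zlib.decompress(row[0])
    if hashlib.sha256(raw).hexdigest() != row[1]:
        raise ValueError("archived email failed its integrity check")
    return raw
```

A digest stored in the same row only detects accidental corruption. If you need to show that nobody with database access altered a message, you would have to keep the digests (or the originals) somewhere the application cannot write to.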
You don't say what business you're in, but the other thing to remember is whether there are any data retention policies active within your organisation or in the wider jurisdiction. Compliance is becoming gnarlier all the time.
I would keep the messages in the mailbox, in a specific folder as you are doing, and probably wouldn't even save anything to a database, given that you can access the mailbox from within your application.
The Exchange team over the years has developed several APIs for accessing the Mailbox's contents.
With Exchange Server 2007 and 2010, the recommended API would be Exchange Web Services which can be used from any language/environment that is capable of accessing Web Services.
If you are developing with a .Net language (C#, VB.NET for instance), your best bet would be EWS Managed API.
If you are really going to do something meaningful with the body, you can save the results as named properties (extended properties in EWS parlance) on the message itself.
There are other APIs with corresponding functionality for previous versions of Exchange.