What's a very efficient way to have a bulk mailer app keep accurate track of its progress?
We have a console application (currently .NET) that sends out mail in bulk to people who have subscribed to a mailing list. We are re-implementing this application because of its current limitations.
One of the features it must have is the ability to resume after an unexpected interruption. This means that every time it successfully sends an e-mail, it has to record its progress so that it can pick back up right where it left off. It will get the information it needs (basically the list of recipients, which are identified by a numeric id) from a different server, which hosts the database containing this information.
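Resuming from a single stored id only works if recipients are always processed in the same fixed order. A minimal Python sketch, assuming ascending id order (the function name `remaining_recipients` is hypothetical, not part of the existing application):

```python
from typing import Iterable, Iterator, Optional


def remaining_recipients(recipient_ids: Iterable[int], last_sent_id: Optional[int]) -> Iterator[int]:
    """Yield recipient ids that still need a mail, in ascending order.

    Assumes mails are always sent in ascending id order, so every id
    <= last_sent_id has already been handled. last_sent_id is None when
    no checkpoint exists yet (a fresh batch).
    """
    for rid in sorted(recipient_ids):
        if last_sent_id is None or rid > last_sent_id:
            yield rid
```

For example, `list(remaining_recipients([5, 1, 3, 9], last_sent_id=3))` gives `[5, 9]`. If the processing order is not stable between runs, a single id is not enough and a per-recipient status is the safer choice.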
Our setup is simple: we have one Windows-based web/database server that contains the recipients, and we have the SMTP server running Debian.
We have come up with several options that would solve this:
- Send a signal back to the database after every send operation
- Keep track in a small file by writing only the last id of the recipient to this file (overwriting its contents with each write) after every send operation.
- Keep track in a database that runs on the host machine (MySQL, PostgreSQL, SQLite, etc.)
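For the file option, a plain overwrite can leave a truncated or empty file if the process dies mid-write. One common remedy, sketched here in Python under the assumption that a small JSON file is acceptable, is to write a temporary file, `fsync` it, and rename it over the old one with `os.replace`, so a reader sees either the old checkpoint or the new one. The file format and the names `save_progress`/`load_progress` are illustrative only.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def save_progress(path: str, batch_id: int, last_recipient_id: int) -> None:
    """Durably replace the checkpoint file with the latest progress."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".progress-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"batch_id": batch_id, "last_recipient_id": last_recipient_id}, f)
            f.flush()
            os.fsync(f.fileno())  # push the new contents to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):
        # On POSIX systems, fsync the directory too so the rename itself is durable.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def load_progress(path: str) -> Optional[Tuple[int, int]]:
    """Return (batch_id, last_recipient_id), or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    return data["batch_id"], data["last_recipient_id"]
```

Each call forces a disk write, which is exactly what makes the checkpoint accurate and is also its main cost per mail.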
The main constraint is that the application must send mail fast. The volume varies from several hundred to several tens of thousands of mails per batch, and there can be several batches per day. Overall it's usually between 1,000 and 50,000 mails a day, but this will grow. It must also be able to resume accurately, so I can't wait until, say, 50 mails are sent and only then write the progress to a file or database.
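For scale, assuming one checkpoint write per mail, the top of the stated daily range works out to fewer than one write per second when averaged over a day:

```latex
\frac{50\,000\ \text{writes/day}}{86\,400\ \text{s/day}} \approx 0.58\ \text{writes/s}
```

Within a batch the writes arrive as fast as the mails go out, so the peak rate is bounded by the send rate rather than by this daily average.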
This is what I have come up with so far regarding the solutions above:
- Signaling the database after every send: our current application uses this approach. But the new application will run on a different server than the database server (they aren't on the same network either; the application will run on the mail server, unlike the current situation), so I can't imagine this being the most efficient solution.
- Writing the last id to a small file: this could be very fast, but wouldn't it strain the hard drive to the point where its lifespan is severely shortened? (This server is an older Opteron, I believe; it may predate SATA, but if so, not by much.)
- A local database: this may be very fast and efficient, but is it necessary to set up a database just to store two numbers (the id of the batch and the id of the last recipient within that batch)? Would the overhead slow things down?
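To get a feel for the local-database option, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is an embedded library rather than a server, so there is no separate service to run on the mail server. The table name `progress` and the helper names are hypothetical. Each `record_sent` call is its own small committed transaction, which is what lets the checkpoint survive a crash.

```python
import sqlite3
from typing import Optional


def open_checkpoint_db(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file holding one progress row per batch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        " batch_id INTEGER PRIMARY KEY,"
        " last_recipient_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def record_sent(conn: sqlite3.Connection, batch_id: int, recipient_id: int) -> None:
    """Store recipient_id as the last one sent in this batch; commits immediately."""
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            "INSERT OR REPLACE INTO progress (batch_id, last_recipient_id) VALUES (?, ?)",
            (batch_id, recipient_id),
        )


def last_sent(conn: sqlite3.Connection, batch_id: int) -> Optional[int]:
    """Return the last recipient id recorded for batch_id, or None for a fresh batch."""
    row = conn.execute(
        "SELECT last_recipient_id FROM progress WHERE batch_id = ?", (batch_id,)
    ).fetchone()
    return row[0] if row else None
```

If the per-commit cost turns out to matter, SQLite's write-ahead log (`PRAGMA journal_mode=WAL`) is worth benchmarking, since it typically makes small commits cheaper; that is a suggestion to measure, not a measured result.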
Apart from the above solutions, are there other options I haven't yet considered, to keep track without really slowing the application down? Are my assumptions accurate?
1,000-50,000 emails per day doesn't seem like an awful lot to me, so I don't think you need to worry much about capacity at the moment. Where I work, we have a single instance of a Windows service that reads 100 rows at a time from a database (where our email data is stored), processes each row in turn, and updates the database to mark the email as sent. I'm not saying this is a good design (it isn't), but we regularly send more than 50k emails per day using this setup.
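The loop described in this answer might look roughly like this in Python. It is a sketch only: an SQLite table named `emails` with a `sent` flag stands in for the real email database, and `send` is a hypothetical callable wrapping the actual SMTP call.

```python
import sqlite3
from typing import Callable


def process_pending(conn: sqlite3.Connection, send: Callable[[str], None], batch_size: int = 100) -> int:
    """Send unsent emails in chunks of batch_size, marking each one right after it goes out."""
    sent_count = 0
    while True:
        rows = conn.execute(
            "SELECT id, recipient FROM emails WHERE sent = 0 ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return sent_count
        for email_id, recipient in rows:
            send(recipient)  # hypothetical SMTP send; an exception stops the loop unmarked
            with conn:  # commit per message so the mark is durable before the next send
                conn.execute("UPDATE emails SET sent = 1 WHERE id = ?", (email_id,))
            sent_count += 1
```

Because the mark is written after the send, a crash between the two means that one message is sent again on restart: at-least-once delivery, with at most one duplicate per crash in this single-threaded form.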
If you have a real need to scale - i.e. one you can quantify in terms of growth over the next 3, 6, 12 months and which shows significant growth - then I'd put real effort into scalability now. If you don't, I'd focus on keeping it simple and lightweight.
Why not mark each email message as "in process" while it's being processed by your bulk email application, and then mark it as "sent" (both in the db) when the work is done? This approach could allow you to multi-thread your application as scale demands as well (if you design for that, of course).
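A sketch of the in-process/sent idea in Python, again with SQLite standing in for the real database. The `status` column and its values are hypothetical. The conditional `UPDATE ... WHERE status = 'pending'` is what would let several workers share the queue without two of them claiming the same row: the rowcount tells a worker whether its claim won.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical status values; the real schema may differ.
PENDING, IN_PROCESS, SENT = "pending", "in_process", "sent"


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Move the lowest-id pending email to in_process and return (id, recipient)."""
    while True:
        row = conn.execute(
            "SELECT id, recipient FROM emails WHERE status = ? ORDER BY id LIMIT 1",
            (PENDING,),
        ).fetchone()
        if row is None:
            return None
        with conn:
            # Only succeeds if the row is still pending; another worker may have claimed it.
            cur = conn.execute(
                "UPDATE emails SET status = ? WHERE id = ? AND status = ?",
                (IN_PROCESS, row[0], PENDING),
            )
        if cur.rowcount == 1:
            return row[0], row[1]


def mark_sent(conn: sqlite3.Connection, email_id: int) -> None:
    """Record that a claimed email has actually been sent."""
    with conn:
        conn.execute(
            "UPDATE emails SET status = ? WHERE id = ? AND status = ?",
            (SENT, email_id, IN_PROCESS),
        )


def requeue_stuck(conn: sqlite3.Connection) -> int:
    """Return rows left in_process by a crash to pending; such mails may be sent twice."""
    with conn:
        cur = conn.execute(
            "UPDATE emails SET status = ? WHERE status = ?", (PENDING, IN_PROCESS)
        )
    return cur.rowcount
```

Rows still marked in_process after a crash are ambiguous: the mail may or may not have gone out. `requeue_stuck` resolves this by resending, and should run once at startup before any workers begin; an application that must never send duplicates could instead flag those rows for manual inspection.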