Techniques for never losing data
I'm curious about the techniques used to build a system where ensuring that no data is lost is of the utmost priority. For a simple example, what does a financial institution do to make sure that when money is transferred between accounts, once it is withdrawn from one account it is without a doubt deposited into the other? I'm not so much looking for particular techniques like database transactions, but for larger, more architectural concepts, like how the data is preserved if a server goes down, or a queue runs out of space, or whatever.
If someone could point me to books or articles on the subject, I'd be much obliged.
You should read about Automated Teller Machine, Online transaction processing, and other topics on data encryption; also consider using HTTPS if you are building web sites.
The basic technique is removing any single point of failure. Anything that can fail in your setup needs to have a backup, or multiple backups: multiple switches, servers, UPSs, hard drives, etc. Databases are constantly being replicated, and data is backed up and stored off site in case of a fire or other disaster that could compromise the building.
It can all really boil down to having the same data in two places: from code that holds a cache prior to committing data, all the way up to server redundancy.
The only way to make sure you don't lose something is to have multiple copies of it.
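To make that concrete, here is a minimal sketch of "multiple copies before acknowledging": a write is not reported as safe until every copy is on stable storage. The file paths are illustrative, not a real storage layout.

```python
import json, os

# Don't acknowledge a write until both copies are safely on disk.
def durable_write(record, paths=("copy_a.log", "copy_b.log")):
    line = json.dumps(record) + "\n"
    for path in paths:
        with open(path, "a") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # don't trust the OS cache with the only copy
    return True  # only now can the caller treat the data as safe

durable_write({"tx": 1, "amount": 100})
```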
In the case of the bank example, each bank would keep a record of every transaction, stating how much, from where, to where, and in what time order. Later, if there is a problem, you compare the two transaction logs; if they don't match, you can identify the missing transactions (a sketch of that comparison follows below). This also covers the problem that one bank can't trust another to keep records for it. Since they cross-check each other, this is almost a distributed transaction protocol.
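A hypothetical reconciliation sketch: each bank keeps its own log, and comparing the two pinpoints the transactions one side never recorded. The (id, from, to, amount) tuple format is made up for the example.

```python
from collections import Counter

def reconcile(our_log, their_log):
    # Multiset difference in each direction finds missing transactions.
    ours, theirs = Counter(our_log), Counter(their_log)
    return list((ours - theirs).elements()), list((theirs - ours).elements())

our_log   = [("tx1", "A", "B", 100), ("tx2", "A", "C", 50)]
their_log = [("tx1", "A", "B", 100)]

missing_at_them, missing_at_us = reconcile(our_log, their_log)
print(missing_at_them)  # [('tx2', 'A', 'C', 50)] -- tx2 was never received
```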
You might want to read up on XA or X/Open transactions, which can coordinate multiple systems, including databases and queues, into ACID, DB-like transactions.
I've not worked with it, but I've heard it can be expensive, both in latency and computationally. But then again, how much is your data integrity worth?
http://en.wikipedia.org/wiki/X/Open_XA
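The protocol at the heart of XA is a two-phase commit: nobody commits until everyone has promised they can. A toy sketch of the idea follows; `Participant` is a stand-in for a real resource manager (database, message queue), not a real XA interface.

```python
class Participant:
    def __init__(self, name):
        self.name = name
        self.staged = None

    def prepare(self, work):
        self.staged = work      # phase 1: durably stage the work, vote yes
        return True

    def commit(self):
        print(f"{self.name}: committed {self.staged!r}")

    def rollback(self):
        self.staged = None      # abort: discard the staged work

def two_phase_commit(participants, work):
    if all(p.prepare(work) for p in participants):  # phase 1: collect votes
        for p in participants:                      # phase 2: all commit
            p.commit()
        return True
    for p in participants:                          # any "no": all roll back
        p.rollback()
    return False

two_phase_commit([Participant("accounts-db"), Participant("audit-queue")],
                 "debit A 100 / credit B 100")
```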
As you've alluded to, there are various mechanisms (like transactions) for ensuring the software-based "handshake" is reliable and completes successfully.
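For your money-transfer example, that handshake at the database level means the debit and the credit either both happen or neither does. A minimal sketch using SQLite; the table and account names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

with conn:  # opens a transaction: commit on success, rollback on any error
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'bob'")

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [('alice', 60), ('bob', 40)] -- never a state where the money is half-moved
```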
Architecturally, yes, having two copies of stuff gives you redundancy, which helps you not lose stuff. Beyond that:
- Clear processes: people need to know exactly where information is going, both in sunny-day scenarios and when the brown stuff hits the fan. Having the data but not being able to find it or recognise it is just as bad as losing it. The clearer (and better documented) your processes are, the better.
- Consistency: automated is obviously better than random human error.
- To specifically answer your question: the above points should be echoed in an architecture and design that is clear and that clearly separates concerns.
- Reduce points of failure as much as possible.
- Focus attention on higher risk areas.
- Use proven techniques (I guess that's what you're actually asking for).
- Keep things as simple as possible.
I worked on a solution architecture for an off-the-shelf document management system a while back; no loss of data was the big driver. The system was rolled out nationally, so it was multi-site, in terms of both 'regional' caches for servicing local users and actual 'data centers'. Some points of interest:
- All components (where possible) were deployed onto virtual boxes, which were backed by a SAN, so in the event of a physical host going down we could restore service faster. In terms of data loss, it means users are more likely to be able to use the protected system than to store stuff locally when the system is down.
- Also, the SAN was seen as safer than local disks.
- The above was part of the existing set-up, so nothing new for Ops to learn.
- Failover site, with replication. This wasn't real-time, and was augmented by the transaction logs on the databases (a sketch of the log-shipping idea follows this list).
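An illustrative log-shipping sketch, the idea behind augmenting non-real-time replication with transaction logs: every change is appended durably to a log, and a standby replays the log to rebuild state after failover. The file name and record format are assumptions for the example.

```python
import json, os

def append_change(log_path, record):
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the change survives a crash once this returns

def replay(log_path):
    # The standby rebuilds state by applying every logged change, in order.
    state = {}
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            state[rec["acct"]] = state.get(rec["acct"], 0) + rec["delta"]
    return state

append_change("txn.log", {"acct": "alice", "delta": -40})
append_change("txn.log", {"acct": "bob", "delta": +40})
print(replay("txn.log"))  # {'alice': -40, 'bob': 40}
```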
I guess none of this is heavily software centered, but I do think that all the good software architecture / design principles "we" use helped guide my thinking.