Distributed caching systems and how they distribute data
I'm looking for information on things like ehcache and other alternatives to memcached for a project that will likely involve 3-4 webservers and something like 2-10 million distributed objects that need to be available to all servers.
Specifically, I'm trying to understand how other systems distribute data: is memcached unique in how it distributes data among multiple caches, or do other caches behave similarly? (That is, the property that a given key may live on any one of N servers and clients don't care which, as opposed to updates on a single server propagating to the other caches, which essentially act as copies.)
For example, in looking at documentation for things like ehcache it's not clear to me if by "distributed" they mean a strategy similar to memcached or something more like "replicated/synchronized".
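To make the distinction concrete, here is a minimal sketch of the two behaviors I mean (the server names and hash choice are hypothetical, not any particular product's implementation):

```python
import hashlib

SERVERS = ["cache-a", "cache-b", "cache-c"]  # hypothetical node names

def owner(key):
    """Partitioned, memcached-style: each key lives on exactly one node,
    chosen by the client from the key alone; nodes never talk to each other."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

def replica_set(key):
    """Replicated/synchronized style: every node holds a copy of every key,
    so each write must be propagated to all of them."""
    return list(SERVERS)
```

In the first style the per-key cost is one network hop and total capacity grows with N; in the second, every write fans out to all N nodes, which is the synchronization overhead I'm asking about.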
Edit: although the refs on distributed computing are useful, I'm more interested in how specific implementations behave. e.g. will I be paying for synchronization overhead in some systems?
Your question is not extremely precise; although I think I see where you want to go, this is a pretty large field in itself.
You might want to start here: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/
Also have a look at Dynamo, BigTable, and all the theoretical questions associated with them (the CAP theorem, and the presentation by Werner Vogels on it that you can find on InfoQ).
More and more information on this is becoming available, thanks to the many videos from the NoSQL meetups.
Hope it helps,
Edit: about the synchronization overhead, it really depends on the system. Every system has specific requirements; Dynamo, for example, aims at high availability and may not always be fully consistent (eventual consistency), so by design it is a distributed system in which every write must be accepted and fast. Other systems behave differently.
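The trade-off can be stated arithmetically with the quorum rule Dynamo-style systems use (N replicas, W write acknowledgements, R read acknowledgements; the numbers below are illustrative):

```python
def reads_see_latest_write(n, w, r):
    """Quorum overlap rule: every read quorum intersects every write quorum
    exactly when R + W > N, so reads are guaranteed to see the latest
    acknowledged write."""
    return r + w > n

# N=3, W=2, R=2 -> quorums overlap: stronger consistency, slower writes
# N=3, W=1, R=1 -> fast writes and reads, but a read may miss the latest value
```

Systems that let you tune N, W, and R per request let you pay the synchronization overhead only where you need it.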
I suspect you are after a discussion on consistency across "distributed data". This topic is vast but a good reference on the trade-offs is available here.
In other words, it pretty much depends on your requirements (which aren't very detailed here). If I have misunderstood your question, you can safely disregard my contribution ;-)
The feature or property you are probably looking for is a "shared nothing" architecture. Memcached is an example: there is no single point of failure, no synchronization or any other traffic between nodes; the nodes don't even know about each other.
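As an illustrative sketch (not memcached's actual client code), the key-to-node mapping in such clients is typically a consistent-hash ring, so that adding or removing a node remaps only a fraction of the keys rather than almost all of them:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring. Each node is placed on the ring at
    several virtual points; a key maps to the first node point at or after
    the key's own hash (wrapping around)."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash("%s#%d" % (node, i)), node))
        self.ring.sort()
        self._points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise from the key's hash to the next node point.
        i = bisect.bisect(self._points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Note the shared-nothing property: the mapping is computed entirely on the client, so no inter-node traffic is ever needed, and removing a node only moves the keys that node owned.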
So if this is what you want and you're evaluating a product/project, look for the "shared nothing" term. If it is not mentioned on the first screen, it probably is not a shared nothing architecture ;)