Looking for distributed/scalable database solution where all nodes are read/write? Not MongoDB? [closed]
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
开发者_如何学JAVA Improve this questionI'm looking to implement a database that can be widely distributed geographically and such that each node can be read/write with eventual consistency to all other nodes. Where should I be looking?
I thought that MongoDB seemed like a nice option for other reasons until I came to this concern. Apparently all MongoDB nodes are readable, but only a master is writable? Is there anyway to get around this? I can't allow a single point failure for writing to the database.
I just finished my review of several similar databases. I ended up with Mongo for different reasons. Riak and Cassandra are both implementations of Amazon's Dynamo, which could each do a good job of that. At the Riak site, they have good comparisons of Riak and a few other databases. For your specific question, I think both Riak and Cassandra handle writes on any node with a vector clock for Riak's commits, and a timestamp for Cassandra's to handle conflicts.
Other than that, you have a few other choices that may make sense:
- HBase can do very big, can do concurrent writes on various nodes. It's design is good for many attributes on each document/record.
- CouchDB has good multiple-write conflict support but with a simpler document space.
- I read a good argument for schema-less MySQL here, with a nod to the still-green Drizzle
I'm not sure that's a complete answer. My search took several weeks and about 50 pages of notes, but if large, distributed, and safe writes are the big criteria, that should move you along.
If your concern is about a single point of failure: MongoDB uses replicasets for distributing reads and sharding for distributing writes. To achieve what you are looking for you can shard your system with each shard being a replica set. If your primary in a shard dies then a new primary is automatically elected and hence is not a single point of failure. Note: MongoDB does not support multi-master replication
Depends on how you want to distribute your writes.
Sharding: If you are looking to distribute writes on a key, MongoDB has a great auto-sharding feature. For redundancy, you would create multiple replica (master-slave) pairs and then assign each of them a key range through a central service (mongos). Reads would be distributed statically by key range.
Multi-Master:
If you're system is small enough (GB, not TB), CouchDB has one of the more sophisticated merge-replication schemes and is built for fast, reliable recover in the event of node failure. With CouchDB, every node has a complete copy of the data, and all nodes in a cluster can be both writable and readable.
If you are pulling in millions of rows per hour, Cassandra uses a peer-based replication scheme that will allow you to scale writes far beyond CouchDB if you're willing to give a little on the read performance.
HBase also scales writes and reads, but is better-suited to a batch-oriented write function (loading log files), as it sits on HDFS and writes need to be close to the minimum block size (64MB, 128MB...) before a write can be committed to disk.
Hope this helps.
I'm a fan of couchdb
Sorry, I got cut off before I could expand on this.
1) Firstly couch is easily geographically distributed - you talk to it over http which is great for distributed projects.
2) Couch has replication built in.
Better yet, you may find that bigcouch is even more suitable as it is specifically designed with clustering in mind.
I spent several weeks evaluating Mongo / Cassandra / Couch et al and decided that on balance, for a wide range of applications, Couch is well suited.
I suppose you should also be looking at Amazon Simple DB. When it comes to distributed eventually consistent databases, it certainly fits the bill. I've been using it on a number of projects for a couple of years and it does what it says on the tin. My only concern is that you are basically putting all your data into a third party's black box ... but it certainly works, scales and ticks all of your boxes.
Hope that helps flesh things out a bit.
You can use a product like CloudTran to handle very fast distributed transactions across common databases like MySQL, Oracle, SQL Server, etc.
This is one of the design goals of NuoDB, and the product does this today.
You can read (QUERY), write (INSERT, UPDATE, DELETE), or do anything else transactionally across multiple datacenters as though the database is in a single location. NuoDB is truly consistent, not eventually consistent. It guarantees ACID transactions using optimistic asynchronous messaging and distributed versioning. And NuoDB has rich support for standard SQL.
精彩评论