Which NoSQL database should I choose for my application?
I am planning to develop an application for connecting with friends of friends of friends. It may look like Facebook or Twitter, but initially I am planning to implement it to learn more about NoSQL databases.
There are a number of NoSQL databases. I have gone through many database types: document stores, key-value stores, column stores, and graph databases. I finally came up with two candidates, Cassandra and Neo4j. Is it right to choose one of these? If not, please correct me and share your opinions.
One more thing: the language binding I have chosen is Java.
My question is: which database suits my application?
Awaiting your valuable opinions. Thanks for spending your valuable time.
Tim, you really should have posted your question separately, rather than as an answer to the OP, which it wasn't.
But to answer, first, go read Ben Black's slides at http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency.
Done? Okay, now for the specific questions:
"How would differences in [replica] data-state be reconciled on a subsequent read?"
The highest timestamp wins.
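As a rough illustration of "the highest timestamp wins", here is a minimal Java sketch. It is not Cassandra's actual code; `ColumnVersion` and `LastWriteWins` are hypothetical names invented for this example, and each replica reply is assumed to carry the client-supplied timestamp of its write.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical value type: one replica's copy of a column.
final class ColumnVersion {
    final String value;
    final long timestamp; // supplied by the client at write time

    ColumnVersion(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

final class LastWriteWins {
    // Return the version with the highest timestamp among the replica replies.
    static ColumnVersion reconcile(List<ColumnVersion> replies) {
        return replies.stream()
                .max(Comparator.comparingLong((ColumnVersion v) -> v.timestamp))
                .orElseThrow(() -> new IllegalArgumentException("no replies"));
    }

    public static void main(String[] args) {
        List<ColumnVersion> replies = Arrays.asList(
                new ColumnVersion("old-avatar.png", 1000L),
                new ColumnVersion("new-avatar.png", 2000L));
        System.out.println(reconcile(replies).value); // prints "new-avatar.png"
    }
}
```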
"Do all zones work off the same system clock?"
Timestamps are provided by clients (i.e., your app server). They should be synchronized with e.g. ntpd (which is good practice anyway), but high precision is not required because if ordering matters you should be avoiding conflict either by using unique column names or by using external locking.
For example: if you have a list of users following you in a Twitter clone, you should give each follower its own column and there will be no way to lose data no matter how out of sync the clocks are.
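To make the follower example concrete, here is a small Java sketch under simplifying assumptions: a row is modeled as a map from column name to write timestamp, and `FollowerRowMerge` is a hypothetical helper, not part of any Cassandra client. Because conflicts are resolved per column, two followers written with badly skewed clocks both survive the merge.

```java
import java.util.HashMap;
import java.util.Map;

final class FollowerRowMerge {
    // A row is modeled as column name -> write timestamp (a simplification for this sketch).
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> merged = new HashMap<>(a);
        // Timestamps are only compared within the same column name,
        // so distinct follower columns can never overwrite each other.
        b.forEach((column, ts) -> merged.merge(column, ts, Long::max));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> replica1 = new HashMap<>();
        replica1.put("follower:alice", 5000L); // written by a client whose clock runs fast
        Map<String, Long> replica2 = new HashMap<>();
        replica2.put("follower:bob", 10L);     // written by a client whose clock runs slow
        // Both followers are present after the merge, regardless of clock skew.
        System.out.println(merge(replica1, replica2).keySet());
    }
}
```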
If you have an admin tool for your website and two admins upload a new favicon "simultaneously," one update is going to win and it doesn't really matter which. Here, you do want your clocks synchronized but "within a few ms" is close enough.
If you are managing user registration and you want to allow creating account "jbellis" only if it doesn't already exist, you need a lock manager no matter how closely synchronized your clocks are.
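The registration case boils down to an atomic "create only if absent" check. The sketch below shows that idea inside a single JVM using `ConcurrentHashMap.putIfAbsent`; `UsernameRegistry` is a hypothetical class, and in a real multi-server deployment the same guarantee would have to come from an external lock manager or coordination service, which this example does not provide.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-process stand-in for an external lock manager (assumption: single JVM only).
final class UsernameRegistry {
    private final ConcurrentMap<String, String> ownerByUsername = new ConcurrentHashMap<>();

    // Atomically claim the username; returns true only for the first caller.
    boolean register(String username, String ownerId) {
        return ownerByUsername.putIfAbsent(username, ownerId) == null;
    }

    public static void main(String[] args) {
        UsernameRegistry registry = new UsernameRegistry();
        System.out.println(registry.register("jbellis", "user-1")); // prints "true"
        System.out.println(registry.register("jbellis", "user-2")); // prints "false"
    }
}
```

Timestamps play no part here: the second caller loses because the name is already taken, not because its clock is behind.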
"Would stale data get returned?"
A node (a better unit to think about than a "zone") will not have data it missed during its downtime until it is sent that data by read repair, hinted handoff, or anti-entropy repair. In the meantime, it will reply to read requests with stale data. If you use a high enough ConsistencyLevel, read requests will wait for enough other replies to make sure you always see the most recent version anyway, which may mean not being able to fulfil requests if enough other replicas are down.
Otherwise, a low ConsistencyLevel (e.g. ONE) implicitly means "I understand that the higher availability and lower latency I get with this lower ConsistencyLevel means I'm okay with seeing stale data temporarily after downtime."
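One common way to reason about "a high enough ConsistencyLevel" is the overlap rule: with N replicas, W replicas acknowledging each write and R replicas answering each read, a read is guaranteed to include at least one replica that saw the latest write when R + W > N. The Java sketch below only does that arithmetic; `ConsistencyMath` and its method names are hypothetical and are not part of any Cassandra API.

```java
final class ConsistencyMath {
    // Size of a quorum for a given replication factor: a strict majority.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must overlap every write set in at least one replica.
    static boolean readSeesLatestWrite(int replicas, int writeAcks, int readReplies) {
        return readReplies + writeAcks > replicas;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println(q);                              // prints "2"
        System.out.println(readSeesLatestWrite(n, q, q));   // prints "true"  (QUORUM writes and reads)
        System.out.println(readSeesLatestWrite(n, 1, 1));   // prints "false" (ONE writes and reads)
    }
}
```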
I'm not sure I understand all of the implications of the Cassandra consistency model with respect to data-agreement across multiple availability zones.
Given multiple zones, and given that the coordinator node in Cassandra has used a consistency level that does not require all zones to report back, but only a quorum, how would differences in zone data-state be reconciled on a subsequent read?
Do all zones work off the same system clock? Or does each zone have its own clock? If they don't work off the same clock, how are they synchronized so that timestamps can be compared during the "healing" process when differences are reconciled?
Let's say that a zone that does have accurate, up-to-date data is now offline, and a zone that was offline during a previous write (so it didn't get updated and contains stale data) is now back online. Would stale data get returned? Would the coordinator have any way to know the data were stale?
If you don't need to scale in the short term I'd go with Neo4j because it is designed to store networks like the one you described. (If you eventually do need to scale, maybe you can throw Gizzard in front of it or something. Good luck!)
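For a sense of the query a "friends of friends of friends" application runs, here is a plain Java sketch of a breadth-first traversal limited to three hops. It deliberately does not use the Neo4j API; `FriendGraph` and its methods are hypothetical, and the in-memory adjacency map merely stands in for the kind of traversal a graph database is built to do.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class FriendGraph {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is modeled as an undirected edge.
    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Everyone reachable from start in at most maxDepth hops, excluding start itself.
    Set<String> withinHops(String start, int maxDepth) {
        Map<String, Integer> depth = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String person = queue.remove();
            int d = depth.get(person);
            if (d == maxDepth) {
                continue; // do not expand beyond the hop limit
            }
            for (String friend : friends.getOrDefault(person, Collections.emptySet())) {
                if (!depth.containsKey(friend)) {
                    depth.put(friend, d + 1);
                    queue.add(friend);
                }
            }
        }
        Set<String> result = new HashSet<>(depth.keySet());
        result.remove(start);
        return result;
    }
}
```

With a chain of friendships a-b, b-c, c-d, d-e, calling `withinHops("a", 3)` returns b, c and d but not e, who is four hops away.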
Have you looked at the Riak database? It has the same background as Cassandra, but you don't need to care about timestamp synchronization (it uses a different method for resolving conflicting data state).
My first application was built on a Cassandra database, but I am now trying Riak because it is more suitable. The difference is not only in the data model (keys and values versus super columns, keys and values); Riak goes further with its document store feature.
It has a method for creating complex queries using MapReduce. Cassandra does have this option through Hadoop, but it sounds difficult.
Furthermore, it uses a well-known and well-defined access protocol, HTTP/HTTPS, so it's easy to manage the server when you have a lot of traffic.
The only downside is that it is slower than Cassandra. But you will usually read records more often than you write them (and Cassandra is optimised for writes, not reads), so the end result should be OK.