Which data storage system offers the best update / upsert performance?
I am looking for a data storage system (NoSQL preferred) that offers the best update / upsert performance. This is by far the most important aspect. It's also important to note that the updated records will grow in size quickly. I have been using MongoDB, but I cannot get the update performance to the levels required.
Can anyone recommend anything?
Before jumping to other DB solutions, what was the bottleneck on MongoDB? Were you maxing out the disk IO? Did you hammer the server with lots of input threads? What kind of numbers did you achieve? I've seen server-class hardware push tens of thousands of inserts / second, so what do you need?
Obviously, there are lots of other DB solutions that serve as Key-Value DBs. Riak, Redis, Membase, CouchDB, HBase, just to name a few. But like MongoDB, none of these DBs are magic and they still obey the basic laws of computer physics.
So to get a really good answer to your question we'll need:
- The server configuration
- The basic tests you ran
- The performance you achieved
- Basic server monitoring data during the test
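For the "basic tests" and "performance achieved" items, even a tiny timing harness makes numbers comparable across databases. The sketch below is a hypothetical example, not a tool from this thread: `in_memory_upsert` is a dict-based stand-in that appends to a per-record list to mimic records that grow with every update, and you would swap in a real client call (e.g. a MongoDB update with upsert enabled) in its place.

```python
import time
from typing import Callable, Dict, List


def in_memory_upsert(store: Dict[str, Dict[str, List[int]]], key: str, value: int) -> None:
    """Hypothetical stand-in for a DB upsert: create the record if missing,
    then append to its history so the record grows on every update."""
    record = store.setdefault(key, {"history": []})
    record["history"].append(value)


def measure_upserts_per_second(upsert: Callable[[str, int], None],
                               n_ops: int, n_keys: int) -> float:
    """Run n_ops upserts cycling over n_keys keys and return ops/second."""
    start = time.perf_counter()
    for i in range(n_ops):
        upsert(f"user:{i % n_keys}", i)
    elapsed = time.perf_counter() - start
    return n_ops / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    store: Dict[str, Dict[str, List[int]]] = {}
    rate = measure_upserts_per_second(
        lambda k, v: in_memory_upsert(store, k, v), n_ops=100_000, n_keys=1_000)
    print(f"{rate:,.0f} upserts/second across {len(store)} keys")
```

Running the same loop against each candidate, while capturing disk IO and CPU monitoring data, gives the kind of evidence asked for above.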
The other databases I mentioned may perform slightly better than MongoDB, but they won't perform 100 times better, so we really need to qualify what you're looking for.
Cassandra provides an eventual consistency model (though this is a bit of a misnomer, as it can be tuned to be very consistent), which allows very good insert / update performance. I don't have any solid benchmarks to give you, but from what I've seen in my own experience and read online, Cassandra appears to give better insert / update performance than HBase.
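As a rough illustration of what "tuned to be very consistent" means: with a replication factor of N, if each read waits for R replicas and each write waits for W replicas, the read set and write set are guaranteed to share at least one replica whenever the condition below holds, so a read sees the latest acknowledged write. Using Cassandra's QUORUM level for both reads and writes (a quorum being floor(N/2) + 1 replicas) is one common way to satisfy it; lower levels trade that guarantee for faster writes.

```latex
R + W > N, \qquad \text{e.g. } N = 3,\; R = W = \left\lfloor \tfrac{N}{2} \right\rfloor + 1 = 2 \;\Rightarrow\; 2 + 2 = 4 > 3
```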
I would take a look at both and try them out with some sample data to see which one works for you. I'm a huge fan of Cassandra but wish their super columns were more useful.
Since I cannot comment on other posts yet, I'll post this as an answer instead: getting a faster HDD is recommended. As Remon said, 7200 rpm HDDs aren't expensive, and if you want optimal performance, an SSD would be even better.
As for your question, I've only worked with MongoDB in the NoSQL space, and even on low-end hardware I see very good update/upsert performance from it.
However, I'm only working with a couple of hundred updates per second here, I don't know about performance at a much higher level of volume right now.
Additionally, you didn't specify the amount of data being upserted into the database, how frequently the operation runs, or the expected future volume.
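Those three figures combine into a simple back-of-envelope load estimate, which is often enough to tell whether a single disk can keep up at all. The helper below is a hypothetical sketch and the example inputs (10,000 updates/second at about 2,000 bytes each) are made-up numbers, not figures from the question; it ignores indexes, replication and storage-engine overhead, which only add to the real load.

```python
from typing import Dict


def estimate_write_load(updates_per_second: float, avg_update_bytes: int) -> Dict[str, float]:
    """Rough raw write volume for a given update rate and average update size
    (excludes index, journal and replication overhead)."""
    bytes_per_second = updates_per_second * avg_update_bytes
    return {
        "MB_per_second": bytes_per_second / 1_000_000,
        "GB_per_day": bytes_per_second * 86_400 / 1_000_000_000,
    }


if __name__ == "__main__":
    load = estimate_write_load(updates_per_second=10_000, avg_update_bytes=2_000)
    print(f"{load['MB_per_second']:.1f} MB/s, {load['GB_per_day']:.0f} GB/day")
```

With these hypothetical inputs the estimate is 20 MB/s, or roughly 1.7 TB of raw writes per day.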
As others have said, finding where and what the bottlenecks(1) are helps more than a broad sweeping statement.
However, in my experience, on the basis of an extremely small and unscientific experiment, Cassandra does seem to load faster (I never got idle time down to zero when I was trying it out).
This is just an observation; I would NOT take it as a recommendation.
To make an informed choice you'll need to weigh up:
- the surrounding software ecosystem,
- the functional and non-functional requirements, e.g.
  - the benefits of a document-oriented database over a key-value store,
  - the need for a grid file system.
(1) remember - you never eliminate bottlenecks -- you just move them elsewhere :-( -- as soon as you solve one issue you'll find another slowest part of your system -- with luck it's in a place where it doesn't adversely affect you.