What NoSQL solution is best to store Apache error_log and access_log? Cassandra or MongoDB?
We have developed a PaaS solution for PHP. As part of that, we let developers view the Apache error_log and access_log files through our API.
Currently we write the logs into files on disk, separated per deployment (vhost).
Since this doesn't scale too well with a higher number of nodes and deployments, even though the files are on a distributed filesystem (GlusterFS), we would like to switch to something better.
Especially for billing and statistical reasons we would prefer not to parse log files every time.
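To illustrate the "parse once, store structured" idea, here is a minimal Python sketch that turns one access_log line into a dict that could be written to either store. It assumes Apache's "combined" log format; the function name `parse_access_line` and the `deployment` field are made up for this example and are not part of any library.

```python
import re
from datetime import datetime

# Assumes Apache's "combined" LogFormat; adjust the pattern if yours differs.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_access_line(line, deployment):
    """Turn one access_log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    doc["deployment"] = deployment  # hypothetical per-vhost identifier
    doc["status"] = int(doc["status"])
    # Apache logs "-" when no body bytes were sent.
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    doc["time"] = datetime.strptime(doc["time"], "%d/%b/%Y:%H:%M:%S %z")
    return doc
```

With documents like this, billing and statistics become queries over fields such as `deployment`, `size` and `time` rather than repeated passes over raw text.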
As MongoDB's capped collections look awesome for logging, we wanted to go with that. But it turns out they don't seem to work with auto-sharding, which kind of spoils the point for us since we expect many more writes than reads.
The other option was Cassandra, which I like for its "every node is equal" approach, but it doesn't have anything like capped collections.
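For readers unfamiliar with the term, the behaviour that makes capped collections attractive for logs can be sketched in plain Python: a fixed-size store that keeps insertion order and silently drops the oldest entries. This is only an in-memory stand-in, not MongoDB's implementation; real capped collections are bounded by a size in bytes (optionally also by a document count), and the `CappedLog` class below is hypothetical.

```python
from collections import deque

class CappedLog:
    """In-memory stand-in for a capped collection: fixed size, oldest entries dropped first."""

    def __init__(self, max_entries):
        # deque(maxlen=...) discards items from the left once it is full.
        self._entries = deque(maxlen=max_entries)

    def append(self, entry):
        self._entries.append(entry)

    def tail(self, n):
        """Return up to the n most recent entries, oldest first."""
        if n <= 0:
            return []
        return list(self._entries)[-n:]

log = CappedLog(max_entries=3)
for i in range(5):
    log.append(f"request {i}")
# Only the three newest entries remain; "request 0" and "request 1" were dropped.
```

A store without this feature needs some other way to expire old entries, which is the gap described above for Cassandra.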
Turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I don't see it.
So what I'd like to know is: has anybody used one of the two systems for logging before? What are your experiences, and can you give me some tips? Or are there other solutions that fit our needs better?
"Turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I don't see it."
Honestly, we're going through this test right now with some serious log data (and by "right now" I mean a few of us were up late last night running these tests).
To me, there are two distinguishing features: ease of use and proven scaling.
Ease of use
- MongoDB was easy. In a couple of hours I went from blank computer to an active Mongo instance with imported data from MySQL and a few completed map-reduces.
- In the same period of time, team Cassandra sat around re-compiling Java files, trying to get Hadoop configured to run over an existing Cassandra installation so that they could even run map-reduces.
Proven Scaling
- MongoDB sharding is still in beta. It's slated for launch in the next few weeks. That's pretty tight.
- Cassandra sharding is proven on some very large instances.
So I think the answer is really going to be specific to your personal tastes. I honestly think that Cassandra may be a more stable & proven product, but I also know from experience that the learning and setup curve is a lot steeper. So it might be worth trying a little bit of both.
If you are considering using Cassandra, you can check out this article from Cloudkick: 4 Months with Cassandra, a love story.
They are using Cassandra to store different metrics for their system, which is somewhat similar to storing log files.
EDIT:
If you haven't yet decided what to use, here's a great solution using MongoDB as a backend:
Graylog2 is an open source syslog implementation that stores your logs in MongoDB. It consists of a server written in Java that accepts your syslog messages via TCP or UDP and stores them in the database. The second part is a Ruby on Rails web interface that allows you to view the log messages.
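Since the server accepts plain syslog over UDP, any syslog client can feed it. As a rough sketch, Python's standard `logging.handlers.SysLogHandler` can forward messages this way. The host, port and logger name are placeholders you would replace with your own server's settings, and `make_syslog_logger` is a helper invented for this example.

```python
import logging
import logging.handlers
import socket

def make_syslog_logger(host, port, name="apache-forwarder"):
    """Return a logger that sends each record as a syslog datagram over UDP.

    host and port are placeholders for wherever your syslog server listens.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    # Prefix each message with the logger name so the source is identifiable.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would look like `make_syslog_logger("logs.example.com", 514).error("File does not exist: /var/www/favicon.ico")`, where the hostname and port are again only examples. UDP is fire-and-forget, so messages can be lost under load; TCP is the safer choice if every line matters for billing.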