
What are good NoSQL and non-relational database solutions for an audit/logging database?

What would be a suitable database for the following? I am especially interested in your experiences with non-relational NoSQL systems. Are they any good for this kind of usage, and which system have you used and would recommend? Or should I go with a normal relational database (DB2)?

I need to gather audit trail/logging information from a bunch of sources onto a centralized server, where I can generate reports efficiently and examine what is happening in the system.

Typically an audit/logging event would always consist of some mandatory fields, for example:

  • globally unique ID (generated somehow by the program that produced the event)
  • timestamp
  • event type (e.g. user logged in, error happened, etc.)
  • some information about the source (server1, server2)

Additionally, the event could contain 0-N key-value pairs, where each value might be up to a few kilobytes of text.
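To make the event shape concrete, here is a minimal sketch of how a producer might build and serialize one such event. The class name `AuditEvent`, the field names, and the choice of JSON and UUIDs are all assumptions for illustration; nothing here is tied to a particular database.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    """Hypothetical event record: mandatory fields plus free-form pairs."""

    # Mandatory fields listed above.
    event_type: str
    source: str
    # Assumption: a random UUID is an acceptable globally unique ID.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumption: timestamps are stored as ISO 8601 strings in UTC.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Optional 0-N key-value pairs; values may be a few kilobytes of text.
    attributes: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the event so it can be shipped to the central server."""
        return json.dumps(asdict(self))


event = AuditEvent(
    event_type="user_logged_in",
    source="server1",
    attributes={"user": "alice", "client_ip": "10.0.0.5"},
)
print(event.to_json())
```

A new event type or a new key needs no change on the producer side beyond passing a different `event_type` or an extra entry in `attributes`, which is the flexibility asked for in the requirements.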

  • It must run on a Linux server.
  • It should work with large amounts of data (100GB, for example).
  • It should support some kind of efficient full-text search.
  • It should allow concurrent reading and writing.
  • It should be flexible enough to add new event types and to add/remove key-value pairs on new events. Flexible = no changes should be required to the database schema; the application generating the events can just add new event types/new fields as needed.
  • It should be efficient to run queries against the database, for reporting and for exploring what happened. For example:
    • How many events with type=X occurred in some time period?
    • Get all events where field A has value Y.
    • Get all events with type X where field A has value 1, field B is not 2, and the event occurred in the last 24h.


The two I've seen used successfully are MongoDB and Cassandra.


Should I go with a normal relational database (DB2)?

Yes, you should! If you just want to store stuff and scan it, you might as well write to a file. Very fast, no overhead! But the minute you want to summarize data over time (the last 24h, or between time t and t+1), and the more you care about the data as something other than lines of text, the more clearly a proper RDBMS is your friend.
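As a rough sketch of what that could look like, the three example queries from the question can be written against two tables: one for the mandatory fields and one for the key-value pairs, so new fields need no schema change. This uses Python's built-in `sqlite3` purely as a stand-in for DB2; the table names, column names, and sample rows are made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        id     TEXT PRIMARY KEY,
        ts     TEXT NOT NULL,      -- ISO 8601, UTC, so strings sort by time
        type   TEXT NOT NULL,
        source TEXT NOT NULL
    );
    CREATE TABLE event_attr (      -- one row per key-value pair
        event_id TEXT NOT NULL REFERENCES event(id),
        name     TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (event_id, name)
    );
    CREATE INDEX event_type_ts ON event (type, ts);
    CREATE INDEX attr_name_value ON event_attr (name, value);
""")

# Fixed "now" so the sample results are reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)


def add_event(event_id, ts, type_, source, **attrs):
    """Insert one event and its optional key-value pairs."""
    conn.execute(
        "INSERT INTO event VALUES (?, ?, ?, ?)",
        (event_id, ts.isoformat(), type_, source),
    )
    conn.executemany(
        "INSERT INTO event_attr VALUES (?, ?, ?)",
        [(event_id, k, str(v)) for k, v in attrs.items()],
    )


add_event("e1", now - timedelta(hours=1), "X", "server1", A="1", B="3")
add_event("e2", now - timedelta(hours=2), "X", "server2", A="1", B="2")
add_event("e3", now - timedelta(hours=30), "X", "server1", A="1")
add_event("e4", now - timedelta(hours=3), "login", "server1", A="Y")

since = (now - timedelta(hours=24)).isoformat()

# 1. How many events with type=X occurred in some time period?
count_x = conn.execute(
    "SELECT COUNT(*) FROM event WHERE type = ? AND ts >= ? AND ts < ?",
    ("X", since, now.isoformat()),
).fetchone()[0]

# 2. Get all events where field A has value Y.
a_is_y = [row[0] for row in conn.execute(
    "SELECT e.id FROM event e JOIN event_attr a ON a.event_id = e.id "
    "WHERE a.name = 'A' AND a.value = 'Y' ORDER BY e.id"
)]

# 3. Type X, A = 1, B is not 2, within the last 24h.
#    NOT EXISTS also keeps events that have no B at all (an assumption
#    about what "B is not 2" should mean).
filtered = [row[0] for row in conn.execute(
    """
    SELECT e.id FROM event e
    WHERE e.type = ? AND e.ts >= ?
      AND EXISTS (SELECT 1 FROM event_attr a
                  WHERE a.event_id = e.id AND a.name = 'A' AND a.value = '1')
      AND NOT EXISTS (SELECT 1 FROM event_attr b
                      WHERE b.event_id = e.id AND b.name = 'B' AND b.value = '2')
    ORDER BY e.id
    """,
    ("X", since),
)]

print(count_x, a_is_y, filtered)
```

The key-value table trades some query verbosity for the "no schema changes" requirement. Full-text search over the values is not covered by this sketch and would need whatever text-search facility the chosen database provides.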


We used Redis for centralized logging across all our app servers at mflow.com. It is very fast: based on the Redis benchmarks, it does about 110,000 SETs per second and about 81,000 GETs per second. It also has a virtual memory (VM) implementation that swaps infrequently used values out to disk if your dataset exceeds available memory.

It's an advanced data-structures server that can store any binary-safe data with native support for strings, lists, sets, sorted sets and hashes. Based on discussions on the mailing list it is heavily used by a lot of people to store analytics.
