NoSQL storage for tracking user behavior
I'm trying to develop a system which records user actions on our site so that later on we can find patterns in them. I'm not sure what data storage I should use, but I'm considering something NoSQL-like because it's easily scalable. It should be schemaless, so we can easily change the data format if necessary. Also, it should handle fast, frequent writes; reads are done very rarely.
Data should be something like this:
userid=1,action=act1,timestamp=1234, additional_info1=something_here
userid=2,action=act1,timestamp=324, additional_info2=something_else_here
Once the data is stored, we want to compute statistics per user, per action, and per additional_info field.
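For illustration, here is a minimal sketch of the kind of statistics meant here, applied to the two sample records. It is written in Python only for brevity (the web app itself is PHP), and the names `parse_record`, `events_per_user` and `events_per_action` are hypothetical, not part of any existing system.

```python
from collections import Counter

def parse_record(line):
    """Parse 'key=value' pairs separated by commas and/or spaces into a dict."""
    record = {}
    for token in line.replace(",", " ").split():
        key, _, value = token.partition("=")
        record[key] = value
    return record

# The two sample records; each one carries a different additional_info field.
records = [
    parse_record("userid=1,action=act1,timestamp=1234, additional_info1=something_here"),
    parse_record("userid=2,action=act1,timestamp=324, additional_info2=something_else_here"),
]

# Simple statistics: number of events per user and per action.
events_per_user = Counter(r["userid"] for r in records)
events_per_action = Counter(r["action"] for r in records)
```

Because each record is just a dictionary, records with different additional_info fields can sit side by side, which is the schemaless property asked for.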
Can you give me some hints on which storage I should use?
PS: Our web app is written in PHP.
Based on your requirements (fast, frequent, and safe writes; less demanding reads; scalability; and a key that is the "representative" of the collection and by which you will fetch the data), I recommend Cassandra. Its description is:
Best used: When you write more than you read (logging).
Resources you need:
http://cassandra.apache.org/
Originally developed at Facebook to power its inbox search, but also used by other large players such as Digg, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, and OpenX.
As far as writing goes, it is among the fastest and most reliable options.
EDIT:
Another key sentence describing Cassandra:
Writes are faster than reads, so one natural niche is real time data analysis.
As I understand it, this niche is more or less what you need it for.
Here you can read up on the details and find a good, objective comparison of the NoSQL databases:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
If you would like an easier way out, at the expense of less safe writes, MongoDB is also a viable choice.
It has an easier querying system, so basically it would be easier for you to search the data.
Resource:
http://www.mongodb.org/
Cheers,
As far as I understand, you need ease of use and a dynamic, schema-less store. Although there isn't enough information to be sure, I feel you need something like Redis or MongoDB. Please note that MongoDB stores JSON-like documents, its queries can get complex at times, and there may be some learning curve involved. With Redis, on the other hand, you can be up and running in no time. However, you need to think differently than with an RDBMS: there are no joins or other relational features for the data analysis part, so you need to understand that and design your solution accordingly.
I have explained the different types of NoSQL databases in a blog entry, in case you need an overview of NoSQL: http://ttltheory.wordpress.com/2011/08/07/next-generation-data-storage/
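To make the "no joins, design accordingly" point concrete, one common approach is to update the counters you will later want at write time, instead of computing them with queries afterwards. The sketch below uses a plain Python dictionary as a stand-in for a key-value store (with Redis, each increment could be an INCR on the same key); the key layout and the `record_action` helper are assumptions made up for this example.

```python
from collections import defaultdict

# Stand-in for a key-value store: maps a string key to an integer counter.
counters = defaultdict(int)

def record_action(userid, action):
    """Bump every counter we expect to report on later (hypothetical key layout)."""
    counters[f"stats:user:{userid}"] += 1
    counters[f"stats:action:{action}"] += 1
    counters[f"stats:user:{userid}:action:{action}"] += 1

record_action(1, "act1")
record_action(2, "act1")
record_action(1, "act2")
```

The trade-off is that you have to know in advance which statistics you want, since anything you did not count at write time cannot be joined together later.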
Can you give me some hints on which storage I should use?
Not really, no. And you seem to have already decided on using a NoSQL DB.
The information you (we?) need to answer this is: what information, explicitly, do you want to capture, how do you want to analyse it, and how do you want to present the results?
By all means implement the full solution using a NoSQL system, but if your requirements aren't yet well defined, I'd strongly recommend using a relational database to model the data and produce sample reports.
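As a rough sketch of what such a prototype could look like, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `user_actions`, its columns, and the `actions_report` helper are assumptions based on the sample data in the question, not a recommended final schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for prototyping
conn.execute("""
    CREATE TABLE user_actions (
        userid INTEGER NOT NULL,
        action TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        additional_info TEXT
    )
""")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?, ?, ?)",
    [
        (1, "act1", 1234, "something_here"),
        (2, "act1", 324, "something_else_here"),
    ],
)

def actions_report(connection):
    """Sample report: events and distinct users per action."""
    rows = connection.execute(
        "SELECT action, COUNT(*) AS events, COUNT(DISTINCT userid) AS users "
        "FROM user_actions GROUP BY action ORDER BY action"
    )
    return rows.fetchall()

report = actions_report(conn)
```

Trying out a few reports like this first shows which questions you actually ask of the data, which in turn informs how to lay it out in a NoSQL store.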