Recommendation for click/event tracking mechanisms (python, django, celery, mongo etc)
I'm looking into way to track events in a django application (events would generally be clicks tied to a specific unique user id).
These events would essentially contain an event type like "click" and then each click event would be assigned to a unique id (many events can go to one id) and each event would have a data set including items like referrer etc...
I have tried mixpanel, but for now the data api they are offering seems too limiting as I can't seem to find a way to get all of my data out by a unique id (apart from the event itself).
I'm looking into using django-eventracker, but curious about any others thought on the best way to do this. Mongo or CouchDb seem like a great choice here, but the celery/rabbitmq looks really attractiv开发者_如何学编程e with mongo. Pumping these events into the existing applications db seems limiting at this point.
Anyways, this is just a thread to see what others thoughts are on this and how they have implemented something like this...
shoot
I am not familiar with the pre-packaged solutions you mention. Were I to design this from scratch, I'd have a simple JS collecting info on clicks and posting it back to the server via Ajax (using whatever JS framework you're already using), and on the server side I'd simply append that info to a log file for later "offline" processing -- so that would be independent of django or other server-side framework, essentially.
Appending to a log file is a very light-weight action, while DBs for web use are generally way optimized for read-intensive (not write-intensive) operation, so I agree with you that force fitting that info (as it trickes in) into the existing app's DB is unlikely to offer good performance.
You probably want to keep a flexible format for your logs to anticipate future needs or changes. In this sense, the schema-less document-oriented databases are nice. One advantage is that the structure of your data will be close to your application needs for whatever analyses you perform later (so, avoiding some of the inevitable parsing/data munging work).
If you're thinking about using mysql, postgresql or such, then you should look into something like rsyslog for buffering writes and avoiding the performance penalty with heavy logging. (I can't say much about celery and other queueing mechanisms for this type of thing, but they sound promising.)
Mongodb has a some nice features that make it amenable to logging such as capped collections. A summary can be found in this post.
If by click, you mean a click on a link that loads a new page (or performs an AJAX request), then what you aim to do is fairly straightforward. Web servers tend to keep plain-text logs about requests - with information about the user, time/date, referrer, the page requested, etc. You could examine these logs and mine the statistics you need.
On the other hand, if you have a web application where clicks don't necessarily generate server requests, then collecting click information with javascript is your best bet.
精彩评论