
Social Networking and Usage Logging

What sort of data should be logged on a social-networking type of site from day 1 so that useful statistical analysis can be performed in the future? Also, what other tips and tricks have you learned about site logging? Depending on the scale of the site, is it often worth logging to a flat file and having a job periodically load that data into a db, for site-performance reasons?

I am thinking of server-side logging here, not just generic Google Analytics / Piwik type logging. To give a jumpstart to the answer, here are a few no-brainers I've thought of:

  • ip address
  • user identification info, if logged in (userid)
  • HTTP_REFERER
  • is ajax call (bool)
  • session id (should sessions also be permanently logged separately?)
  • Nth view since session began (running page-view count)
  • some sort of information to indicate what page user is on (controller being used? Url path?)
  • timestamp
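
To make the flat-file idea concrete, here is a minimal sketch of one way it could look: each request is appended as a JSON line, and a separate batch job loads the file into a database. The field names, the `request_log` table, and the choice of SQLite are all assumptions made for illustration, not something a particular framework prescribes.

```python
import json
import sqlite3
import time

# Hypothetical column order, mirroring the fields listed above.
FIELDS = ("ts", "ip", "user_id", "referer", "is_ajax",
          "session_id", "view_no", "path")


def log_request(log_path, *, ip, user_id, referer, is_ajax,
                session_id, view_no, path, ts=None):
    """Append one request as a JSON line (cheap enough for the request path)."""
    record = {
        "ts": time.time() if ts is None else ts,
        "ip": ip,
        "user_id": user_id,        # None when the visitor is not logged in
        "referer": referer,
        "is_ajax": is_ajax,
        "session_id": session_id,
        "view_no": view_no,        # Nth view since the session began
        "path": path,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_into_db(log_path, conn):
    """Batch job: read the flat file and bulk-insert its rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_log ("
        "ts REAL, ip TEXT, user_id INTEGER, referer TEXT, "
        "is_ajax INTEGER, session_id TEXT, view_no INTEGER, path TEXT)"
    )
    with open(log_path, encoding="utf-8") as f:
        rows = [tuple(json.loads(line)[k] for k in FIELDS)
                for line in f if line.strip()]
    conn.executemany(
        "INSERT INTO request_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A real job would also rotate or truncate the file after a successful load so the same rows are not inserted twice; that step is left out of this sketch.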


Well, for starters, "generic Google Analytics / Piwik type logging" is actually usually more powerful than server-side log processing: you can set and read various cookies, extract lots of information from the client that is available only to JavaScript, and so on. Even getting a simple visitor_id cookie is much easier in JavaScript than on the server side, where you'd have to set up some web server module to push session cookies, its session timeout would likely differ from the WAA standard 30 minutes, and so on.

Generally, when designing variables/fields to log, think about what reports/aggregations you'd want to get from them. For example:

  • Who's the most active user?
  • What sections of the site / pages / page types in social network are most visited?
  • What are the funnel transitions between various goals you'd like your users to achieve?
  • Where do they come from (especially useful if you're paying for them to come, i.e. using ads) and how do they achieve goals afterwards?
  • Who supplies most useful (longest staying, viewing most of your ads, something else?) users to your site?
  • ...
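
As a rough illustration of working backwards from a report to the fields it needs, the first two questions above only require a `user_id` and a `path` on each logged record. The sketch below assumes records are plain dicts with those (hypothetical) keys:

```python
from collections import Counter


def most_active_users(records, n=3):
    """Count requests per logged-in user; anonymous hits are skipped."""
    counts = Counter(r["user_id"] for r in records
                     if r.get("user_id") is not None)
    return counts.most_common(n)


def most_visited_paths(records, n=3):
    """Count hits per URL path."""
    return Counter(r["path"] for r in records).most_common(n)


# Made-up sample data, only to show the shape of the input.
sample = [
    {"user_id": 1, "path": "/home"},
    {"user_id": 2, "path": "/home"},
    {"user_id": 1, "path": "/profile/1"},
    {"user_id": None, "path": "/home"},
    {"user_id": 1, "path": "/messages"},
]
```

If a question on the list cannot be answered from the fields you plan to log (funnel transitions need ordering within a session, for instance), that is a sign a field is missing.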

Contrary to the popular opinion of "log everything, sort it out later", logging is not a passive process but an active one. You'll most likely end up wanting to push some cookies to users that record their:

  • Session ids
  • Visitor ids
  • Original sources / referrers (i.e. external referrer, search engine / query, ads, etc)
  • Number, frequency of visits, durations of sessions
  • Statuses / achievements of goals
  • etc...

All this stuff requires interaction between the server (and/or a JavaScript collection snippet) and the visitor's browser, not just passive logging.
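
A minimal sketch of that active side, assuming the server can read the request's cookies as a dict and send an updated set back. The cookie names (`visitor_id`, `session_id`, `visits`, `last_seen`) are made up for illustration, and the 30-minute timeout follows the convention mentioned earlier:

```python
import uuid

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity before a new session starts


def track_visit(cookies, now):
    """Return the cookie values to send back with this response.

    cookies: dict of cookie name -> string value from the incoming request.
    now: current time as a Unix timestamp (float).
    """
    # A long-lived visitor id is issued once and then reused.
    visitor_id = cookies.get("visitor_id") or uuid.uuid4().hex
    last_seen = float(cookies.get("last_seen", 0))
    visits = int(cookies.get("visits", 0))
    session_id = cookies.get("session_id")

    # Start a new session on the first request or after the timeout.
    if session_id is None or now - last_seen > SESSION_TIMEOUT:
        session_id = uuid.uuid4().hex
        visits += 1

    return {
        "visitor_id": visitor_id,
        "session_id": session_id,
        "visits": str(visits),
        "last_seen": str(now),
    }
```

The returned values would be both set as cookies and written to the log, so that visit counts and original sources travel with every later request instead of having to be reconstructed afterwards.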


Log each and every request (query string, etc.). Log all HTTP variables, perhaps with each request:

'HTTP_ACCEPT', 'HTTP_ACCEPT_CHARSET', 'HTTP_ACCEPT_ENCODING', 'HTTP_ACCEPT_LANGUAGE', 'HTTP_CONNECTION', 'HTTP_HOST', 'HTTP_REFERER', 'HTTP_USER_AGENT'

As you are interested from day 1, don't worry about information that can be derived from the raw logs. You can do whatever processing you want later.

If resources are a constraint (they should not be in the beginning), you can optimize, for example by storing a hash of long, frequently repeated values such as HTTP_USER_AGENT instead of the full string.
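
One possible reading of the hashing idea, sketched under assumptions: each distinct user-agent string is stored once in a lookup table, and log rows carry only a short hash key. The 12-character truncated SHA-1 key and the in-memory dict are illustrative choices, not a recommendation for any particular size or store.

```python
import hashlib


def intern_user_agent(user_agent, table):
    """Return a short key for the user agent, recording the full string once.

    table: dict mapping key -> full user-agent string (a stand-in for a
    lookup table in the database).
    """
    key = hashlib.sha1(user_agent.encode("utf-8")).hexdigest()[:12]
    table.setdefault(key, user_agent)
    return key
```

Since the same few user-agent strings repeat across most requests, the log rows shrink considerably while the full value stays recoverable through the lookup table.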


PHP coders of high-traffic sites should look into Scribe. Originally developed by Facebook and now open source, Scribe is a great way to log events in your app for analysis later on. For more information on Scribe and other tips, check out this article on logging for analysis purposes.


As you probably already know, log too much rather than too little.

If you log the request line and headers of all requests, you should have a lot of information to dig into at a later point. E.g., that will give you most of the things you list above (or they could be deduced from it).
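
As a sketch of how fields could be deduced later from a stored request line and headers: the function name and output keys below are hypothetical, and detecting AJAX via `X-Requested-With: XMLHttpRequest` is only a convention that many JavaScript libraries follow, not something every client sends.

```python
from urllib.parse import parse_qs, urlsplit


def derive_fields(request_line, headers):
    """Derive report-friendly fields from a raw request line and headers."""
    method, target, _version = request_line.split(" ", 2)
    parts = urlsplit(target)
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    return {
        "method": method,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "referer": h.get("referer"),
        "user_agent": h.get("user-agent"),
        "is_ajax": h.get("x-requested-with") == "XMLHttpRequest",
    }
```

For example, `derive_fields("GET /profile/42?tab=photos HTTP/1.1", {"X-Requested-With": "XMLHttpRequest"})` yields the path `/profile/42`, the query `{"tab": ["photos"]}`, and `is_ajax` set to true. The user id and session id are not recoverable this way unless the cookie header is logged too.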
