Logging/tracking in PHP: Scribe, Chukwa, log4php?
This is probably a pretty high-level question that requires a lot of explaining, but I'm in need of a lot of explaining.
Basically I'm developing a PHP application that requires a lot of logging and tracking. Tracking clicks, interactions, performance, etc. etc. Anything under the sun. Facebook's Scribe and Yahoo's Chukwa are both great implementations of this. I know little abo开发者_如何学运维ut log4php.
What I want is a high-level overview of how this kind of logging works, specifically in conjunction with a PHP application. You can stop at the point where the log gets processed; I already know that I want to use Hadoop/Hive for processing and storage.
I'd also like some fairly low-level looks at what happens within the application itself. For example, how does one take the behavior of a click and send that to the logger? I'd appreciate any reading that can help get me started, as well.
You can buy/get the tools to do this for you or build in-house.
buy/get:
1 - Tag your pages with Google/Yahoo analytics - This will track pageviews, page flow performance, SEO ranking for keywords, etc.
2 - For tracking and logging user behavior, which include clicks, interactions and performance. I found nothing better than ClickTale - http://www.clicktale.com/default_e.aspx - It video records user sessions and puts these "log files" in a server.
in-house: 1 - Creating hidden fields in your forms that submits to a logging database also works. You specify unique IDs to forms and keep track of it's actions during submits.
I'm sure there's lots more, but these are the basics. These are not PHP specific though.
HTH
EDIT #1 :
This may be beyond the scope of your question, but tracking doesn't necessarily mean data that goes in-house. An example would be adding a "like it" or "digg it" button to articles or pages. This will "log" popularity for you. You can go to facebook or digg.com to see progress of your site. it'll also help with SEO. basically, it's a tracking system. And it's easy to use. there are PHP snippets out there that you can copy and paste to your code. If you have WordPress, there is a plugin - just look for "digg", "like it" in the plugin search section.
Going back to Google Analytics, if you want to go beyond tracking clicks, go ahead and make goals/funnels. It'll track user behavior, and answer questions such as "What were my most valuable keywords?" "where are all my users dropping off?" "what is the bounce rate for each page?" "what are the top 3 entry points to my site and from what traffic medium?" these are question SEO/SEM managers are most concerned about. and it's definitely good to track and understand.
ClickTale starts where Google Analytics ends. GA will describe user behavior in the page level, but not in the field level. ClickTale, which has heat maps, will answer these questions "I know this page has a high bounce rate, but why? which field is a problem field for my customers?" "At what area of the page do users spend most of their time in?" "how do i prove to the graphics guys that a particular section needs to be redesigned?".
EDIT #2
For high traffic sites, you will need to scale your logging DB. It really helps when it comes to reporting. What I suggest is a 3-tier database reporting structure. tier 1 = last 7 days, tier 2 = last 6 months, tier = everything. You can modify these according to the business. The point being, data moves from one tier to another. keeping fresh data readily available. You want to generate reports asap. A a single huge DB just doesn't scale.
You can monitor user clicks by logging the path the user is taking, referrer --> new uri, assuming both are verbose and descriptive enough. For example, if a user clicks on one of his friends you should log the uris:
Referrer: /users/41251
Target: /users/66257
storing them properly for easy querying and reporting. Here a direct click like that would assume the target is in the referrer's page, so is a friend. If you have more complicated scenarios be sure to describe them with distinct uris, eg: /users/suggestion/14152
for a suggested connection.
Add to that timestamps and you have a very rough estimate of how long they stayed on each page, although users tend to lose focus, switch tabs/applications and come back, etc. Google Analytics, for one, does this well.
For a summary of where users click most on your site using heatmaps I like the free (GPL) Clickheat.
Check out Splunk
On the frontend where you're doing the logging from, here is some sample PHP code that you might find useful:
http://www.alphadevx.com/a/85-Logging-Messages-to-Scribe-from-PHP
In terms of the architecture, you have a lot of flexibility with Scribe. I would recommend having a local Scribe instance running on each application node, and having your application log locally to localhost. These local Scribe instances can in turn be configured to log to a central Scribe server when it is not too busy, otherwise they will continue to queue up messages locally. You actually consume your logs on the central server where they are aggregated by category.
I'm a big fan of Scribe, and I think it's designed well is so far as it's got a very small memory and processor footprint, and it is quite easy to configure (although murder to install due to the dependencies!). It just lacks documentation.
精彩评论