ASP.NET MVC 3 - Web Application - Efficiently Aggregate Data
I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:
- How often is a specific product viewed
- Which search phrases typically return specific products in their result list
- How often (for specific products) does a search result convert to a view
I would like to aggregate this data and break it down:
- By product
- By product by week
- etc.
I'm wondering what the cleanest and most efficient strategies are for aggregating the data. I can think of a couple, but I'm sure there are many more:
- Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables (a rough sketch of this appears after the list).
- Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
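For what it's worth, the first option can be as little as one narrow insert per event, with a scheduled job doing the roll-up later. A minimal sketch, assuming a hypothetical `ProductViewStaging` table (table, column, and class names are made up here):

```csharp
// Minimal sketch of option 1: write one narrow row per event into a staging
// table; a scheduled job later rolls these rows up into permanent aggregates.
// Table and column names (ProductViewStaging, etc.) are hypothetical.
using System;
using System.Data.SqlClient;

public class StatisticsLogger
{
    private readonly string _connectionString;

    public StatisticsLogger(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void LogProductView(int productId, string searchPhrase)
    {
        const string sql =
            @"INSERT INTO ProductViewStaging (ProductId, SearchPhrase, ViewedAtUtc)
              VALUES (@ProductId, @SearchPhrase, @ViewedAtUtc)";

        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@ProductId", productId);
            command.Parameters.AddWithValue("@SearchPhrase", (object)searchPhrase ?? DBNull.Value);
            command.Parameters.AddWithValue("@ViewedAtUtc", DateTime.UtcNow);

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```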
My concerns are:
- I would like to limit the number of moving parts.
- I would like to reduce the impact on the database; the fewer round trips and the less extraneous data stored, the better.
- In certain scenarios (not listed) I would like the data to be somewhat close to real-time (accurate to the hour may be appropriate)
Does anyone have real-world experience with this? If so, which approach would you suggest, and what are the positives and negatives? If there is a better solution that I am not thinking of, I'd love to hear it...
Thanks
JP
I needed to do something similar in a recent project. We implemented a full audit system in a secondary database that tracks changes to every record in the live db. Essentially, every insert, update, and delete actually writes two records: one in the live db and one in the audit db.
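A rough sketch of what that double write can look like, assuming a plain ADO.NET repository over two connection strings (all table, column, and class names here are hypothetical):

```csharp
// Rough sketch of the dual-write idea: every change is written to the live
// database and mirrored into the audit database. Connection strings, table
// and column names are hypothetical.
using System;
using System.Data.SqlClient;

public class AuditedProductRepository
{
    private readonly string _liveConnectionString;
    private readonly string _auditConnectionString;

    public AuditedProductRepository(string liveConnectionString, string auditConnectionString)
    {
        _liveConnectionString = liveConnectionString;
        _auditConnectionString = auditConnectionString;
    }

    public void UpdateProductName(int productId, string newName)
    {
        // Write to the live database first.
        using (var live = new SqlConnection(_liveConnectionString))
        using (var command = new SqlCommand(
            "UPDATE Products SET Name = @Name WHERE ProductId = @ProductId", live))
        {
            command.Parameters.AddWithValue("@ProductId", productId);
            command.Parameters.AddWithValue("@Name", newName);
            live.Open();
            command.ExecuteNonQuery();
        }

        // Mirror the change into the audit database with a timestamp so
        // reports can be built from it without touching the live db.
        using (var audit = new SqlConnection(_auditConnectionString))
        using (var command = new SqlCommand(
            @"INSERT INTO ProductAudit (ProductId, Name, ChangedAtUtc)
              VALUES (@ProductId, @Name, @ChangedAtUtc)", audit))
        {
            command.Parameters.AddWithValue("@ProductId", productId);
            command.Parameters.AddWithValue("@Name", newName);
            command.Parameters.AddWithValue("@ChangedAtUtc", DateTime.UtcNow);
            audit.Open();
            command.ExecuteNonQuery();
        }
    }
}
```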
Since we have this data in real time in the audit db, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, carrying only the data you need for that report. It duplicates data, but the performance gains are worth it.
As to filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3 am; the weekly and monthly reports are handled the same way, normally over weekends or late at night.
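The nightly job itself can be a very small console app run by the Windows task scheduler; a minimal sketch, assuming the hypothetical staging and report tables mentioned above (`ProductViewStaging`, `DailyProductViews`):

```csharp
// Sketch of a nightly roll-up job (run by a scheduled task): aggregate the
// previous day's staging rows into a denormalized daily report table.
// Table and column names are hypothetical.
using System;
using System.Data.SqlClient;

public static class DailyReportJob
{
    public static void Run(string reportingConnectionString)
    {
        const string sql =
            @"INSERT INTO DailyProductViews (ProductId, ViewDate, ViewCount)
              SELECT ProductId, CAST(ViewedAtUtc AS DATE), COUNT(*)
              FROM ProductViewStaging
              WHERE ViewedAtUtc >= @DayStart AND ViewedAtUtc < @DayEnd
              GROUP BY ProductId, CAST(ViewedAtUtc AS DATE)";

        var dayStart = DateTime.UtcNow.Date.AddDays(-1); // yesterday, UTC
        var dayEnd = dayStart.AddDays(1);

        using (var connection = new SqlConnection(reportingConnectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@DayStart", dayStart);
            command.Parameters.AddWithValue("@DayEnd", dayEnd);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```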
Other reports are generated on demand, mostly using only the data since the last daily run, so it's not that many records; once again, all from the secondary database.
I agree that you should create a separate database for your statistics; it will reduce the impact on your main database.
You can go with your idea of having "staging" tables and "aggregate" tables; that way, when you want near-real-time data you go to the staging table, and when you want historical data you go to the aggregates.
Finally, I would recommend using an asynchronous call to save your statistics; that way your pages will not take a hit in response time.
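In an MVC 3 (.NET 4) controller that could be as simple as pushing the write onto a background task. A sketch, reusing the hypothetical `StatisticsLogger` from the earlier staging-table example (connection string and action names are also made up):

```csharp
// Minimal sketch of the "fire and forget" idea: queue the statistics write on
// a background thread so the request does not wait on the database.
// StatisticsLogger is the hypothetical logger sketched earlier.
using System.Threading.Tasks;
using System.Web.Mvc;

public class ProductController : Controller
{
    private readonly StatisticsLogger _logger = new StatisticsLogger(
        "Data Source=.;Initial Catalog=Stats;Integrated Security=True"); // hypothetical connection string

    public ActionResult Details(int id, string searchPhrase)
    {
        // Record the view without blocking the response. Note that work queued
        // this way can be lost on an app pool recycle; a durable queue (MSMQ
        // etc.) is the more robust variant of the same idea.
        Task.Factory.StartNew(() => _logger.LogProductView(id, searchPhrase));

        return View();
    }
}
```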
I suggest that you create a separate database for this. The best way is to use BI techniques; SQL Server ships with separate services for BI (Analysis Services, Integration Services, and Reporting Services).