Slow performance from many aggregate queries against a very large SQL Server table
I have a custom log/transaction table that tracks my users' every action within the web application. It currently has millions of records and grows by the minute. In my application I need to implement some way of precalculating a user's activities/actions in SQL to determine whether other features/actions are available to the user within the application. For one example, before a page loads, I need to check if the user has viewed the page X number of times.
(SELECT COUNT(*) FROM MyLog WHERE UserID = xxx and PageID = 123)
I am making several similar aggregate queries with joins to check other conditions, and the performance is poor. These checks occur on every page request, and the application can receive hundreds of requests per minute.
I'm looking for any ideas to improve the application performance through sql and/or application code.
This is a .NET 2.0 app and using SQL Server 2008.
Much thanks in advance!
The easiest way is to store the counts in a table by themselves. Then, when adding records (hopefully through a stored procedure), you can simply increment the affected row in your aggregate table. If you are really worried about the counts getting out of whack, you can put a trigger on the detail table to update the aggregate table instead; however, I don't like triggers because they have very little visibility.
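For example, something along these lines might work (a minimal sketch; the table and procedure names PageViewCounts and IncrementPageViewCount are hypothetical, and your existing logging code would call the procedure whenever it writes a detail row):

CREATE TABLE dbo.PageViewCounts
(
    UserID    int NOT NULL,
    PageID    int NOT NULL,
    ViewCount int NOT NULL,
    CONSTRAINT PK_PageViewCounts PRIMARY KEY (UserID, PageID)
);
GO

CREATE PROCEDURE dbo.IncrementPageViewCount
    @UserID int,
    @PageID int
AS
BEGIN
    SET NOCOUNT ON;

    -- Bump the aggregate row; create it on the user's first view of the page.
    UPDATE dbo.PageViewCounts
    SET ViewCount = ViewCount + 1
    WHERE UserID = @UserID AND PageID = @PageID;

    IF @@ROWCOUNT = 0
        INSERT INTO dbo.PageViewCounts (UserID, PageID, ViewCount)
        VALUES (@UserID, @PageID, 1);
END
GO

The per-request check then becomes a single-row seek against PageViewCounts instead of a COUNT(*) over the whole log table.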
Also, how up to date do these counts need to be? Could this be something that is stored in a table once a day?
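If a daily figure is acceptable, a scheduled job (a SQL Agent job, for instance) could rebuild the same hypothetical PageViewCounts table overnight instead of maintaining it on every write; the counts would then be at most a day stale:

TRUNCATE TABLE dbo.PageViewCounts;

INSERT INTO dbo.PageViewCounts (UserID, PageID, ViewCount)
SELECT UserID, PageID, COUNT(*)
FROM dbo.MyLog
GROUP BY UserID, PageID;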
Querying a log table like this may be more trouble than it is worth.
As an alternative I would suggest using something like memcache to store the value as needed. As long as you update the cache on each hit, it will be much faster than querying a large database table. Memcache has a built-in increment operator that handles this kind of thing. This way you only need to query the db on the first visit.
Another alternative is to use a precomputed table, updating it as needed.
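On SQL Server 2008, one way to get a precomputed table that the engine keeps in sync for you is an indexed view over the log table. This is only a sketch, assuming MyLog lives in the dbo schema and only UserID/PageID are needed; note that it adds some cost to every insert into MyLog:

CREATE VIEW dbo.vw_UserPageCounts
WITH SCHEMABINDING
AS
SELECT UserID, PageID, COUNT_BIG(*) AS ViewCount
FROM dbo.MyLog
GROUP BY UserID, PageID;
GO

-- Materialize the view; SQL Server maintains it automatically as rows are logged.
CREATE UNIQUE CLUSTERED INDEX IX_vw_UserPageCounts
ON dbo.vw_UserPageCounts (UserID, PageID);
GO

-- The per-page check becomes a single-row seek
-- (the NOEXPAND hint is needed on non-Enterprise editions).
SELECT ViewCount
FROM dbo.vw_UserPageCounts WITH (NOEXPAND)
WHERE UserID = 42 AND PageID = 123;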
Have you indexed MyLog on UserID and PageID? If not, that should give you some huge gains.
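Something like the following should let the COUNT(*) check be answered with an index seek rather than a scan of the whole table (the index name is just an example):

CREATE NONCLUSTERED INDEX IX_MyLog_UserID_PageID
ON dbo.MyLog (UserID, PageID);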
Todd, this is a tough one because of the number of operations you are performing. Have you checked your indexes on that database?
Here's a stored procedure you can execute to at least help find missing indexes. I can't remember where I found it, but it helped me:
CREATE PROCEDURE [dbo].[SQLMissingIndexes]
    @DBNAME varchar(100) = NULL
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    SELECT
        migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
            * (migs.user_seeks + migs.user_scans) AS improvement_measure,
        'CREATE INDEX [missing_index_'
            + CONVERT (varchar, mig.index_group_handle)
            + '_' + CONVERT (varchar, mid.index_handle)
            + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
            + ' ON ' + mid.statement
            + ' (' + ISNULL (mid.equality_columns, '')
            + CASE WHEN mid.equality_columns IS NOT NULL
                   AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
            + ISNULL (mid.inequality_columns, '')
            + ')'
            + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
        migs.*,
        mid.database_id,
        mid.[object_id]
    FROM
        sys.dm_db_missing_index_groups mig
        INNER JOIN sys.dm_db_missing_index_group_stats migs
            ON migs.group_handle = mig.index_group_handle
        INNER JOIN sys.dm_db_missing_index_details mid
            ON mig.index_handle = mid.index_handle
    WHERE
        migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
            * (migs.user_seeks + migs.user_scans) > 10
        AND (@DBNAME = db_name(mid.database_id) OR @DBNAME IS NULL)
    ORDER BY
        migs.avg_total_user_cost * migs.avg_user_impact
            * (migs.user_seeks + migs.user_scans) DESC
END
I modified it a bit to accept a database name. If you don't provide one, it runs against all databases and gives you suggestions on which columns need indexing.
To run it use:
exec DatabaseName.dbo.SQLMissingIndexes 'MyDatabaseName'
I usually put reusable SQL (sproc) code in a separate database called DBA, so then from any database I can say:
exec DBA.dbo.SQLMissingIndexes
As an example.
Edit
Just remembered the source: Bart Duncan. Here is a direct link: http://blogs.msdn.com/b/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
But remember I did modify it to accept a single db name.
We had the same problem. Several years ago we moved from SQL Server to OLAP cubes, and when that stopped working recently we moved again, to Hadoop and some other components.
OLTP (Online Transaction Processing) databases, of which SQL Server is one, are not very good at OLAP (Online Analytical Processing). This is what OLAP cubes are for.
OLTP provides good throughput when you're writing and reading many individual rows. It fails, as you just found, when doing many aggregate queries that require scanning many rows. Since SQL Server stores every record as a contiguous block on the disk, scanning many rows means many disk fetches. The cache saves you for a while - so long as your table is small, but when you get to tables with millions of rows the problem becomes evident.
Frankly, OLAP isn't that scalable either, and at some point (tens of millions of new records per day) you're going to have to move to a more distributed solution - either paid (Vertica, Greenplum) or free (HBase, Hypertable).
If neither is an option (e.g. no time or no budget), then for now you can alleviate your pain somewhat by spending more on hardware. You need very fast I/O (fast disks, RAID) and as much RAM as you can get.