Slow performance from many aggregate queries against a very large SQL Server table
I have a custom log/transaction table that tracks my users' every action within the web application. It currently has millions of records and grows by the minute. In my application I need to implement some way of precalculating a user's activities/actions in SQL to determine whether other features/actions are available to the user within the application. For one example, before a page loads, I need to check if the user has viewed the page X number of times.
(SELECT COUNT(*) FROM MyLog WHERE UserID = xxx and PageID = 123)
I am making several similar aggregate queries with joins to check other conditions, and the performance is poor. These checks occur on every page request, and the application can receive hundreds of requests per minute.
I'm looking for any ideas to improve the application performance through sql and/or application code.
This is a .NET 2.0 app and using SQL Server 2008.
Much thanks in advance!
The easiest way is to store the counts in a table by themselves. Then, when adding records (hopefully through a stored procedure), you can simply increment the affected row in your aggregate table. If you are really worried about the counts getting out of whack, you can put a trigger on the detail table to update the aggregate table instead; however, I don't like triggers because they have very little visibility.
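For example, something along these lines might work (a minimal sketch; the table and procedure names PageViewCounts and IncrementPageViewCount are hypothetical, and your existing logging code would call the procedure whenever it writes a detail row):

CREATE TABLE dbo.PageViewCounts
(
    UserID    int NOT NULL,
    PageID    int NOT NULL,
    ViewCount int NOT NULL,
    CONSTRAINT PK_PageViewCounts PRIMARY KEY (UserID, PageID)
);
GO

CREATE PROCEDURE dbo.IncrementPageViewCount
    @UserID int,
    @PageID int
AS
BEGIN
    SET NOCOUNT ON;

    -- Bump the aggregate row; create it on the user's first view of the page.
    UPDATE dbo.PageViewCounts
    SET ViewCount = ViewCount + 1
    WHERE UserID = @UserID AND PageID = @PageID;

    IF @@ROWCOUNT = 0
        INSERT INTO dbo.PageViewCounts (UserID, PageID, ViewCount)
        VALUES (@UserID, @PageID, 1);
END
GO

The per-request check then becomes a single-row seek against PageViewCounts instead of a COUNT(*) over the whole log table.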
Also, how up to date do these counts need to be? Could this be something that is stored in a table once a day?
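If a daily figure is acceptable, a scheduled job (a SQL Agent job, for instance) could rebuild the same hypothetical PageViewCounts table overnight instead of maintaining it on every write; the counts would then be at most a day stale:

TRUNCATE TABLE dbo.PageViewCounts;

INSERT INTO dbo.PageViewCounts (UserID, PageID, ViewCount)
SELECT UserID, PageID, COUNT(*)
FROM dbo.MyLog
GROUP BY UserID, PageID;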
Querying a log table like this may be more trouble than it is worth.
As an alternative I would suggest using something like memcache to store the value as needed. As long as you update the cache on each hit, it will be much faster than querying a large database table. Memcache has a built-in increment operator that handles this kind of thing. This way you only need to query the db on the first visit.
Another alternative is to use a precomputed table, updating it as needed.
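On SQL Server 2008, one way to get a precomputed table that the engine keeps in sync for you is an indexed view over the log table. This is only a sketch, assuming MyLog lives in the dbo schema and only UserID/PageID are needed; note that it adds some cost to every insert into MyLog:

CREATE VIEW dbo.vw_UserPageCounts
WITH SCHEMABINDING
AS
SELECT UserID, PageID, COUNT_BIG(*) AS ViewCount
FROM dbo.MyLog
GROUP BY UserID, PageID;
GO

-- Materialize the view; SQL Server maintains it automatically as rows are logged.
CREATE UNIQUE CLUSTERED INDEX IX_vw_UserPageCounts
ON dbo.vw_UserPageCounts (UserID, PageID);
GO

-- The per-page check becomes a single-row seek
-- (the NOEXPAND hint is needed on non-Enterprise editions).
SELECT ViewCount
FROM dbo.vw_UserPageCounts WITH (NOEXPAND)
WHERE UserID = 42 AND PageID = 123;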
Have you indexed MyLog on UserID and PageID? If not, that should give you some huge gains.
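Something like the following should let the COUNT(*) check be answered with an index seek rather than a scan of the whole table (the index name is just an example):

CREATE NONCLUSTERED INDEX IX_MyLog_UserID_PageID
ON dbo.MyLog (UserID, PageID);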
Todd, this is a tough one because of the number of operations you are performing. Have you checked your indexes on that database?
Here's a stored procedure you can execute to at least help find missing indexes. I can't remember where I found it, but it helped me:
CREATE PROCEDURE [dbo].[SQLMissingIndexes]
    @DBNAME varchar(100) = NULL
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    SELECT
        migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
            * (migs.user_seeks + migs.user_scans) AS improvement_measure,
        'CREATE INDEX [missing_index_'
            + CONVERT (varchar, mig.index_group_handle)
            + '_' + CONVERT (varchar, mid.index_handle)
            + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
            + ' ON ' + mid.statement
            + ' (' + ISNULL (mid.equality_columns, '')
            + CASE WHEN mid.equality_columns IS NOT NULL
                   AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
            + ISNULL (mid.inequality_columns, '')
            + ')'
            + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
        migs.*,
        mid.database_id,
        mid.[object_id]
    FROM
        sys.dm_db_missing_index_groups mig
        INNER JOIN sys.dm_db_missing_index_group_stats migs
            ON migs.group_handle = mig.index_group_handle
        INNER JOIN sys.dm_db_missing_index_details mid
            ON mig.index_handle = mid.index_handle
    WHERE
        migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
            * (migs.user_seeks + migs.user_scans) > 10
        AND (@DBNAME = db_name(mid.database_id) OR @DBNAME IS NULL)
    ORDER BY
        migs.avg_total_user_cost * migs.avg_user_impact
            * (migs.user_seeks + migs.user_scans) DESC
END
I modified it a bit to accept a database name. If you don't provide one, it runs against all databases and gives you suggestions on which columns need indexing.
To run it use:
exec DatabaseName.dbo.SQLMissingIndexes 'MyDatabaseName'
I usually put reusable SQL (sproc) code in a separate database called DBA, so then from any database I can say:
exec DBA.dbo.SQLMissingIndexes
As an example.
Edit
Just remembered the source: Bart Duncan. Here is a direct link: http://blogs.msdn.com/b/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
But remember I did modify it to accept a single db name.
We had the same problem. Several years ago we moved from SQL Server to OLAP cubes, and when that stopped working recently we moved again, to Hadoop and some other components.
OLTP (Online Transaction Processing) databases, of which SQL Server is one, are not very good at OLAP (Online Analytical Processing). This is what OLAP cubes are for.
OLTP provides good throughput when you're writing and reading many individual rows. It fails, as you just found, when doing many aggregate queries that require scanning many rows. Since SQL Server stores every record as a contiguous block on the disk, scanning many rows means many disk fetches. The cache saves you for a while - so long as your table is small, but when you get to tables with millions of rows the problem becomes evident.
Frankly, OLAP isn't that scalable either, and at some point (tens of millions of new records per day) you're going to have to move to a more distributed solution - either paid (Vertica, Greenplum) or free (HBase, Hypertable).
If neither is an option (e.g. no time or no budget), then for now you can alleviate your pain somewhat by spending more on hardware. You need very fast I/O (fast disks, RAID) and as much RAM as you can get.