开发者

select count(*) vs keep a counter

Assuming indexes are put in place, and absolute-count-accuracy is not necessary (it's okay to be off by one o开发者_高级运维r two), is it okay to use:

Option A

select count(*) 
  from Table 
 where Property = @Property

vs

Option B

update PropertyCounters
   SET PropertyCount = PropertyCount + 1  
 where Property = @Property

then doing:

select PropertyCount 
  from PropertyCounters 
 where Property = @Property

How much performance degradation can I reasonably expect from doing select count(*) as the table grows into thousands/millions of records?


Keeping a separate count column in addition to the real data is a denormalisation. There are reasons why you might need to do it for performance, but you shouldn't go there until you really need to. It makes your code more complicated, with more chance of inconsistencies creeping in.

For the simple case where the query really is just SELECT COUNT(property) FROM table WHERE property=..., there's no reason to denormalise; you can make that fast by adding an index on the property column.


You didn't specify the platform, but since you use T-SQL syntax for @variables I'll venture a SQL Server platform specific answer:

count(*), or strictly speaking would be count_big(*), is an expression that can be used in indexed views, see Designing Indexed Views.

create view vwCounts
with schembinding
as select Property, count_big(*) as Count
from dbo.Table
group by Property;

create unique clustered index cdxCounts on vwCounts(Property);

select Count 
from vwCount with (noexpand)
where Property = @property;

On Enterprise Edition the optimizer will even use the indexed view for your original query:

select count_big(*)
from Table
where Property = @property;

So in the end you get your cake and eat it too: the property is already aggregated and maintained for your for free by the engine. The price is that updates have to maintain the indexed view (they will not recompute the aggregate count though) and the aggregation will create hot spots for contention (locks on separate rows on Table will contend for same count(*) update on the indexed view).


If you say that you do not need absolute accuracy, then Option B is a strange approach. If Option A becomes too heavy (even after adding indexes), you can cache the output of Option A in memory or in another table (your PropertyCounters), and periodically refresh it.


This isn't something that can be answered in general SQL terms. Quite apart from the normal caveats about indices and so on affecting queries, it's also something where there is considerable different between platforms.

I'd bet on better performance on this from SQL Server than Postgres, to the point where I'd consider the latter approach sooner on Postgres and not on SQL Server. However, with a partial index set just right for matching the criteria, I'd bet on Postgres beating out SQL Server. That's just what I'd bet small winnings on though, either way I'd test if I needed to think about it for real.

If you do go for the latter approach, enforce it with a trigger or similar, so that you can't become inaccurate.


On SQL Server, if you don't need absolutely accurate counts, you could also inspect the catalog views. This would be much easier to do - you don't have to keep a count yourself - and it's a lot less taxing on the system. After all, if you need to count all the rows in a table, you need to scan that table, one way or another - no way around that.

With this SQL statement here, you'll get all the tables in your database, and their row counts, as kept by SQL Server:

SELECT 
    t.NAME AS TableName,
    SUM(p.rows) AS RowCounts
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
WHERE 
    t.NAME NOT LIKE 'dt%' AND
    i.OBJECT_ID > 255 AND   
    i.index_id <= 1
GROUP BY 
    t.NAME, i.object_id, i.index_id, i.name 
ORDER BY 
    OBJECT_NAME(i.object_id) 

I couldn't find any documentation on exactly how current those numbers are, typically - but from my own experience, they're usually on the spot (unless you're doing some bulk loading or something - but in that case, you wouldn't want to constantly scan the table to get the exact count, either)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜