Counting the number of occurances of something in the database
For my website, I want to make something that works a bit like the tags on Stackoverflow - so some fields will have an autocompleter, and the autocompleter will display the number of times that other users have selected each suggested value. I suppose I'd have a database structure like this:
Articles
ArticleID
Content
TagId
Tags
TagId
TagName
Occurances
With the idea being that Occurances represents the number of times each TagId is referenced from the Articles
table.
What is the best way to implement this? I could add/subtract from the occurances
column on each of the stored procedures that update the article
table开发者_如何学运维, but I might miss one, and anyway, there is are some difficulties with this if a user removes a tag from something (as its easy to add 1 to the field for the newly added tag, but harder to work out which tag is being replaced.)
There is lots I don't understand about sql-server. Is there a more robust way of counting occurances like this, that the database system will deal with itself? It would be ok if the data was cached once a day or something.
To be able to have more than one tag attached to an article, you will have to add another table that connects the article table to the tag table. It's called a 'many to many' relation.
article
article_id
content
article_tag
article_id
tag_id
tag
tag_id
tagname
Doing like this, article 1
can be attached to tag 2
, and the next row can be 1
and 3
and so on, so one article points to many tags. To count a certain tag, you join the Article_Tag
and Tag
tables, and and count the rows in Article_Tag
where Tag.tagname = 'mysql'
, for examle.
You can create an indexes view that aggregates all the counts you need and is automatically maintained:
create view TagCounts
with schemabinding
as select TagId, count_big(*) as Occurances
from dbo.ArticleTags
group by TagId;
go
create unique clustered index cdxTagCounts on TagCounts (TagId);
go
Now the TagCounts.Occurances
field is automatically maintained by SQL Server whenever you insert/delete/update the Articles
table. You can query it like:
select Occurances from dbo.TagCounts with (noexpand) where TagId = ...;
And you can cache the result with LinqToCache, as such a query matches the restrictions of Query Notifications.
The trade off of using a pre-aggregated indexed view is scalability: as update of any article updates the count of Occurances for the tags of the article, an exclusive lock is required to update this count. Which implies that only one transaction can use a TagId at any moment. Depending on your traffic and on other elements of your design this restriction may or may not be acceptable.
The other alternative is a table of counts. Front ends (your ASP.Net farm) read this counts and then they update the in-memory count for each operation, keeping track of the delta from the counts in the table. Periodically the front ends merge their deltas into the table (eg. every 5 minutes) and refresh the in-memory table. This way front ends see a stale version of the truth, but an user sees immediate feedback of its actions: because of session stickiness his HTTP requests are processed by the same front end, and thus he see immediately his own article updates triggering modifications to the tag counts. User though do no immediately see the updates from other users that are load-balanced to another front end. Because a crash of the front end (or a process recycle...) will loose the deltas kept so far, the count table will drift in time away from the truth and would have to be periodically updated to the true count in the database.
If you which even more accuracy (all users see the true count immediately) then you can do something based on fast in-memory key value stores, which would be basically the same as my first proposal but with much higher throughput/lower latency, perhaps something based on memcached + redis. I'm not acquainted with SO architecture, but I believe they may be doing something similar.
You could use this query to get the number of occurances by tag:
SELECT Tags.TagId, COUNT(Articles.TagId) as Occurances
FROM Articles
JOIN Tags ON Articles.TagId
GROUP BY Tags.TagId
It could be used in a view or stored procedure, and you can set up your website's cache to requery it as often as required.
If you are using a relational database, the correct way to handle this problem is to NOT store the occurrences on the table itself, but rather dynamically query the number of occurrences on the articles table.
If you don't do it this way, you're stuck coding update queries every time you add/delete a row...generally not nice. If you query dynamically, you won't have an occurrences column in the table, but rather will get that information in your eg. presentation/model layer code.
Use:
SELECT COUNT(*) FROM ARTICLES WHERE TagId = 'xxx' ;
This line is part of iterating code.
精彩评论