开发者

Making a tag cloud - No valid source of tags

So I have a few thousand articles indexed in my database, I want to make a tag cloud just based on how many articles have those tags, so if 1000 articles have the "global" tag it will make that bigger than the "national" tag that is only in 500 or so articles. The concept doesn't seem to puzzle me much, except I don't have a valid source of tags.

See each article has a tags field in the database, in the format of: tag1,tag2,anothertag,againwiththetag

So I was going to do a DISTINCT query to grab all the different tags, but I realized that will just gather up all the different arrangement of tags. So how can I query my database of all the different tags u开发者_Go百科sed, and their counts?

Thanks guys.


See each article has a tags field in the database, in the format of: tag1,tag2,anothertag,againwiththetag

Your database is not normalized. Now you have to pay the price in terms of increased complexity.

You have to scan through all your records, grab the tags field and sum up the totals. Like so:

$counts = array();
$result = mysql_query('SELECT tags FROM articles');
while ($record = mysql_fetch_array($result))
{
  $record_tags = split(',', $record['tags']);
  foreach ($record_tags as $tag)
  {
    ++$counts[$tag];
  }
}

Now $counts is an array that has the tags as keys and the frequency as values.

Of course this puts a terrible stress on the database. A better solution is to normalize the database: add a table article_tags with columns tag and article where article is a foreign key pointing into the articles table. tag is either the literal tag itself or a foreign key into a tags table. The latter is needed if you want to store meta information about tags (description, synonyms, etc).

If you have done that, the code reduces to

$counts = array();
$result = mysql_query(
  'SELECT tag, COUNT(*) AS cnt FROM article_tags GROUP BY tag'
);
while ($record = mysql_fetch_array($result))
{
  $counts[$record['tag']] = $record['cnt'];
}

Also, read up on database normalization, e.g. Wikipedia: Database normalization, in case you haven't already done so.


It should be a fairly simple excercise to change the database schema to normalize the tags.

Instead of a tags column, you should have a tags table:

create table tags
(
  id int,
  name varchar2(50),
  ...
)

Since the relationship is clearly many to many, you'll need a linking table:

create table articles_tags
(
    article_id int,
    tag_id
)

Then it should be fairly easy to convert the current tags into the new type, here is some pseudo code!

1) tags = `select id,tags from articles`
2) for each tag in tags
3) if tag exists in `tags` get tag ID, else create row in `tags` and get tag id
4) using article id and tag id, create entry in articles_tags
5) alter table remove column tags!

This will enable you to create you tag cloud by simply running something like this:

select t.name,count(*)
from tags t
inner join articles_tags at
on at.tag_id = t.id
group by t.name

It will also make it much easier to promote the reuse of tags with an autocomplete feature.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜