How to design a MySql Table for a Tag Cloud?

2022-12-25 22:19 问答作者：

I have articles on my site, and I would like to add tags which would describe each article, but I'm having problems with design mysql table for tags. I have two ideas:

each article would have field "tags", and tags would be in format: "tag1,tag2,tag3"
create other table called tags with fields: tag_name, article_id

So when I want tags for article with ID 1, I would run

SELECT ... FROM tags WHERE `article_id`=1;

But, I would also like to know 3 most similar articles by comparing tags, so if I have article which has tags "php,mysql,erlang", and 5 articles with tags: "php开发者_如何学编程,mysql", "erlang,ruby", "php erlang", "mysql,erlang,javascript", I would choose 1., 3. and 4., since those 3 have most same tags with main article.

Also other question, what is the best way to get 10 "most used tags" ?

Generally, for this kind of many-to-many relationship, there are three tables :

The "article" table
- primary key = id
The "tag" table
- primary key = id
- contains the data of each tag :
  - name, for example
A "tags_articles" table, which acts as a join table, and contains only :
- id_article : foreign key that points to an article
- id_tag : foreign key that points to a tag

This way, there is no duplication of any tag's data : for each tag, there is one, and only one, line in the tag table.

And, for each article, you can have several tags (i.e. several lines in the tags_articles table) ; and, of course, for each tags, you can have several articles.

Getting a list of tags for an article, with this idea, is a matter of an additionnal query, like :

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123

Getting the three "most similar" articles would mean :

select articles that have tags that the first article has
only use those which have the most important number of identical tags

Not tested, but an idea might be something that would look like this :

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

Basically, you :

select the articles ids for each tag that's present on your initial article
- as there's an inner join, if an article in the DB has 2 tags that match the where clause, without the group by clause, there would be two lines for that article
- of course, you don't want to re-select the article you already had -- which means it has to be excluded.
but, as you use group by article.id, there will be only one line per article
- but you'll be able to use count, to find out how many tags each article has in common with the initial one
then, it's only a matter of sorting per number of tags, and getting only the third three lines.

First off, you'll want to use Pascal MARTIN's suggestion about the table design.

As for finding similar articles, here's something to get you started. Given that @article_id is the article you want to find matches for, and @tag1, @tag2, @tag3 are the tags for that article:

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3

yes, but you didn't answer my main question, how to get 3 most similar articles?

Answer: Just look for the same tag ids in the merged table (tags_articles). Gather them and create a pattern.

For example: Article 1 has tags: 1,2 Article 2 has tags: 2,3,4 Article 5 has tags: 6,7,2 Article 7 has tags: 7,1,2,3

If you want the 3 most similar articles for article 1, you have to look for the tags 1,2. You'll find Article 7 is most similar and 2 and 5 have some similarities.

继续阅读：database-design tag-cloud

How to design a MySql Table for a Tag Cloud?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？