T-SQL Tag Database Architecture Design?
Scenario
I am building a database that contains a series of different tables. These consist of a COMMENTS table, a BLOGS table & an ARTICLES table. I want to be able to add new items to each table, and tag them with between 0 and 5 tags to help the user search for particular information that is relevant more easily.
Initial thoughts for architecture
My first thoughts were to have a centralised table of TAGS. This table would list all of the available tags using a TagID field & a TagName field. Since each item can have many tags and each tag can have many items, I would need a MANY-TO-MANY relationship between each item table and the TAGS table.
For Example:
Many COMMENTS can have many TAGS. Many TAGS can have many COMMENTS.
Many ARTICLES can have many TAGS. Many TAGS can have many ARTICLES.
etc.....
Current Understanding
From previous experience I understand that a way of implementing this structure in T-SQL is to have an ajoining table between the COMMENTS table and the TAG table. This ajoining 开发者_开发知识库table would contain the CommentID & the TagID, as well as its own unique CommentTagID. This structure would also apply to all other items.
Questions
Firstly is this the right way to go about implementing such a database architecture? If not, what other methods would be feasible? Since the database will eventually contain a lot of information, I need to ensure that it is scalable. Is this a scalable implementation? If I had lots of these tables would this architecture make CRUD operations very slow? Should I use GUIDs or Incrementing INTs for the ID fields?
Help & suggestions would be appreciated greatly.
Thankyou.
You may also want to look at WordPress schema and database description to see how others are solving a similar problem.
Keeping a centralized table of tags is a good idea if you will ever need to do one of the following:
- Build a complete list of all tags (that is mixing blog tags, comment tags and article tags)
- Update the tags so that they get updated everywhere: so that when you change
sqlserver
tosql-server
, it gets changed anywhere: in blogs, articles and comments.
- Update the tags so that they get updated everywhere: so that when you change
Option 1
is very useful to build the tag clouds so I'd recommend to build a table of tags and reference it from your tables.
If you won't ever need to update the tags as described in the option 2, you don't ever need surrogate key for them.
You will most probably need a UNIQUE
constraint on them anyway and there is no point not to make it a PRIMARY KEY
, if you are not going to update them.
This will also save you lots of joins: you don't need to join with the tags table to show the tags.
GUIDs
are more simple to manage, but theу make the indexes and link tables quite large in size.
You can assign a numerical identifier to each table and link like this:
tTag (tag VARCHAR(30) NOT NULL PRIMARY KEY)
tTaggable (type INT NOT NULL, id INT NOT NULL, PRIMARY KEY (type, id))
tTagLink (
tag VARCHAR(30) NOT NULL FOREIGN KEY REFERENCES tTag,
type INT NOT NULL, id INT NOT NULL,
PRIMARY KEY (tag, type, id),
FOREIGN KEY (type, id) REFERENCES tTaggable
)
tBlog (
id INT NOT NULL PRIMARY KEY,
type INT NOT NULL, CHECK(type = 1),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tArticle (
id INT NOT NULL,
blog INT NOT NULL FOREIGN KEY REFERENCES tBlog,
type INT NOT NULL, CHECK(type = 2),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tComment (
id INT NOT NULL PRIMARY KEY,
article INT NOT NULL FOREIGN KEY REFERENCES tArticle,
type INT NOT NULL, CHECK(type = 3),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
Note that if you want to delete a blog, an article or a comment, you should delete from tTaggable
as well.
This way, tTaggable
is only used to ensure the referential integrity. To query all tags for an article, you just issue this query:
SELECT tag
FROM tTagLink
WHERE type = 2
AND id = 1234567
, so you get all tags by querying a single table, without any joins.
usually many-to-many relationship implemented exactly as you describe it.
Auto-incrementing IDs it is good idea since it guarantee that they will be unique.
And you can use guids if you want to tag comments and articles with the same tag(instead of 6 tables you need just 5). But searching with guids may be more slow.
精彩评论