How should I design the tables to store tags in a database?
"question_id": 58640
"tags": ["polls", "fun", "quotes"]
"title": "Great programming quotes"
"question_id": 184618
"tags": ["polls", "fun", "comment"]
"title": "What is the best comment in source code you have ever encountered?"
"question_id": 3734102
"tags": ["c++", "linux", "exit-code"]
"title": "Why cant' I return bigger values from main function ?"
"question_id": 2349378
"tags": ["communication", "terminology", "vocabulary"]
"title": "New programming jargon you coined?"
"question_id": 3723817
"tags": ["open-source", "project-management", "failure", "fail"]
"title": "How to make an open source project fail"
"question_id": 3699150
"tags": ["testing", "interview-questions", "job-interview"]
"title": "Interview question please help"
This is just a text extract some questions that I got using the SO API.
To make this query-able, I want to开发者_Go百科 use SQLite to store the data.
How should I store the tags column?
Since the limit here on SO is five tags, I can use five columns tag1, tag2 ..., but I think there would be something more elegant that can be done. Something that scales to any number of tags being there, and can also handle basic queries like
select title from table where tag has "c++" and "boost" but not "c"
This is a many to many relationship : questions have multiple tags, tags can appear in multiple questions. This means you have to create three tables, one for the questions, one for the tags, and one for the links between these tables. The resulting query would look like this:
SELECT title FROM question
INNER JOIN question_tag_link USING (question_id)
INNER JOIN tag USING (tag_id)
WHERE tag_name IN('c++', 'boost')
AND NOT EXISTS(
SELECT * FROM tag t1
WHERE t1.tag_name = 'c'
AND t1.question_id = question.question_id);
Not so simple, but I think it is the price to pay if you don't want to be limited. If there are less than 64 different tags, you could use the SET field type, but you would loose very much flexibility (hard to add a new tag).
select distinct a.QuestionTitle
from
(
select q.QuestionID, QuestionTitle, TagName
from QuestionTags as x
join Question as q on q.QuestionID = x.QuestionID
join Tag as t on t.TagID = x.TagID
where TagName in ('c++', 'boost')
) as a
left join
(
select q.QuestionID, QuestionTitle, TagName
from QuestionTags as x
join Question as q on q.QuestionID = x.QuestionID
join Tag as t on t.TagID = x.TagID
where TagName = 'c'
) as b on b.QuestionID = a.QuestionID
where b.QuestionTitle is null
order by a.QuestionTitle ;
精彩评论