Match only if one field matches multiple values (err?)
I've got a mental block. I'm pretty certain this is a dead simple noob question to solve, but I'm drawing a blank:
I have a tagging system for articles. This is done by having a separate table that contains an article ID as well as a tag ID, so multiple tags can get assigned to one article and vice-versa. It all works well. But now what I want to do is to filter articles based on whether they match two or more tags, or match two or more tags but not specific ones, or match all the tags specified, and so on.
--------------------
|ID|ArticleID|TagID|
--------------------
|1 |4000 |123 |
|2 |4000 |3532 |
|3 |4000 |4386 |
|4 |4001 |3532 |
etc...
--------------------
So it should return:
- 4000 and 4001 if I only search for 3532
- 4000 if I say I want only stuff that matches 123 and 4386
- 4000 and 4001 if I want it to match 123 or 3532
- 4001 if I want it to match 3532 but not 123.
My ideas so far have involved going "select articleid where tagid = 123 and tagid = 4386" but obviously it's impossible for the tagid field to be two things on the same record (using "or" would get me the results, but it wouldn't ensure that it was only things that matched both 123 and 4386). Next up I was going to query them one condition at a time and then use PHP to filter through which articles should/shouldn't match, but there's the nagging feeling at the back of my mind that this should be able to do开发者_开发技巧 be done easily at the database level, I just can't think of how (or what to Google for). I'm hoping to be able to filter upwards of 1000 tags at a time.
You can use multiple inner joins for the AND case:
select distinct a.ArticleID
from Articles as a
inner join Articles AS b on a.ArticleID = b.ArticleID and b.TagID = 123
inner join Articles AS c on a.ArticleID = c.ArticleID and c.TagID = 4386;
For the OR case, you can simply do:
select distinct ArticleID
from Articles
where TagID = 123 or TagID = 3532;
To do all of the cases together, you are probably going to end up using subqueries:
select distinct a.ArticleID
from Articles as a
where exists (select * from Articles as b where a.ArticleID = b.ArticleID and b.TagID = 123);
You can then use the general SQL logic operators (and, or, not) to join multiple exists
conditions together. This quite probably will not be particularly efficient for enormous numbers of tags.
You have three query options:
SELECT a.articleid
FROM ARTICLES a
JOIN TAGS t ON t.tagid = a.tagid
WHERE t.tagid IN (123, 4386)
GROUP BY a.articleid
HAVING COUNT(DISTINCT t.tagid) = 2
Mind that the count has to equal the number of parameters in the IN
clause, and the use of DISTINCT. Without the distinct, an article had 2 associations to the same tag would turn up as a false positive. This approach also doesn't transfer as easily IMO to dynamic SQL...
SELECT a.articleid
FROM ARTICLES a
JOIN TAGS t1 ON t1.tagid = a.tagid
AND t1.tagid = 123
JOIN TAGS t2 ON t2.tagid = a.tagid
AND t2.tagid = 4386
GROUP BY a.articleid
This is likely to be the fastest of the options.
SELECT a.articleid
FROM ARTICLES a
WHERE EXISTS(SELECT NULL
FROM TAGS t
WHERE t.tagid = a.tagid
AND t.tagid = 123)
AND EXISTS(SELECT NULL
FROM TAGS t
WHERE t.tagid = a.tagid
AND t.tagid = 4386)
Because your requirements are dynamic, I suggest using MySQL's Prepared Statements:
DECLARE num INT
SET @sql = 'SELECT a.articleid FROM ARTICLES a';
WHILE num > 0
SET @sql = CONCAT(@sql, 'JOIN TAGS t', num, 'ON t', num,'.tagid = a.tagid AND t', num,'.tagid = ', tag, ' ');
SET num = num - 1;
END WHILE;
SET @sql = CONCAT(@sql, 'GROUP BY a.articleid');
PREPARE stmt FROM @sql
EXECUTE stmt
DEALLOCATE PREPARE stmt;
If you set num
to zero, you'll get all articles. The WHILE
loop will append JOINs, creating an inclusive list of tags. It's enough to get you started if you want to support exclusions in the same query.
for the second condition you could use:
select t1.articleid from table t1, table t2 where t1.tagid = 123 and t2.tagid = 4386
In addition to the other answers here, if you can build queries that work with subqueries, you can work around SQL's typical inability to deal with arrays by passing a comma-delimited string of tag IDs to a table-valued function. The function will break the string out to a table of ints, that you can query on. Using this, you'll be able to query on (max-varchar-size / max-size-of-ID-as-string) tags; typically thousands.
Pardon the MSSQL syntax; I don't know mysql. I hope it has table-valued functions (a function that returns a table) or equivalent.
-- the 'OR' query
declare @taglist varchar(8000)
set @taglist = '1,2,3,4'
SELECT DISTINCT a.ArticleID FROM Article a
JOIN Tags t ON t.ArticleID = a.ArticleID
WHERE t.TagID IN (SELECT * FROM arrToTable(tagList))
Here's a sample of such a function.
SELECT DISTINCT ARTICLEID FROM myTable WHERE TagId=3532
SELECT DISTINCT ARTICLEID FROM myTable WHERE TagId=123 AND TagId=4386
SELECT DISTINCT ARTICLEID FROM myTable WHERE TagId=123 OR TagId=3532
SELECT DISTINCT ARTICLEID FROM myTable WHERE TagId=3532 AND TagId <> 123
精彩评论