开发者

sql selection from one-to-many table

I have got 3 tables with those columns below:

Top开发者_C百科ics:
[TopicID] [TopicName]
Messages:
[MessageID] [MessageText]
MessageTopicRelations
[EntryID] [MessageID] [TopicID]

messages can be about more than one topic. question is: given couple of topics, I need to get messages which are about ALL these topics and not the less, but they can be about some other topic too. a message which is about SOME of these given topics won't be included. I hope I explained my request well. otherwise I can provide sample data. thanks


The following use x, y, and z to stand in for topic ids, being that none were provided for examples.

Using JOINs:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
  JOIN TOPICS tx ON tx.topicid = mtr.topicid
                AND tx.topicid = x
  JOIN TOPICS ty ON ty.topicid = mtr.topicid
                AND ty.topicid = y
  JOIN TOPICS tz ON tz.topicid = mtr.topicid
                AND tz.topicid = z

Using GROUP BY/HAVING COUNT(*):

  SELECT m.*
    FROM MESSAGES m
    JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
    JOIN TOPICS t ON t.topicid = mtr.topicid
   WHERE t.topicid IN (x, y, z)
GROUP BY m.messageid, m.messagetext
  HAVING COUNT(*) = 3

Of the two, the JOIN approach is safer.

The GROUP BY/HAVING relies on the MESSAGETOPICRELATIONS.TOPICID being either part of the primary key, or having a unique key constraint to ensure there aren't duplicates. Otherwise, you could have 2+ instances of the same topic associated to a message - which would be a false positive. Using HAVING COUNT(DISTINCT ... would clear up any false positives, but support depends on the database - MySQL supports it at 5.1+, but not on 4.1. Oracle might, have to wait till Monday to test on SQL Server...

I looked into Bill's comment about not needing the join to the TOPICS table:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
                                AND mtr.topicid IN (x, y, z)

...will return false positives - rows that match at least one of the values defined in the IN clause. And:

SELECT m.*
  FROM MESSAGES m
  JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
                                AND mtr.topicid = x
                                AND mtr.topicid = y
                                AND mtr.topicid = z

...won't return anything at all, because the topicid can never be all of the values at once.


Here's a profoundly inelegant solution

SELECT
     m.MessageID
    ,m.MessageText
FROM
    Messages m
WHERE
    m.MessageID IN (
    SELECT
        mt.MessageID
    FROM
        MessageTopicRelations mt
    WHERE
        TopicID IN (1,4,5)// List of topic IDS
    GROUP BY
        mt.MessageID
    HAVING
        count(*) = 3 //Number of topics
    )


Edit: thanks to @Paul Creasey and @OMG Ponies for finding the flaws in my approach.
The correct way to do this is with a self-join for each topic; as shown in the leading answer.


Another profoundly inelegant entry:

select m.MessageText
       , t.TopicName
  from Messages m
       inner join MessageTopicRelations mtr
       on mtr.MessageID = m.MessageID
       inner join Topics t
       on t.TopicID = mtr.TopicID
   and
       t.TopicName = 'topic1'

UNION 

select m.MessageText
       , t.TopicName
  from Messages m
       inner join MessageTopicRelations mtr
       on mtr.MessageID = m.MessageID
       inner join Topics t
       on t.TopicID = mtr.TopicID
   and
       t.TopicName = 'topic2'
...


Re: the answer by OMG Ponies, you don't need to join to the TOPICS table. And the HAVING COUNT(DISTINCT) clause works fine in MySQL 5.1. I just tested it.

This is what I mean:

Using GROUP BY/HAVING COUNT(*):

  SELECT m.*
    FROM MESSAGES m
    JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
   WHERE mtr.topicid IN (x, y, z)
GROUP BY m.messageid
  HAVING COUNT(DISTINCT mtr.topicid) = 3

The reason that I suggest COUNT(DISTINCT) is that if the columns (messageid,topicid) don't have a unique constraint, you could get duplicates, which would result in a count of 3 in the group, even with fewer than three distinct values.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜