开发者

Sql problem with group by and intersection

I have this log table in MySQL with columns ActionName and SourceName.

The same actions can be registered multiple times from different sources.

So an example table might look like

ActionName    SourceName
----------------------------
Add           S01
Add           S02
Add           S02
Edit          S01
Edit       开发者_运维百科   S01
Delete        S01
Delete        S02

Now I would like to query this table and find the actions which have been performed by both S01 and S02. So the results would be:

 ActioName
--------------
Add
Delete

How would I solve this with SQL?


A specific answer...

SELECT
  ActionName
FROM
  yourTable
WHERE
  SourceName in ('S01', 'S02')
GROUP BY
  ActionName
HAVING
  COUNT(DISTINCT SourceName) = 2


Possibly faster for your specific question...

SELECT
  a.SourceName
FROM
  yourTable  AS a
INNER JOIN
  yourTable  AS b
    ON a.ActionName = b.ActionName
WHERE
      a.SourceName = 'S01'
  AND b.SourceName = 'S02'


A general answer...

SELECT
  ActionName
FROM
  yourTable
INNER JOIN
  tableWithSourceNames
    ON yourTable.SourceName = tableWithSourceNames.SourceName
GROUP BY
  ActionName
HAVING
  COUNT(DISTINCT yourTable.SourceName) = (SELECT COUNT(DISTINCT SourceName) FROM tableWithSourceNames)


It turns out that this scales very badly though (as your table increases in size, performance plummets). You can make an optimisation though...

By holding a bit of meta-data about how selective each SourceName is...

CREATE TABLE sourceNameMetaData (
  sourceName  VARCHAR(64),
  occurances  INT
)

I'd recommend keeping this talbe up to date with a trigger or something. You can then filter your ActionTable by the entry that is most restrictive, and then do the rest of the logic as normal.

SELECT
  yourTable.ActionName
FROM
(
  SELECT
    ActionName
  FROM
  (
    SELECT
      sourceName
    FROM
      sourceNameMetaData
    INNER JOIN
      tableWithSourceNames
        ON tableWithSourceNames.SourceName = sourceNameMetaData.SourceName
    ORDER BY
      occurances ASC
    LIMIT
      1
  )
    AS filter    
  INNER JOIN
    yourTable
      ON yourTable.SourceName = filter.SourceName
  GROUP BY
    ActionName
)
  AS filter
INNER JOIN
  yourTable
    ON yourTable.ActionName = filteredData.ActionName
INNER JOIN
  tableWithSourceNames
    ON yourTable.SourceName = tableWithSourceNames.SourceName
GROUP BY
  yourTable.ActionName
HAVING
  COUNT(DISTINCT yourTable.SourceName) = (SELECT COUNT(DISTINCT SourceName) FROM tableWithSourceNames)

Notes:

  • This optimisation isn't needed for small tables
  • This optimisation assumes you have indexes on BOTH (sourceName, ActionName) AND (actionName, sourceName)
  • It's a brilliant example I use to show that more code CAN be faster


SELECT ActionName 
FROM LogTable 
WHERE SourceName IN ('S01', 'S02')
GROUP BY ActionName
HAVING COUNT(DISTINCT SourceName) = 2

or:

SELECT ActionName 
FROM 
    ( SELECT DISTINCT ActionName
      FROM LogTable
    ) AS dn 
WHERE 
    EXISTS
      ( SELECT *
        FROM LogTable AS a
        WHERE a.ActionName = dn.ActionName
          AND a.SourceName = 'S01'
      )
  AND
    EXISTS
      ( SELECT *
        FROM LogTable AS b
        WHERE b.ActionName = dn.ActionName
          AND b.SourceName = 'S02'
      )


Maybe I'm not getting your problem but you wouldn't need to do a group by or much of anything if all you want in the result is what you showed.

select distinct ActionName from YourTable 
where SourceName in ('S01', 'S02')
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜