Sql problem with group by and intersection
I have this log table in MySQL with columns ActionName and SourceName.
The same actions can be registered multiple times from different sources.
So an example table might look like
ActionName SourceName
----------------------------
Add S01
Add S02
Add S02
Edit S01
Edit 开发者_运维百科 S01
Delete S01
Delete S02
Now I would like to query this table and find the actions which have been performed by both S01 and S02. So the results would be:
ActioName
--------------
Add
Delete
How would I solve this with SQL?
A specific answer...
SELECT
ActionName
FROM
yourTable
WHERE
SourceName in ('S01', 'S02')
GROUP BY
ActionName
HAVING
COUNT(DISTINCT SourceName) = 2
Possibly faster for your specific question...
SELECT
a.SourceName
FROM
yourTable AS a
INNER JOIN
yourTable AS b
ON a.ActionName = b.ActionName
WHERE
a.SourceName = 'S01'
AND b.SourceName = 'S02'
A general answer...
SELECT
ActionName
FROM
yourTable
INNER JOIN
tableWithSourceNames
ON yourTable.SourceName = tableWithSourceNames.SourceName
GROUP BY
ActionName
HAVING
COUNT(DISTINCT yourTable.SourceName) = (SELECT COUNT(DISTINCT SourceName) FROM tableWithSourceNames)
It turns out that this scales very badly though (as your table increases in size, performance plummets). You can make an optimisation though...
By holding a bit of meta-data about how selective each SourceName is...
CREATE TABLE sourceNameMetaData (
sourceName VARCHAR(64),
occurances INT
)
I'd recommend keeping this talbe up to date with a trigger or something. You can then filter your ActionTable by the entry that is most restrictive, and then do the rest of the logic as normal.
SELECT
yourTable.ActionName
FROM
(
SELECT
ActionName
FROM
(
SELECT
sourceName
FROM
sourceNameMetaData
INNER JOIN
tableWithSourceNames
ON tableWithSourceNames.SourceName = sourceNameMetaData.SourceName
ORDER BY
occurances ASC
LIMIT
1
)
AS filter
INNER JOIN
yourTable
ON yourTable.SourceName = filter.SourceName
GROUP BY
ActionName
)
AS filter
INNER JOIN
yourTable
ON yourTable.ActionName = filteredData.ActionName
INNER JOIN
tableWithSourceNames
ON yourTable.SourceName = tableWithSourceNames.SourceName
GROUP BY
yourTable.ActionName
HAVING
COUNT(DISTINCT yourTable.SourceName) = (SELECT COUNT(DISTINCT SourceName) FROM tableWithSourceNames)
Notes:
- This optimisation isn't needed for small tables
- This optimisation assumes you have indexes on BOTH (sourceName, ActionName) AND (actionName, sourceName)
- It's a brilliant example I use to show that more code CAN be faster
SELECT ActionName
FROM LogTable
WHERE SourceName IN ('S01', 'S02')
GROUP BY ActionName
HAVING COUNT(DISTINCT SourceName) = 2
or:
SELECT ActionName
FROM
( SELECT DISTINCT ActionName
FROM LogTable
) AS dn
WHERE
EXISTS
( SELECT *
FROM LogTable AS a
WHERE a.ActionName = dn.ActionName
AND a.SourceName = 'S01'
)
AND
EXISTS
( SELECT *
FROM LogTable AS b
WHERE b.ActionName = dn.ActionName
AND b.SourceName = 'S02'
)
Maybe I'm not getting your problem but you wouldn't need to do a group by or much of anything if all you want in the result is what you showed.
select distinct ActionName from YourTable
where SourceName in ('S01', 'S02')
精彩评论