SQL Server: A Grouping question that's annoying me
I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:
Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribu开发者_Python百科te their salaries on a monthly basis.
Month Employee PercentOfTotal
--------------------------------
1 Alice 25%
1 Barbara 65%
1 Claire 10%
2 Alice 25%
2 Barbara 50%
2 Claire 25%
3 Alice 25%
3 Barbara 65%
3 Claire 10%
As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.
What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.
I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)
ID Employee PercentOfTotal
--------------------------------
X Alice 25%
X Barbara 65%
X Claire 10%
Y Alice 25%
Y Barbara 50%
Y Claire 25%
Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.
WITH temp_ids (Month)
AS
(
SELECT DISTINCT MIN(Month)
FROM employees_paid
GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
FROM employees_paid EMP
JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal
Thanks y'all! -Ricky
This gives you an answer in a slightly different format than you requested:
SELECT DISTINCT
T1.PercentOfTotal AS Alice,
T2.PercentOfTotal AS Barbara,
T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
ON T2.Month = T3.Month AND T3.Employee = 'Claire'
Result:
Alice Barbara Claire
25% 50% 25%
25% 65% 10%
If you want to, you can use UNPIVOT to turn this result set into the form you asked for.
SELECT rn AS ID, Employee, PercentOfTotal
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
FROM (
SELECT DISTINCT
T1.PercentOfTotal AS Alice,
T2.PercentOfTotal AS Barbara,
T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
AND T2.Employee = 'Barbara'
JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt
Result:
ID Employee PercentOfTotal
1 Alice 25%
1 Barbara 50%
1 Claire 25%
2 Alice 25%
2 Barbara 65%
2 Claire 10%
What you want is for each month's distribution to act as a signature or pattern of values which you would then want to find in other months. What is not clear is whether the employee to whom the value went is as important as the break down of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as the Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I find the signatures by multiplying each percentage by 10. This presumes that all percentage values are less than one. If someone could have a percentage of 110% for example, that would create problems for this solution.
With Employees As
(
Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
Union All Select 1, 'Barbara', .65
Union All Select 1, 'Claire', .10
Union All Select 2, 'Alice', .25
Union All Select 2, 'Barbara', .50
Union All Select 2, 'Claire', .25
Union All Select 3, 'Alice', .25
Union All Select 3, 'Barbara', .65
Union All Select 3, 'Claire', .10
)
, EmployeeRanks As
(
Select Month, Employee, PercentOfTotal
, Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
From Employees
)
, Signatures As
(
Select Month
, Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
From EmployeeRanks
Group By Month
)
, DistinctSignatures As
(
Select Min(Month) As MinMonth, SignatureValue
From Signatures
Group By SignatureValue
)
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
Join DistinctSignatures As D
On D.MinMonth = E.Month
I'm assuming performance won't be great (cause of the subquery)
SELECT * FROM employees_paid where Month not in (
SELECT
a.Month
FROM
employees_paid a
INNER JOIN employees_paid b ON
(a.employee = B.employee AND
a.PercentOfTotal = b.PercentOfTotal AND
a.Month > b.Month)
GROUP BY
a.Month,
b.Month
HAVING
Count(*) = (SELECT COUNT(*) FROM employees_paid c
where c.Month = a.Month)
)
- The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
- We then GROUP by COUNT of matched entries for each month-month combination
- Then the HAVING excludes months that don't have as many matches as there are month entries
- The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)
If I have understood you correctly then, for a general solution, I think you would need to concatenate the whole group together - e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25
. Then select the distinct groups so something like the following would do it (rather clunkily).
WITH EmpSalaries
AS
(
SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL
SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 3 AS Month, 'Claire' AS Employee, 0.10
),
Months AS
(
SELECT DISTINCT Month FROM EmpSalaries
),
MonthlySummary AS
(
SELECT Month,
Stuff(
(
Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
From EmpSalaries As S1
Where S1.Month = Months.Month
Order By S1.Employee
For Xml Path('')
), 1, 2, '') As Summary
FROM Months
)
SELECT * FROM EmpSalaries
WHERE Month IN (SELECT MIN(Month)
FROM MonthlySummary
GROUP BY Summary)
I just put together this solution while writing this question, which seems to work
I don't think it does work. Here I've added a further two groups (month = 4 and 5 respectively) which I would consider to be distinct yet the result is the same i.e. month = 1 and 2 only:
WITH employees_paid (Month, Employee, PercentOfTotal)
AS
(
SELECT 1, 'Alice', 0.25
UNION ALL
SELECT 1, 'Barbara', 0.65
UNION ALL
SELECT 1, 'Claire', 0.1
UNION ALL
SELECT 2, 'Alice', 0.25
UNION ALL
SELECT 2, 'Barbara', 0.5
UNION ALL
SELECT 2, 'Claire', 0.25
UNION ALL
SELECT 3, 'Alice', 0.25
UNION ALL
SELECT 3, 'Barbara', 0.65
UNION ALL
SELECT 3, 'Claire', 0.1
UNION ALL
SELECT 4, 'Barbara', 0.25
UNION ALL
SELECT 4, 'Claire', 0.65
UNION ALL
SELECT 4, 'Alice', 0.1
UNION ALL
SELECT 5, 'Diana', 0.25
UNION ALL
SELECT 5, 'Emma', 0.65
UNION ALL
SELECT 5, 'Fiona', 0.1
),
temp_ids (Month)
AS
(
SELECT DISTINCT MIN(Month)
FROM employees_paid
GROUP
BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
FROM employees_paid AS EMP
INNER JOIN temp_ids AS IDS
ON EMP.Month = IDS.Month
GROUP
BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;
精彩评论