SQL Server: A Grouping question that's annoying me

2023-01-03 06:50 问答作者：

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribu开发者_Python百科te their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all! -Ricky

This gives you an answer in a slightly different format than you requested:

SELECT DISTINCT
    T1.PercentOfTotal AS Alice,
    T2.PercentOfTotal AS Barbara,
    T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
  ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
  ON T2.Month = T3.Month AND T3.Employee = 'Claire'

Result:

Alice   Barbara  Claire
25%     50%      25%
25%     65%      10%

If you want to, you can use UNPIVOT to turn this result set into the form you asked for.

SELECT rn AS ID, Employee, PercentOfTotal
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
    FROM (
        SELECT DISTINCT
            T1.PercentOfTotal AS Alice,
            T2.PercentOfTotal AS Barbara,
            T3.PercentOfTotal AS Claire
        FROM employees_paid T1
        JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
                                                      AND T2.Employee = 'Barbara'
        JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
    ) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt

Result:

ID  Employee  PercentOfTotal  
1   Alice     25%
1   Barbara   50%      
1   Claire    25%             
2   Alice     25%             
2   Barbara   65%              
2   Claire    10%

What you want is for each month's distribution to act as a signature or pattern of values which you would then want to find in other months. What is not clear is whether the employee to whom the value went is as important as the break down of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as the Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I find the signatures by multiplying each percentage by 10. This presumes that all percentage values are less than one. If someone could have a percentage of 110% for example, that would create problems for this solution.

With Employees As
    (
    Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
    Union All Select 1, 'Barbara', .65
    Union All Select 1, 'Claire', .10
    Union All Select 2, 'Alice', .25
    Union All Select 2, 'Barbara', .50
    Union All Select 2, 'Claire', .25
    Union All Select 3, 'Alice', .25
    Union All Select 3, 'Barbara', .65
    Union All Select 3, 'Claire', .10
    )
    , EmployeeRanks As
    (
    Select Month, Employee, PercentOfTotal
        , Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
    From Employees
    )
    , Signatures As
    (
    Select Month
        , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
    From EmployeeRanks
    Group By Month
    )
    , DistinctSignatures As
    (
    Select Min(Month) As MinMonth, SignatureValue
    From Signatures
    Group By SignatureValue
    )
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
    Join DistinctSignatures As D
        On D.MinMonth = E.Month

I'm assuming performance won't be great (cause of the subquery)

SELECT * FROM employees_paid where Month not in (
     SELECT
          a.Month
     FROM
          employees_paid a
          INNER JOIN employees_paid b ON 
               (a.employee = B.employee AND 
               a.PercentOfTotal = b.PercentOfTotal AND 
               a.Month > b.Month)
     GROUP BY
          a.Month,
          b.Month
     HAVING
          Count(*) = (SELECT COUNT(*) FROM employees_paid c 
               where c.Month = a.Month)
     )

The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
We then GROUP by COUNT of matched entries for each month-month combination
Then the HAVING excludes months that don't have as many matches as there are month entries
The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)

If I have understood you correctly then, for a general solution, I think you would need to concatenate the whole group together - e.g. to produce Alice:0.25, Barbara:0.50, Claire:0.25. Then select the distinct groups so something like the following would do it (rather clunkily).

WITH EmpSalaries
AS
(

SELECT 1 AS Month, 'Alice' AS Employee, 0.25 AS PercentOfTotal UNION ALL
SELECT 1 AS Month, 'Barbara' AS Employee, 0.65 UNION ALL
SELECT 1 AS Month, 'Claire' AS Employee, 0.10 UNION ALL

SELECT 2 AS Month, 'Alice' AS Employee, 0.25 UNION ALL
SELECT 2 AS Month, 'Barbara' AS Employee, 0.50 UNION ALL
SELECT 2 AS Month, 'Claire' AS Employee, 0.25 UNION ALL

SELECT 3 AS Month,  'Alice' AS Employee, 0.25 UNION ALL
SELECT 3 AS Month,  'Barbara' AS Employee, 0.65 UNION ALL
SELECT 3 AS Month,  'Claire' AS Employee, 0.10 
),
Months AS 
(
SELECT DISTINCT Month FROM EmpSalaries
),
MonthlySummary AS
(
SELECT Month,
Stuff(
            (
            Select ', ' + S1.Employee + ':' + cast(PercentOfTotal as varchar(20))
            From EmpSalaries As S1
            Where S1.Month = Months.Month
            Order By S1.Employee
            For Xml Path('')
            ), 1, 2, '') As Summary
FROM Months
)
SELECT * FROM EmpSalaries
WHERE Month IN (SELECT MIN(Month)
                FROM MonthlySummary
                GROUP BY Summary)

I just put together this solution while writing this question, which seems to work

I don't think it does work. Here I've added a further two groups (month = 4 and 5 respectively) which I would consider to be distinct yet the result is the same i.e. month = 1 and 2 only:

WITH employees_paid (Month, Employee, PercentOfTotal)
AS 
(
 SELECT 1, 'Alice', 0.25
 UNION ALL
 SELECT 1, 'Barbara', 0.65
 UNION ALL
 SELECT 1, 'Claire', 0.1
 UNION ALL
 SELECT 2, 'Alice', 0.25
 UNION ALL
 SELECT 2, 'Barbara', 0.5
 UNION ALL
 SELECT 2, 'Claire', 0.25
 UNION ALL
 SELECT 3, 'Alice', 0.25
 UNION ALL
 SELECT 3, 'Barbara', 0.65
 UNION ALL
 SELECT 3, 'Claire', 0.1
 UNION ALL
 SELECT 4, 'Barbara', 0.25
 UNION ALL
 SELECT 4, 'Claire', 0.65
 UNION ALL
 SELECT 4, 'Alice', 0.1
 UNION ALL
 SELECT 5, 'Diana', 0.25
 UNION ALL
 SELECT 5, 'Emma', 0.65
 UNION ALL
 SELECT 5, 'Fiona', 0.1
), 
temp_ids (Month)
AS
(
 SELECT DISTINCT MIN(Month)
   FROM employees_paid
  GROUP 
     BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid AS EMP
       INNER JOIN temp_ids AS IDS 
          ON EMP.Month = IDS.Month
 GROUP 
    BY EMP.Month, EMP.Employee, EMP.PercentOfTotal;

继续阅读：database-partitioning sql sql-server

SQL Server: A Grouping question that's annoying me

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？