
SQL Server select where most columns match

I have a stored procedure that can be passed 1 to 4 parameters. It must return the rows where the most columns match or, if there are no matching records, the default rows (the ones whose key columns are NULL). Each Sequence value must appear only once in the result.

Example table with data:

Client_Id Project_ID Phase Task Employee Sequence
--------- ---------- ----- ---- -------- --------
NULL      NULL       NULL  NULL Chris    1
NULL      NULL       NULL  NULL Bob      100
500       NULL       NULL  NULL Joe      1
500       2          NULL  NULL Max      1

So the results for Client 100 (any project, phase or task) would simply be the default NULL records, Chris and Bob. For Client 500 the results would be Joe and Bob. For Client 500, Project 2 the results would be Max and Bob. Right now I do this by querying on task first, then adding the rows from a query on phase while checking that no rows overlap, and doing the same for project and then client. That seems very inefficient, and there has to be a smarter way to do it. Any thoughts?
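One way to express this rule as a single statement, sketched under assumptions the question does not state outright: a NULL key column is a default that matches anything, a parameter that is not passed matches only NULL columns, and among the qualifying rows the one with the most non-NULL key columns wins for each Sequence. To keep it runnable anywhere it uses Python's built-in sqlite3 module instead of SQL Server, with integer IDs; the table name myTable, the Specificity column and the make_db/lookup helpers are hypothetical. The CASE expressions and the correlated MAX carry over to T-SQL, where filtering on RANK() OVER (PARTITION BY Sequence ORDER BY Specificity DESC) = 1 inside a CTE or derived table would be an equivalent way to keep the top row(s).

```python
import sqlite3

# The question's sample rows: (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
ROWS = [
    (None, None, None, None, "Chris", 1),
    (None, None, None, None, "Bob", 100),
    (500, None, None, None, "Joe", 1),
    (500, 2, None, None, "Max", 1),
]

# Candidates: each key column is NULL (a default) or equal to the parameter, so a
# parameter that is not supplied (bound as NULL) only matches NULL columns.
# Specificity: number of non-NULL key columns; the highest wins per Sequence.
QUERY = """
WITH Candidates AS (
    SELECT Employee, Sequence,
           CASE WHEN Client_Id  IS NOT NULL THEN 1 ELSE 0 END
         + CASE WHEN Project_ID IS NOT NULL THEN 1 ELSE 0 END
         + CASE WHEN Phase      IS NOT NULL THEN 1 ELSE 0 END
         + CASE WHEN Task       IS NOT NULL THEN 1 ELSE 0 END AS Specificity
    FROM myTable
    WHERE (Client_Id  IS NULL OR Client_Id  = :client)
      AND (Project_ID IS NULL OR Project_ID = :project)
      AND (Phase      IS NULL OR Phase      = :phase)
      AND (Task       IS NULL OR Task       = :task)
)
SELECT c.Employee, c.Sequence
FROM Candidates c
WHERE c.Specificity = (SELECT MAX(c2.Specificity)
                       FROM Candidates c2
                       WHERE c2.Sequence = c.Sequence)
ORDER BY c.Sequence, c.Employee
"""


def make_db():
    """Build an in-memory copy of the example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (Client_Id INTEGER, Project_ID INTEGER, "
        "Phase TEXT, Task TEXT, Employee TEXT, Sequence INTEGER)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def lookup(conn, client=None, project=None, phase=None, task=None):
    """Return (Employee, Sequence) pairs: the most specific match(es) per Sequence."""
    params = {"client": client, "project": project, "phase": phase, "task": task}
    return conn.execute(QUERY, params).fetchall()


conn = make_db()
print(lookup(conn, client=500, project=2))
```

With the question's sample rows this returns Chris and Bob for Client 100, Joe and Bob for Client 500, and Max and Bob for Client 500, Project 2.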

EDIT: Some query examples. I first check for the case where everything matches:

insert into #TempTracking
select  p.employee, p.sequence
from    invoices i, projects p
where   i.client_id = p.client_id
and     i.project_no = p.project_no
and     i.phase = p.phase
and     i.task = p.task

Then I make the queries progressively less specific and check that the sequence does not already exist:

insert into #TempTracking
select  p.employee, p.sequence
from    invoices i, projects p
where   (i.client_id = p.client_id or i.client_id is null)
and     (i.project_no = p.project_no or i.project_no is null)
and     (i.phase = p.phase or i.phase is null)
and     (i.task = p.task or i.task is null)
and     NOT EXISTS ( SELECT * FROM #TempTracking t WHERE t.sequence = p.sequence )
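The multi-pass approach can be modelled in plain Python to check whether a single pass gives the same answer. This sketch assumes the key columns are hierarchical (a task-level row also has phase, project and client filled in), so "task first, then phase, project, client, then defaults" is the same as working down from 4 non-NULL key columns to 0; it models the passes as described rather than the exact queries, and the function names are hypothetical. Because each pass only adds Sequences that no earlier pass produced, the chain ends up keeping the most specific qualifying row(s) per Sequence, which one query ranking rows by specificity can do directly.

```python
# The question's sample rows: (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
ROWS = [
    (None, None, None, None, "Chris", 1),
    (None, None, None, None, "Bob", 100),
    (500, None, None, None, "Joe", 1),
    (500, 2, None, None, "Max", 1),
]


def is_candidate(row, params):
    """Each key column must be NULL (a default) or equal the parameter."""
    return all(col is None or col == p for col, p in zip(row[:4], params))


def specificity(row):
    """Number of non-NULL key columns: 4 = task level, 0 = default row."""
    return sum(col is not None for col in row[:4])


def cascade(rows, params):
    """Multi-pass version: most specific level first; each pass only adds
    Sequences that no earlier pass has produced."""
    result, seen = [], set()
    for level in range(4, -1, -1):
        batch = [(r[4], r[5]) for r in rows
                 if specificity(r) == level and is_candidate(r, params)
                 and r[5] not in seen]
        result.extend(batch)
        seen.update(seq for _, seq in batch)
    return result


def single_pass(rows, params):
    """One pass: keep the most specific candidate(s) for each Sequence."""
    candidates = [r for r in rows if is_candidate(r, params)]
    best = {}
    for r in candidates:
        best[r[5]] = max(best.get(r[5], -1), specificity(r))
    return [(r[4], r[5]) for r in candidates if specificity(r) == best[r[5]]]


for params in [(100, None, None, None), (500, None, None, None), (500, 2, None, None)]:
    print(params, sorted(cascade(ROWS, params)) == sorted(single_pass(ROWS, params)))
```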


"Most of the columns match" is very vague, but I assume you mean that if the caller passes NULL for a parameter, or the value in the table is NULL, then the record could be included.
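The matching rule assumed here can be written as a per-column test summed over the four key columns. A minimal Python sketch using the question's sample rows (the helper names are hypothetical): for Client 500 with no other parameters, every sample row scores 4, so the count alone cannot choose between Chris, Joe and Max for Sequence 1, and some tie-breaker is needed.

```python
# The question's sample rows: (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
ROWS = [
    (None, None, None, None, "Chris", 1),
    (None, None, None, None, "Bob", 100),
    (500, None, None, None, "Joe", 1),
    (500, 2, None, None, "Max", 1),
]


def column_matches(value, param):
    """Equal values, a NULL table value, or an unsupplied (NULL) parameter
    all count as a match under this reading."""
    return value == param or value is None or param is None


def match_count(row, params):
    """How many of the four key columns match (0 to 4)."""
    return sum(column_matches(v, p) for v, p in zip(row[:4], params))


params = (500, None, None, None)  # Client 500, nothing else supplied
print({row[4]: match_count(row, params) for row in ROWS})
```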

If you want the best-matching row(s), or all rows when nothing matches, you will need something like this (it's starting to get very long):

DECLARE @Client_Id VARCHAR(MAX) = '500'
DECLARE @Project_ID VARCHAR(MAX) = '2'
DECLARE @Phase VARCHAR(MAX) = NULL
DECLARE @Task VARCHAR(MAX) = NULL

SELECT Employee, Sequence 
FROM 
  (SELECT Employee, Sequence, 
  (
    CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END + 
    CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END + 
    CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END + 
    CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
  ) AS MatchCount
  FROM myTable) mt
WHERE MatchCount =
  (
    SELECT MAX(
      CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
      CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
    )
    FROM myTable
  )
  -- Now prevent duplicate sequence numbers
  AND NOT EXISTS (
    SELECT Employee, Sequence
    FROM
      (SELECT Employee, Sequence,
      (
        CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
      ) AS MatchCount
      FROM myTable) mt2
    WHERE mt2.MatchCount =
      (
        SELECT MAX(
          CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
          CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
        )
        FROM myTable
      )
      AND mt2.Sequence = mt.Sequence AND mt2.MatchCount > mt.MatchCount
  )

Note: This will return every record in the table when the highest number of matching fields is zero.

I'm sure there are ways to make this less verbose, for example by inserting all matching rows into a temp table together with the number of columns that match (MatchCount), thereby reducing the query considerably.
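To see why a per-Sequence step is still needed, here is a Python sketch of the table-wide filter the first query relies on (keep every row whose MatchCount equals the highest MatchCount in the table), again with the question's sample rows and hypothetical helper names. For Client 500, Project 2 all four rows tie at 4, so Sequence 1 comes back three times.

```python
# The question's sample rows: (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
ROWS = [
    (None, None, None, None, "Chris", 1),
    (None, None, None, None, "Bob", 100),
    (500, None, None, None, "Joe", 1),
    (500, 2, None, None, "Max", 1),
]


def match_count(row, params):
    """Key columns that match: equal, NULL in the table, or parameter NULL."""
    return sum(v == p or v is None or p is None for v, p in zip(row[:4], params))


def table_wide_best(rows, params):
    """Keep every row whose MatchCount equals the highest MatchCount in the
    whole table; there is no grouping by Sequence."""
    counts = [match_count(r, params) for r in rows]
    top = max(counts)
    return [(r[4], r[5]) for r, c in zip(rows, counts) if c == top]


result = table_wide_best(ROWS, (500, 2, None, None))  # Client 500, Project 2
print(result)
print([seq for _, seq in result].count(1))
```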

Now, since you want unique Sequences and only the highest-matching row(s) returned, what you're looking for is more like this:

DECLARE @Client_Id VARCHAR(MAX) = '500'
DECLARE @Project_ID VARCHAR(MAX) = '3'
DECLARE @Phase VARCHAR(MAX) = NULL
DECLARE @Task VARCHAR(MAX) = NULL

SELECT Employee, Sequence,
  (
    CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
  ) AS MatchCount,
  (
    CASE WHEN (Client_Id IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Project_ID IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Phase IS NULL) THEN 1 ELSE 0 END +
    CASE WHEN (Task IS NULL) THEN 1 ELSE 0 END
  ) AS NullCount
--   ,(
--    CASE WHEN (Client_Id = @Client_Id OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Project_ID = @Project_ID OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Phase = @Phase OR @Phase IS NULL) THEN 1 ELSE 0 END +
--    CASE WHEN (Task = @Task OR @Task IS NULL) THEN 1 ELSE 0 END
--  ) AS MatchCountWithoutNulls
INTO #myTempTable
FROM myTable

-- For each Sequence keep the row(s) with the highest MatchCount and, among
-- those, the fewest NULL key columns (the most specific row)
SELECT Employee, Sequence
FROM #myTempTable mtt
WHERE MatchCount = (
    SELECT MAX(MatchCount)
    FROM #myTempTable mtt2
    WHERE mtt2.Sequence = mtt.Sequence
  )
  AND NullCount = (
    SELECT MIN(NullCount)
    FROM #myTempTable mtt2
    WHERE mtt2.Sequence = mtt.Sequence
      AND mtt2.MatchCount = mtt.MatchCount
  )

Or something very close to that; I don't have a test table set up at the moment, so I can't try it out.
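The temp-table version can be exercised the same way. This sketch runs it through Python's sqlite3 module with the question's sample rows; a TEMP table stands in for #myTempTable, and myTable and best_rows are hypothetical names. One detail matters: the MIN(NullCount) subquery is restricted to rows that share the outer row's MatchCount. Without that restriction, requesting Client 500, Project 3 lets Max's non-matching row (NullCount 2) set the minimum for Sequence 1, and no Sequence 1 row is returned.

```python
import sqlite3

# The question's sample rows: (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
ROWS = [
    (None, None, None, None, "Chris", 1),
    (None, None, None, None, "Bob", 100),
    (500, None, None, None, "Joe", 1),
    (500, 2, None, None, "Max", 1),
]

# Step 1: score every row once (MatchCount and NullCount).
SCORE = """
INSERT INTO myTempTable (Employee, Sequence, MatchCount, NullCount)
SELECT Employee, Sequence,
    CASE WHEN Client_Id = :c OR Client_Id IS NULL OR :c IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Project_ID = :p OR Project_ID IS NULL OR :p IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Phase = :ph OR Phase IS NULL OR :ph IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Task = :t OR Task IS NULL OR :t IS NULL THEN 1 ELSE 0 END,
    CASE WHEN Client_Id IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Project_ID IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Phase IS NULL THEN 1 ELSE 0 END +
    CASE WHEN Task IS NULL THEN 1 ELSE 0 END
FROM myTable
"""

# Step 2: per Sequence, highest MatchCount first, then fewest NULL columns
# among the rows that share that MatchCount.
PICK = """
SELECT Employee, Sequence
FROM myTempTable mtt
WHERE MatchCount = (SELECT MAX(MatchCount) FROM myTempTable mtt2
                    WHERE mtt2.Sequence = mtt.Sequence)
  AND NullCount = (SELECT MIN(NullCount) FROM myTempTable mtt2
                   WHERE mtt2.Sequence = mtt.Sequence
                     AND mtt2.MatchCount = mtt.MatchCount)
ORDER BY Sequence, Employee
"""


def best_rows(client=None, project=None, phase=None, task=None):
    """Load the sample table, score it, and return (Employee, Sequence) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (Client_Id INTEGER, Project_ID INTEGER, "
                 "Phase TEXT, Task TEXT, Employee TEXT, Sequence INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    conn.execute("CREATE TEMP TABLE myTempTable (Employee TEXT, Sequence INTEGER, "
                 "MatchCount INTEGER, NullCount INTEGER)")
    conn.execute(SCORE, {"c": client, "p": project, "ph": phase, "t": task})
    rows = conn.execute(PICK).fetchall()
    conn.close()
    return rows


print(best_rows(client=500, project=3))
```

With only Client_Id = 500 supplied, this scoring returns Max rather than the Joe the question expects, because the missing Project_ID parameter counts as a match for Max's row; requiring each key column to be NULL or equal to a supplied value avoids that.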
