SQL Server select where most columns match
I have a stored procedure that can have 1 to 4 variables passed to it. It must return the rows where the most columns match or, if there are no matching records, the default rows (the ones whose columns are NULL). The sequence needs to be distinct.
Example table with data:
Client_Id  Project_ID  Phase  Task  Employee  Sequence
---------  ----------  -----  ----  --------  --------
NULL       NULL        NULL   NULL  Chris     1
NULL       NULL        NULL   NULL  Bob       100
500        NULL        NULL   NULL  Joe       1
500        2           NULL   NULL  Max       1
So the results for Client 100, with any project, phase or task, would simply be the default NULL records of Chris and Bob. For Client 500 the results would be Joe and Bob. For Client 500, Project 2 the results would be Max and Bob.
Right now I do this by querying on task first, then combining that with a query on phase while checking that no rows overlap, and then doing the same for project and then client. It seems incredibly inefficient, and there has to be a smarter way to do this. Any thoughts?
EDIT - Some query examples. I first check for the case where everything matches:
insert into #TempTracking
select p.employee, p.sequence
from invoices i, projects p
where i.client_id = p.client_id
and i.project_no = p.project_no
and i.phase = p.phase
and i.task = p.task
Then I make the queries less and less specific and check that the sequence does not already exist.
insert into #TempTracking
select p.employee, p.sequence
from invoices i, projects p
where (i.client_id = p.client_id or i.client_id is null)
and (i.project_no = p.project_no or i.project_no is null)
and (i.phase = p.phase or i.phase is null)
and (i.task = p.task or i.task is null)
and NOT EXISTS ( SELECT * FROM #TempTracking t WHERE t.sequence = p.sequence )
"Most of the columns match" is very vague, but I assume you mean that a column counts as a match when the search parameter is NULL or the value in the table is NULL, so such a record could be included.
If you want the best-matching rows, or every row when nothing matches at all, you will need to do something like this (it gets very long):
DECLARE @Client_Id VARCHAR(MAX) = '500'
DECLARE @Project_ID VARCHAR(MAX) = '2'
DECLARE @Phase VARCHAR(MAX) = NULL
DECLARE @Task VARCHAR(MAX) = NULL
SELECT Employee, Sequence
FROM
    (SELECT Employee, Sequence,
        (
            CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
        ) AS MatchCount
    FROM myTable) mt
WHERE mt.MatchCount =
    (
        SELECT MAX(
            CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
            CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
        )
        FROM myTable
    )
-- Now prevent duplicate sequence numbers
AND NOT EXISTS (
    SELECT Employee, Sequence
    FROM
        (SELECT Employee, Sequence,
            (
                CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
            ) AS MatchCount
        FROM myTable) mt2
    WHERE mt2.MatchCount =
        (
            SELECT MAX(
                CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
                CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
            )
            FROM myTable
        )
    -- mt is the alias of the outer derived table
    AND mt2.Sequence = mt.Sequence AND mt2.MatchCount > mt.MatchCount
)
Note: This will return all records in the table when the number of matching fields is zero.
I'm sure there are ways to make this less verbose, for example by inserting all matching rows into a temp table along with the number of columns that match (MatchCount), thereby shortening the query considerably.
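One way to cut the repetition can be sketched with a common table expression instead of a temp table (a swapped-in technique, not the approach described above). This is a minimal sketch that assumes the table is called myTable, as in the query above; the CTE name Scored is made up for illustration. It computes MatchCount once per row and then keeps the rows with the highest count, without any per-sequence de-duplication.

```sql
-- Sketch only: myTable is the table name used in the query above;
-- "Scored" is a hypothetical CTE name.
DECLARE @Client_Id  VARCHAR(MAX) = '500';
DECLARE @Project_ID VARCHAR(MAX) = '2';
DECLARE @Phase      VARCHAR(MAX) = NULL;
DECLARE @Task       VARCHAR(MAX) = NULL;

-- Compute MatchCount once per row instead of repeating the CASE expressions.
WITH Scored AS (
    SELECT Employee, Sequence,
           CASE WHEN (Client_Id  = @Client_Id  OR Client_Id  IS NULL OR @Client_Id  IS NULL) THEN 1 ELSE 0 END
         + CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END
         + CASE WHEN (Phase      = @Phase      OR Phase      IS NULL OR @Phase      IS NULL) THEN 1 ELSE 0 END
         + CASE WHEN (Task       = @Task       OR Task       IS NULL OR @Task       IS NULL) THEN 1 ELSE 0 END
           AS MatchCount
    FROM myTable
)
-- Keep only the rows whose MatchCount equals the overall maximum.
SELECT Employee, Sequence
FROM Scored
WHERE MatchCount = (SELECT MAX(MatchCount) FROM Scored);
```

The statement before WITH has to end with a semicolon, which is why the DECLARE lines are terminated explicitly.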
Now, since you want unique sequences and only the best-matching row or rows returned, what you're looking for is more like this:
DECLARE @Client_Id VARCHAR(MAX) = '500'
DECLARE @Project_ID VARCHAR(MAX) = '3'
DECLARE @Phase VARCHAR(MAX) = NULL
DECLARE @Task VARCHAR(MAX) = NULL
-- Assumes #myTempTable (Employee, Sequence, MatchCount, NullCount) has already been created
INSERT INTO #myTempTable SELECT Employee, Sequence,
    (
        CASE WHEN (Client_Id = @Client_Id OR Client_Id IS NULL OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Project_ID = @Project_ID OR Project_ID IS NULL OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Phase = @Phase OR Phase IS NULL OR @Phase IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Task = @Task OR Task IS NULL OR @Task IS NULL) THEN 1 ELSE 0 END
    ) AS MatchCount,
    (
        CASE WHEN (Client_Id IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Project_ID IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Phase IS NULL) THEN 1 ELSE 0 END +
        CASE WHEN (Task IS NULL) THEN 1 ELSE 0 END
    ) AS NullCount
    -- ,(
    -- CASE WHEN (Client_Id = @Client_Id OR @Client_Id IS NULL) THEN 1 ELSE 0 END +
    -- CASE WHEN (Project_ID = @Project_ID OR @Project_ID IS NULL) THEN 1 ELSE 0 END +
    -- CASE WHEN (Phase = @Phase OR @Phase IS NULL) THEN 1 ELSE 0 END +
    -- CASE WHEN (Task = @Task OR @Task IS NULL) THEN 1 ELSE 0 END
    -- ) AS MatchCountWithoutNulls
FROM myTable
SELECT Employee, Sequence
FROM #myTempTable mtt
WHERE MatchCount = (
SELECT MAX(MatchCount)
FROM #myTempTable mtt2
WHERE mtt2.Sequence = mtt.Sequence
)
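AND NullCount = (
SELECT MIN(NullCount)
FROM #myTempTable mtt2
WHERE mtt2.Sequence = mtt.Sequence
-- compare only against rows with the same (highest) MatchCount, otherwise a sequence can drop out entirely
AND mtt2.MatchCount = mtt.MatchCount
)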
Or something very close to that. I don't have a test table made up at the moment, so I can't kick it around and see.
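For anyone who wants to try this out, here is a minimal test setup built from the sample rows in the question. The column types are assumptions (the question does not give the schema), and myTable is simply the name used in the queries above. The final query is an alternative sketch rather than the approach above: it treats a NULL column as a wildcard, requires every non-NULL column to equal the corresponding parameter, and uses ROW_NUMBER() to keep the most specific row per Sequence. Run it as its own batch, since it declares the parameters as INT.

```sql
-- Hypothetical schema: column types are assumed, not taken from the question.
CREATE TABLE myTable (
    Client_Id  INT NULL,
    Project_ID INT NULL,
    Phase      INT NULL,
    Task       INT NULL,
    Employee   VARCHAR(50) NOT NULL,
    Sequence   INT NOT NULL
);

-- Sample rows from the question.
INSERT INTO myTable (Client_Id, Project_ID, Phase, Task, Employee, Sequence)
VALUES (NULL, NULL, NULL, NULL, 'Chris', 1),
       (NULL, NULL, NULL, NULL, 'Bob',   100),
       (500,  NULL, NULL, NULL, 'Joe',   1),
       (500,  2,    NULL, NULL, 'Max',   1);

DECLARE @Client_Id INT = 500, @Project_ID INT = 2, @Phase INT = NULL, @Task INT = NULL;

WITH Candidates AS (
    SELECT Employee, Sequence,
           -- Rank rows within each Sequence by how many columns are filled in,
           -- most specific first.
           ROW_NUMBER() OVER (
               PARTITION BY Sequence
               ORDER BY CASE WHEN Client_Id  IS NULL THEN 0 ELSE 1 END
                      + CASE WHEN Project_ID IS NULL THEN 0 ELSE 1 END
                      + CASE WHEN Phase      IS NULL THEN 0 ELSE 1 END
                      + CASE WHEN Task       IS NULL THEN 0 ELSE 1 END DESC
           ) AS rn
    FROM myTable
    -- A row qualifies only if each non-NULL column equals its parameter.
    WHERE (Client_Id  IS NULL OR Client_Id  = @Client_Id)
      AND (Project_ID IS NULL OR Project_ID = @Project_ID)
      AND (Phase      IS NULL OR Phase      = @Phase)
      AND (Task       IS NULL OR Task       = @Task)
)
SELECT Employee, Sequence
FROM Candidates
WHERE rn = 1;
-- With the sample rows above this returns Max (Sequence 1) and Bob (Sequence 100).
```

If two rows for the same Sequence are equally specific, ROW_NUMBER() picks one of them arbitrarily; swapping in RANK() would return both.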