Comparing two large SQL queries
I am replacing an old untidy SQL query with a new version, because it was failing to include several rows it should have.
The new query certainl开发者_JAVA百科y includes these missing rows, but I want to be entirely sure that it also includes all of the rows from the original query.
The two queries use entirely different tables. Each query is ~14000 rows.
Is there a query I can write that checks whether QueryA contains any rows that QueryB does not have?
You could do something like this.
Select * FROM
(
QUERY A GOES HERE
) AS A
LEFT JOIN
(
QUERY B GOES HERE
) AS B
ON A.Col1=B.Col1 AND A.Col2=B.Col2 ....
WHERE B.Col1 IS NULL
You can either include all the columns in the "on clause" or you can just include the columns you need to ensure the rows are the same, such as the primary key.
Assuming that both queries returns the primary key column from the same table:
select *
from (QueryA) a
where a.PK not in (select PK from (QueryB) b)
Please note that the parantheses means that these are subqueries.
Could you use a LEFT OUTER JOIN
?
SELECT
*
FROM
( < put query 1 here > ) AS Query1
LEFT JOIN
( < put query 2 here > ) AS Query2
ON Query1.Field1 = Query2.Field1
AND Query1.Field2 = Query2.Field2
AND Query1.Field3 = Query2.Field3
etc, etc
WHERE
Query2.Field1 IS NULL
- Select the results from the old query into a temp table.
- Select the results from the new query into another temp table.
- Outer join from one table to the other, equating all the columns.
- Add a where clause to only return rows where there are nulls on one side or the other.
For example, if the results looked like (theoretically):
UserId | FirstName | LastName | Email
Then run each query into a separate temp table with the same structure, say #resOld
and '#resNew'.
Then:
SELECT
*
FROM #resOld OLD
LEFT OUTER JOIN #redNew NEW -- LEFT OUTER JOIN, so we still retain rows which don't match
ON NEW.UserId = OLD.UserId
AND NEW.FirstName = OLD.FirstName
AND NEW.LastName = OLD.LastName
AND NEW.Email = OLD.Email
WHERE OLD.UserID IS NULL -- Only retain rows where we didn't match. Can use any field from OLD which cannot be null by design.
This query will only return rows when there are rows from one results set which don't match rows in the other.
EDIT: That is more complicated than it needs to be, you can just invert the join criteria and remove the WHERE like so:
SELECT
*
FROM #resOld OLD
INNER JOIN #redNew NEW -- Inner join where rows are different.
ON NEW.UserId != OLD.UserId
AND NEW.FirstName != OLD.FirstName
AND NEW.LastName != OLD.LastName
AND NEW.Email != OLD.Email
If you're able to get it into SQL Server, you can just do:
<Query 1>
EXCEPT
<Query 2>
<Query 2>
EXCEPT
<Query 1>
This will only output records that don't exist in the other query. It checks all fields in the result sets.
I included both directions since if there are more records in the lower part of the EXCEPT
statement those don't get shown as exceptions.
The easy way is to use a Union of QueryA and QueryB and Group By.
See an example of that here: http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx on Jeff's SQL server blog.
精彩评论