开发者

Comparing two large SQL queries

I am replacing an old untidy SQL query with a new version, because it was failing to include several rows it should have.

The new query certainl开发者_JAVA百科y includes these missing rows, but I want to be entirely sure that it also includes all of the rows from the original query.

The two queries use entirely different tables. Each query is ~14000 rows.

Is there a query I can write that checks whether QueryA contains any rows that QueryB does not have?


You could do something like this.

Select  * FROM 
(
QUERY A GOES HERE 
) AS A
LEFT JOIN
(
QUERY B GOES HERE 
) AS B
ON A.Col1=B.Col1 AND A.Col2=B.Col2 ....
WHERE B.Col1 IS NULL

You can either include all the columns in the "on clause" or you can just include the columns you need to ensure the rows are the same, such as the primary key.


Assuming that both queries returns the primary key column from the same table:

select *
from (QueryA) a
where a.PK not in (select PK from (QueryB) b)

Please note that the parantheses means that these are subqueries.


Could you use a LEFT OUTER JOIN?

SELECT
  *
FROM
  ( < put query 1 here > ) AS Query1
LEFT JOIN
  ( < put query 2 here > ) AS Query2
    ON  Query1.Field1 = Query2.Field1
    AND Query1.Field2 = Query2.Field2
    AND Query1.Field3 = Query2.Field3
    etc, etc
WHERE
  Query2.Field1 IS NULL


  1. Select the results from the old query into a temp table.
  2. Select the results from the new query into another temp table.
  3. Outer join from one table to the other, equating all the columns.
  4. Add a where clause to only return rows where there are nulls on one side or the other.

For example, if the results looked like (theoretically):

UserId | FirstName | LastName | Email

Then run each query into a separate temp table with the same structure, say #resOld and '#resNew'.

Then:

SELECT
    *

FROM #resOld OLD

LEFT OUTER JOIN #redNew NEW -- LEFT OUTER JOIN, so we still retain rows which don't match
    ON  NEW.UserId    = OLD.UserId
    AND NEW.FirstName = OLD.FirstName
    AND NEW.LastName  = OLD.LastName
    AND NEW.Email     = OLD.Email

WHERE OLD.UserID IS NULL -- Only retain rows where we didn't match. Can use any field from OLD which cannot be null by design.

This query will only return rows when there are rows from one results set which don't match rows in the other.

EDIT: That is more complicated than it needs to be, you can just invert the join criteria and remove the WHERE like so:

SELECT
    *

FROM #resOld OLD

INNER JOIN #redNew NEW -- Inner join where rows are different.
    ON  NEW.UserId    != OLD.UserId
    AND NEW.FirstName != OLD.FirstName
    AND NEW.LastName  != OLD.LastName
    AND NEW.Email     != OLD.Email


If you're able to get it into SQL Server, you can just do:

<Query 1>
EXCEPT
<Query 2>

<Query 2>
EXCEPT
<Query 1>

This will only output records that don't exist in the other query. It checks all fields in the result sets.

I included both directions since if there are more records in the lower part of the EXCEPT statement those don't get shown as exceptions.


The easy way is to use a Union of QueryA and QueryB and Group By.

See an example of that here: http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx on Jeff's SQL server blog.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜