开发者

Finding unique matches in 2 separate databases

I have 2 databases that have the same structure, but different data. Both are SQL 2005.

I am trying to find which of the Persons in Database A, exist in Database B. My best opportunity for match is to match on FirstName and LastName.

I only want to bring back a list of:

DatabaseA.Person DatabaseB.Person

Where: 1. I want all records from DatabaseA, even if there is not a match in Database B. 2. I on开发者_如何学编程ly want records from DatabaseB where the FirstName/LastName match only one record in DatabaseB.

I have written a query, where I group by, but since I need to see more data than FirstName and LastName, I cannot bring it back without grouping it - which gives me many duplicates. What kind of query should I be using? Do I need to use a cursor?

Here is my query now, which sort of works - except I'm getting results for duplicates in DatabaseB and all I want to know about Database B is when FirstName/LastName matches to one distinct record and no others. My objective is to get a list of people that I know are the same person in 2 databases so that I can build a dictionary list of department code mappings between employees.

    select 
count(DatabaseAEmployee.id) as matchcount
, DatabaseAPerson.id as DatabaseAPersonid
, DatabaseAEmployee.DeptCode DatabaseADeptCode
, DatabaseAPerson.firstname as DatabaseAfirst
, DatabaseAPerson.lastname as DatabaseAlast
, DatabaseBPerson.id as DatabaseBPersonid
, DatabaseBEmployee.DeptCode as DatabaseBDeptCode
, DatabaseBPerson.firstname as DatabaseBfirst
, DatabaseBPerson.lastname as DatabaseBlast
, DatabaseAPerson.ssn as DatabaseAssn
, DatabaseBPerson.ssn as DatabaseBssn
, DatabaseAPerson.dateofbirth as DatabaseAdob
, DatabaseBPerson.dateofbirth as DatabaseBdob

FROM [DatabaseA].[dbo].Employee DatabaseAEmployee
LEFT OUTER JOIN [DatabaseA].[dbo].Person DatabaseAPerson 
 ON DatabaseAPerson.id = DatabaseAEmployee.id
LEFT OUTER JOIN [DatabaseB].[dbo].Person DatabaseBPerson
 ON 
 DatabaseAPerson.firstname = DatabaseBPerson.firstname 
 AND
 DatabaseAPerson.lastname = DatabaseBPerson.lastname 
LEFT OUTER JOIN [DatabaseB].[dbo].Employee DatabaseBEmployee 
 on DatabaseBEmployee.id = DatabaseBPerson.id
group by 
DatabaseAPerson.firstname
, DatabaseAPerson.lastname
, DatabaseAPerson.id
, DatabaseAEmployee.DeptCode
, DatabaseBPerson.id
, DatabaseBEmployee.DeptCode
, DatabaseBPerson.firstname
, DatabaseBPerson.lastname
, DatabaseBPerson.ssn
, DatabaseAPerson.ssn
, DatabaseBPerson.dateofbirth
, DatabaseAPerson.dateofbirth

Here's what I'm trying now, but I'm getting duplicates on the left side:

with UniqueMatchedPersons (Id, FirstName, LastName)
as (
select 
    p2.ID, p2.FirstName, p2.LastName
from 
    [DatabaseA].[dbo].[Employee] p1 
INNER JOIN [DatabaseA].[dbo].[Person] p2 on p1.id = p2.id
    inner join [DatabaseB].[dbo].[Person] p3
        on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName
INNER JOIN  [DatabaseB].[dbo].[Employee] p4
on p3.id = p4.id

group by p2.ID, p2.FirstName, p2.LastName
having count(p2.ID) = 1

)

select p1.*, p2.*
from DatabaseA.dbo.Person p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join DatabaseB.dbo.Person p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName


Try this:

SELECT id,FirstName,Lastname 
FROM   dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName 
FROM   dbb.Persons as b
INNER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName

If you want to get all from A and only those from B that DON'T have a match (which would make more sense to me) i'd use this:

SELECT id,FirstName,Lastname 
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName 
FROM dbb.Persons as b
LEFT OUTER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
WHERE a.id is null


Try something like:

Select dta.LastName, dta.FirstName, dta.[otherColumns] dtb.LastName, dtb.FirstName
    dtb.[otherColumns]
From [databaseA].[table] as dta
LEFT OUTER JOIN [databaseB].[table] as dtb
    on dta.Lastname = dtb.LastName and dta.FirstName = dtb.FirstName

That should get you: 1) everyone in table A, and 2) everyone in table B who is has a Lastname/Firstname match in table A.


Works when SQL Server (at least it should)

SELECT
    A.*
    ,   B.*
FROM
    DatabaseA.dbo.Person A
    LEFT JOIN DatabaseB.dbo.Person B 
        ON A.FirstName = B.FirstName AND A.LastName = B.LastName

Edit: You mention you receive duplicates from DatabaseB where you only need the match on first and lastname. But you also request other data (then first/lastname) this is the problem. If you distinct data they you only request that data.


Using transact-sql, the following untested query should allow you to view unique matches only:

select 
    p1.ID, p1.FirstName, p1.LastName 
from 
    [DatabaseA].[dbo].[Persons] p1 
    left outer join [DatabaseB].[dbo].[Persons] p2 
        on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

group by p1.ID, p1.FirstName, p2.LastName

having count(p1.ID) = 1

If using Sql Server, this can then be encapsulated within a common table expression, to which you can perform a join.

with UniqueMatchedPersons (Id, FirstName, LastName)
as (
    --query in previous code snippet 
)
select persons.*
from Persons
inner join UniqueMatchedPersons on Persons.ID = UniqueMatchedPersons.ID

Update:

If you wish to select fields from both tables, you can simply respecify the original join condition that evaluated name matching before; this is because duplicated matches on the left hand side of the join have been filtered out by the having aggregate condition.

Modifying the select portion of the above snippet to read the following will allow you to select fields from either side of the join:

select p1.*, p2.*
from [DatabaseA].[dbo].[Persons] p1 
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join [DatabaseB].[dbo].[Persons] p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

Update 2:

To filter out duplicates on the left hand side (which will also cause duplicates on the right) you'll have to remove the grouping on [DatabaseA].[dbo].[Persons].[ID].

When I refer to duplicates, I mean names in adjacent rows that are identical in terms of characters and padding. If you have diacritic variations of first and last names, then the results of the name comparison will be subject to the database collation (unless you explicity declare a collation on a join expression). Likewise if you have variations in spacing, padding or punctuation between names, you may have to consider a different approach than a direct equality operator for name matching.

Try the following:

with UniqueMatchedPersons (FirstName, LastName)
as (
select 
    p1.FirstName, p1.LastName
from 
  [DatabaseA].[dbo].[Person] p1
  left outer join [DatabaseB].[dbo].[Person] p2
        on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName

group by p1.FirstName, p1.LastName
having count(p1.FirstName) = 1
)

select p1.*, p2.*, e1.*, e2.*
from [DatabaseA].[dbo].[Person] p1
inner join UniqueMatchedPersons ump 
      on p1.FirstName = ump.FirstName and p1.LastName = ump.LastName
left outer join [DatabaseB].[dbo].[Person] p2 
      on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
inner join [DatabaseA].[dbo].[Employee] e1 on p1.ID = e1.ID
inner join [DatabaseB].[dbo].[Employee] e2 on e2.ID = p2.ID

order by p1.id asc
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜