SQL Query question - How to select groups of rows using commonalities
I'm working census data, and I wish to provide the ability to search for records by providing a list of names. The idea is that if you know the name or 2 or 3 family members, you will be able to exclude all addresses which don't have those names. Consider this example dataset (Address, Family Name, Forename):
"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Julia"
"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Richard" "Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Hannah" "Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part 开发者_开发百科of), Cork)", "Gamble", "Helen"A search for Julia, Hannah and Helen should be able to return all 4 rows as they share a common address. It sounds simple enough, but I'm having trouble with this one. Cursors are out because of the size of the dataset. Any ideas?
(Needless to say, I've simplified this somewhat as I'm ignoring the Family-name part of the search for now)
This query:
select streetaddress, count(*) as occupantcount
from census
where firstname in ("Julia", "Hannah", "Helen")
group by streetaddress
order by occupantcount desc
would return the address and number of occupants at each address wherever one of the occupants has one of the first names in the IN() list, and order the results in most-occupants-to-least-occupants order. Variations on this same principle (grouping by address) can get you other sorts of information. You can apply successive filters to the results to zero-in on your desired rows.
This is a relational division problem.
SELECT Address, FamilyName, Forename
FROM YourTable
WHERE Address IN (SELECT Address
FROM YourTable
WHERE Forename IN ( 'Julia', 'Hannah', 'Helen' )
GROUP BY Address
HAVING COUNT(DISTINCT Forename) = 3)
Or
WITH Names(name)
AS (SELECT 'Julia'
UNION ALL
SELECT 'Hannah'
UNION ALL
SELECT 'Helen')
SELECT Address,
FamilyName,
Forename
FROM YourTable y1
WHERE NOT EXISTS (SELECT *
FROM Names n
WHERE NOT EXISTS(SELECT *
FROM YourTable y2
WHERE y1.Address = y2.Address
AND y2.Forename = n.Name))
You can try this:
SELECT A.*
FROM YourTable A
JOIN ( SELECT Address, COUNT(*) Quant
FROM YourTable
WHERE Forename IN ('Julia','Hannah','Helen')
GROUP BY Address
HAVING COUNT(DISTINCT Forename) > 2) B
ON A.Address = B.Address
If you index forename and address this should be straightforward and fast:
SELECT a.Address as CommonAddress
FROM (SELECT Address FROM Names WHERE Forename = 'Julia') a
INNER JOIN (SELECT Address FROM Names WHERE Forename = 'Richard') b ON a.Address=b.Address
INNER JOIN (SELECT Address FROM Names WHERE Forename = 'Helen') c on b.Address=c.Address
A solution is to use a subquery to find the address that is common to all of the supplied names, and to return all records at that address.
declare @people table (
address varchar(255),
familyName varchar(255),
forename varchar(255)
)
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Julia')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Richard')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Hannah')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Helen')
insert into @people
values ('Residents of a house 2 somewhere else (Cork No. 4 Urban (part of), Cork)', 'Cooper', 'Helen')
select people.*
from @people as people
where people.address in (
select address
from @people
where forename in ('Julia', 'Hannah', 'Helen')
group by address
having count(forename) >= 3 -- This must be equal to the number of names searched for
)
One issue with this solution is that, if there are three Hannahs living at the same address, the query will return those people, even if Julia and Helen are not living there.
SELECT t.address
, t.familyname
, t.forename
FROM yourTable t
WHERE t.address IN
-- search subquery
( SELECT s1.address
FROM yourTable s1
JOIN yourTable s2
ON s2.address = s1.address
JOIN yourTable s3
ON s3.address = s1.address
WHERE s1.forename = "Julia"
AND s2.forename = "Hannah"
AND s3.forename = "Helen"
)
ORDER BY t.address
, t.familyname
, t.forename
;
2nd solution:
SELECT t.address
, t.familyname
, t.forename
FROM yourTable t
WHERE EXISTS
-- search subquery
( SELECT *
FROM yourTable s1
JOIN yourTable s2
ON s2.address = s1.address
JOIN yourTable s3
ON s3.address = s1.address
WHERE s1.forename = "Julia"
AND s2.forename = "Hannah"
AND s3.forename = "Helen"
AND s1.address = t.address
)
ORDER BY t.address
, t.familyname
, t.forename
;
3rd solution:
SELECT t.address
, t.familyname
, t.forename
FROM yourTable t
WHERE -- search subqueries
EXISTS
( SELECT *
FROM yourTable s1
WHERE s1.forename= "Julia"
AND s1.address = t.address
)
AND EXISTS
( SELECT *
FROM yourTable s2
WHERE s2.forename = "Hannah"
AND s2.address = t.address
)
AND EXISTS
( SELECT *
FROM yourTable s3
WHERE s3.forename = "Helen"
AND s3.address = t.address
)
ORDER BY t.address
, t.familyname
, t.forename
;
DECLARE @namecount int;
DECLARE @forenames TABLE (name varchar(50));
INSERT INTO @forenames
VALUES ('...'),
('...'),
('...');
SELECT @namecount = COUNT(*) FROM @forenames;
/* list all people by addresses that are shared by people
whose forenames are included in @forenames */
SELECT cd.*
FROM CensusData cd
INNER JOIN (
SELECT d.Address
FROM CensusData d
INNER JOIN @forenames f ON d.Forename = f.Name
GROUP BY d.Address
HAVING COUNT(DISTINCT d.Forename) >= @namecount
) filter ON cd.Address = filter.Address
/* same for family names */
SELECT cd.*
FROM CensusData cd
INNER JOIN (
SELECT d.[Family Name]
FROM CensusData d
INNER JOIN @forenames f ON d.Forename = f.Name
GROUP BY d.Address
HAVING COUNT(DISTINCT d.Forename) >= @namecount
) filter ON cd.[Family Name]= filter.[Family Name]
/* and so on fro other criteria */
You can also combine criteria, if needed.
select *
from dataset
where address = (
select address
from dataset
where famly_name = 'Hannah' and forename = 'Gamble'
)
You could start with something like this assuming address is to be matched exactly. It needs to be tweaked as you would most likely see some dups
Select
T1.*
From
TableName T1
Inner Join
TableName T2
On
T1.Address = T2.Address
Where
T1.ForeName = 'Julia'
精彩评论