What is wrong with this join of a table to itself?
I have a table called TempAllAddresses
with the following columns - ID
, Address
, State
开发者_运维百科. I want to populate a new table with Address
, State
, and Count
. Count
should represent how many records there are in the TempAllAddresses table that have an address like this address followed by a wildcard. If that made no sense, here's an example to illustrate -
Let's say I have a record like this:
ID Address State
12345 13 Phoenix NY
What I want to do is insert a new record into a new table called AddressCount
that has 13 Phoenix for the Address
, NY for the State
, and the number of records in the table that have NY as the State and an address LIKE '13 Phoenix%' for the Count
.
I want to accomplish this with an inner join of TempAllAddresses on itself. This is what I've tried, but it doesn't seem to accomplish what I'm looking for:
SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address
The Count is definitely off, though. It should be equivalent to running "SELECT COUNT(*) FROM TempAllAddresses WHERE State=thisRecordsState and Address LIKE thisRecordsAddress + '%'
". How can I accomplish this? What am I doing wrong?
Edit:
The count seems to be off in the following way - If I have a record like I mentioned above, and then I have 2 other records that also have a state of NY, and then have addresses of "13 Phoenix Road" and "13 Phoenix Rd", then I want to get in my final table a record like this:
13 Phoenix NY 3
Instead, I seem to be getting:
13 Phoenix NY 9
I'm not quite sure what's happening here... some sort of cartesian product? Permutations...? Can anyone explain this?
Edit 2: A further edit since I seem to be misunderstood (and really need a solution :( )... Here is a query with a correlated subselect that accomplishes what I'm looking for. I'd like to do the same thing with an inner join of the table on itself rather than a subselect.
SELECT Address, State,
(SELECT Count(*)
FROM TempAllAddresses innerQry
WHERE innerQry.address LIKE outerQry.address + '%'
AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry
Basically, for each record, I want to get the number of records in the table that have the same state and an address that begins with this address (or is equal to... I do want to include this address as part of the count).
Here's two solutions, one using a CROSS APPLY and the other using an INNER JOIN like you wanted originally. I hope this helps. :)
DECLARE @TempAllAddresses TABLE
(
ID INT PRIMARY KEY IDENTITY(1, 1) NOT NULL
, [Address] VARCHAR(250) NOT NULL
, [State] CHAR(2) NOT NULL
)
INSERT INTO @TempAllAddresses
VALUES ('13 Phoenix', 'NY')
, ('13 Phoenix St', 'NY')
, ('13 Phoenix Street', 'NY')
, ('1845 Test', 'TN')
, ('1337 Street', 'WA')
, ('1845 T', 'TN')
SELECT
TempAddresses.ID
, TempAddresses.[Address]
, TempAddresses.[State]
, TempAddressesCounted.AddressCount
FROM @TempAllAddresses TempAddresses
CROSS APPLY
(
SELECT
COUNT(*) AS AddressCount
FROM @TempAllAddresses TempAddressesApply
WHERE TempAddressesApply.[Address] LIKE (TempAddresses.[Address] + '%')
AND TempAddressesApply.[State] = TempAddresses.[State]
) TempAddressesCounted
SELECT
TempAddresses.ID
, TempAddresses.[Address]
, TempAddresses.[State]
, COUNT(*) AS AddressCount
FROM @TempAllAddresses TempAddresses
INNER JOIN @TempAllAddresses TempAddressesJoin
ON TempAddressesJoin.[Address] LIKE (TempAddresses.[Address] + '%')
AND TempAddressesJoin.[State] = TempAddresses.[State]
GROUP BY TempAddresses.ID
, TempAddresses.[Address]
, TempAddresses.[State]
Try this instead:
SELECT Orig_Address, State, COUNT(Similar_Address)
From
(
SELECT t1.Address Orig_Address,
t1.State State,
t2.address Similar_Address
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
AND t1.address <> t2.address
)
GROUP BY State, Orig_Address
EDIT: forgot to include the difference between t1.address and t2.address, as @Spiny Norman said, since you probably do not want to compare an address to itself.
HTH
EDIT: [snip old stuff]
Try this:
SELECT t1.Address, t1.State, COUNT(distinct t2.id) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address
QUERY A:
SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address
is not equivalent to
QUERY B:
SELECT Address, State,
(SELECT Count(*)
FROM TempAllAddresses innerQry
WHERE innerQry.address LIKE outerQry.address + '%'
AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry
because B produces 1 row for each row in the original table (TempAllAddresses
), whereas A will group together rows in the original table that have the same state and address. To solve this, GROUP BY t1.ID, t1.State, t1.Address
instead.
There is double counting going on when there are multiple rows with exactly the same address.
Try:
SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM (select distinct Address, State from TempAllAddresses) t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address
Nested GroupBy:
- The subquery will find the shortest address for each distinct address.
- This doesn't consider case sensitivity.
- Then each version of these addresses are counted.
SQL:
SELECT Address, State, count(1) As NumEntities
FROM (
SELECT min(t1.Address) as Address, t1.State
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t2.Address
) GROUP By State, Address
Have you tried analytical functions - they are often the easiest solution. I am not familiar with your table structure, but it should be something like this:
SELECT t1.Address, t1.State,
COUNT(t2.address) OVER (PARTITION BY t2.state) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address
You can even add ORDER BY
in the OVER
clause. See Oracle FAQs for some explanation.
精彩评论