What is wrong with this join of a table to itself?

2023-01-30 11:36 问答作者：

I have a table called TempAllAddresses with the following columns - ID, Address, State开发者_运维百科. I want to populate a new table with Address, State, and Count. Count should represent how many records there are in the TempAllAddresses table that have an address like this address followed by a wildcard. If that made no sense, here's an example to illustrate - Let's say I have a record like this:

ID      Address      State
12345   13 Phoenix   NY

What I want to do is insert a new record into a new table called AddressCount that has 13 Phoenix for the Address, NY for the State, and the number of records in the table that have NY as the State and an address LIKE '13 Phoenix%' for the Count.

I want to accomplish this with an inner join of TempAllAddresses on itself. This is what I've tried, but it doesn't seem to accomplish what I'm looking for:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

The Count is definitely off, though. It should be equivalent to running "SELECT COUNT(*) FROM TempAllAddresses WHERE State=thisRecordsState and Address LIKE thisRecordsAddress + '%'". How can I accomplish this? What am I doing wrong?

Edit:

The count seems to be off in the following way - If I have a record like I mentioned above, and then I have 2 other records that also have a state of NY, and then have addresses of "13 Phoenix Road" and "13 Phoenix Rd", then I want to get in my final table a record like this:

13 Phoenix    NY    3

Instead, I seem to be getting:

13 Phoenix    NY    9

I'm not quite sure what's happening here... some sort of cartesian product? Permutations...? Can anyone explain this?

Edit 2: A further edit since I seem to be misunderstood (and really need a solution :( )... Here is a query with a correlated subselect that accomplishes what I'm looking for. I'd like to do the same thing with an inner join of the table on itself rather than a subselect.

SELECT Address, State, 
    (SELECT Count(*)
    FROM TempAllAddresses innerQry 
    WHERE innerQry.address LIKE outerQry.address + '%' 
        AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry

Basically, for each record, I want to get the number of records in the table that have the same state and an address that begins with this address (or is equal to... I do want to include this address as part of the count).

Here's two solutions, one using a CROSS APPLY and the other using an INNER JOIN like you wanted originally. I hope this helps. :)

DECLARE @TempAllAddresses TABLE
(
    ID INT PRIMARY KEY IDENTITY(1, 1) NOT NULL
    , [Address] VARCHAR(250) NOT NULL
    , [State] CHAR(2) NOT NULL
)

INSERT INTO @TempAllAddresses
VALUES ('13 Phoenix', 'NY')
        , ('13 Phoenix St', 'NY')
        , ('13 Phoenix Street', 'NY')
        , ('1845 Test', 'TN')
        , ('1337 Street', 'WA')
        , ('1845 T', 'TN')

SELECT
    TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]
    , TempAddressesCounted.AddressCount
FROM @TempAllAddresses TempAddresses
CROSS APPLY
(
    SELECT
        COUNT(*) AS AddressCount
    FROM @TempAllAddresses TempAddressesApply
    WHERE TempAddressesApply.[Address] LIKE (TempAddresses.[Address] + '%')
        AND TempAddressesApply.[State] = TempAddresses.[State]
) TempAddressesCounted

SELECT
    TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]
    , COUNT(*) AS AddressCount
FROM @TempAllAddresses TempAddresses
INNER JOIN @TempAllAddresses TempAddressesJoin
    ON TempAddressesJoin.[Address] LIKE (TempAddresses.[Address] + '%')
            AND TempAddressesJoin.[State] = TempAddresses.[State]
GROUP BY TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]

Try this instead:

SELECT Orig_Address, State, COUNT(Similar_Address)
From
(
  SELECT t1.Address Orig_Address, 
         t1.State   State, 
         t2.address Similar_Address
    FROM TempAllAddresses t1
   INNER JOIN TempAllAddresses t2
      ON t1.state = t2.state
     AND T2.Address LIKE t1.address + '%'
     AND t1.address <> t2.address
)
GROUP BY State, Orig_Address

EDIT: forgot to include the difference between t1.address and t2.address, as @Spiny Norman said, since you probably do not want to compare an address to itself.

HTH

EDIT: [snip old stuff]

Try this:

SELECT t1.Address, t1.State, COUNT(distinct t2.id) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

QUERY A:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

is not equivalent to

QUERY B:

SELECT Address, State, 
    (SELECT Count(*)
    FROM TempAllAddresses innerQry 
    WHERE innerQry.address LIKE outerQry.address + '%' 
        AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry

because B produces 1 row for each row in the original table (TempAllAddresses), whereas A will group together rows in the original table that have the same state and address. To solve this, GROUP BY t1.ID, t1.State, t1.Address instead.

There is double counting going on when there are multiple rows with exactly the same address.

Try:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM (select distinct Address, State from TempAllAddresses) t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

Nested GroupBy:

The subquery will find the shortest address for each distinct address.
This doesn't consider case sensitivity.
Then each version of these addresses are counted.

SQL:

SELECT Address, State, count(1) As NumEntities
FROM ( 
    SELECT min(t1.Address) as Address, t1.State
    FROM TempAllAddresses t1
    INNER JOIN TempAllAddresses t2
     ON t1.state = t2.state
     AND T2.Address LIKE t1.address + '%'
    GROUP BY t1.State, t2.Address
) GROUP By State, Address

Have you tried analytical functions - they are often the easiest solution. I am not familiar with your table structure, but it should be something like this:

SELECT t1.Address, t1.State, 
COUNT(t2.address) OVER (PARTITION BY t2.state) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

You can even add ORDER BY in the OVER clause. See Oracle FAQs for some explanation.

继续阅读：join self-join sql

What is wrong with this join of a table to itself?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？