SQL Server - Select Distinct Alternative Query

2023-03-14 09:23 问答作者：

I asked a question earlier about a replacement to using SELECT DISTINCT as the query takes a lot longer to execute. I was recommended to use EXISTS and it worked A LOT better (0 second execution versus the previous 44 second execution. I'm not too familiar with query syntax but am learning. I was hoping someone might be able to rephrase the following query without using DISTINCT and preferably using EXISTS (as it worked so well previously). I appreciate any help.

select distinct EM.Employee, 
                rtrim(EM.First开发者_开发问答Name) + ' ' + rtrim(EM.LastName) as Name 
from EM EM 
inner join PR PR 
    on EM.Employee = PR.ProjMgr 
where PR.WTS1 in (Select distinct WTS1 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '') 
    and PR.WTS2 = '' 
order by Name

When you JOIN, you get a partial Cartesian product. The partial comes from the INNER JOIN condition.

So for 2 rows in EM that have 3 rows and 4 rows in PR respectively, you'd get 7 rows in the output. As expected. You asked "give me all partial Cartesian product of EM and PR where matching rows"

However, you want "give me rows in EM where there is something in PR". So the INNER JOIN is the wrong construct.

You can use IN, EXISTS (or INTERSECT in other situations) and they are all semantically correct for this latter question.

In this case your use of IN is in the wrong place. As I said before, the whole "test" condition should be pushed into the subquery.

So, both of these are correct

One table in the FROM clause
All conditions in the subquery
No DISTINCT

Stop JOINing!

select
    EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from
    EM EM 
WHERE
   EXISTS (SELECT *
       FROM
          PR PR 
          JOIN
          TabFields TF ON PR.WTS1 = TF.WTS1
       WHERE
          PR.WTS2 = '' AND
          TF.custInclude = 'Y' and TF.WBT2 = '' AND
          EM.Employee = PR.ProjMgr
          )

select
    EM.Employee, rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from
    EM EM 
WHERE
   EM.Employee IN (SELECT PR.ProjMgr
       FROM
          PR PR 
          JOIN
          TabFields TF ON PR.WTS1 = TF.WTS1
       WHERE
          PR.WTS2 = '' AND
          TF.custInclude = 'Y' and TF.WBT2 = ''
       )

Using 2 INs which is closer to your original query:

select EM.Employee, 
                rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from EM EM 
WHERE 
      EM.Employee IN (SELECT PR.ProjMgr
          FROM 
            PR PR 
          where PR.WTS1 in (Select distinct WTS1 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '') 
            and PR.WTS2 = ''
       ) 
order by Name

The (new) Query optimizer will treat IN the same as EXISTS if you are evaluating a subquery.

Since you are using IN, you don't need the DISTINCT in your subquery. Just take it out!

It looks like someone is writing queries and always using DISTINCT which is a pretty terrible habit to get into. DISTINCT should only be used to avoid duplicates. There's no reason to eliminate dupes in your subquery since it will be just as accurate to say that 1 is in 1,2,1,3 as it is to say 1 is in 1,2,3.

No distinct is needed in the code below

select  EM.Employee, 
                rtrim(EM.FirstName) + ' ' + rtrim(EM.LastName) as Name 
from EM 
inner join PR 
    on EM.Employee = PR.ProjMgr 
where exists (Select * 
                  from TabFields 
                  where custInclude = 'Y' and WTS2 = '' and PR.WTS1 = tabfields.WTS1 ) 
    and PR.WTS2 = '' -- Comment: Check if this clause is covered by the subquery and can be ommited
order by Name

Try using GROUP BY instead of select DISTINCT. Also you can move your sub-query from the WHERE clause and use it as a derived table in a join.

SELECT EM.Employee, 
       RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName) AS Name 
FROM EM EM 
INNER JOIN PR PR 
    ON EM.Employee = PR.ProjMgr 
INNER JOIN (SELECT WTS1 
              FROM TabFields 
             WHERE custInclude = 'Y' AND WTS2 = ''
             GROUP BY WTS1) x
    ON PR.WTS1 = x.WTS1
    AND PR.WTS2 = '' 
GROUP BY EM.Employee,    RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName) 
ORDER BY RTRIM(EM.FirstName) + ' ' + RTRIM(EM.LastName)

This query seems fine to me. Is is running slow? Do you really need to trim first and last name? If not you can ignore them.

 select distinct E.Employee, rtrim(E.FirstName) + ' ' + rtrim(E.LastName) as Name 
 from EM E join PR P on E.Employee = P.ProjMgr 
 where P.WTS1 in (Select WTS1 from TabFields where custInclude = 'Y' and WTS2 = '') 
   and P.WTS2 = '' 
 order by Name

Exists() can be used to check something as a condition before some other sqls executed. If you need distinct rows, exists() will not help.

继续阅读：distinct select sql-server

SQL Server - Select Distinct Alternative Query

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？