
Why does SQL cost explode with simple "or"?

I have the following statement to find unambiguous names in my data (~1 million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle shows a huge cost of 1477315000 and execution does not finish after 5 minutes. Simply splitting the OR into its own not-exists subquery boosts performance to 0.5 s and drops the cost to 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

My question is not how to tune this to the limit, as it is only a seldom-executed query, and I know the CONCAT prevents any index from being used, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.


The answer is in the EXPLAIN PLAN for your queries. They may be semantically equivalent, but the execution plans behind the scenes are vastly different.

EXISTS operates differently from a JOIN, and essentially your OR filter is what joins the tables together.

No JOIN occurs in the second query as you are only retrieving records from one table.
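
To see where the cost comes from, you can compare the two plans directly. A minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN, run once for each variant of the query:

explain plan for
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL;

-- display the plan that was just explained
select * from table(dbms_xplan.display);

Running the same two statements for the split version lets you compare the join methods and estimated row counts side by side.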


The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never uses an OR operator to combine predicates; all of its predicates are combined with AND.

The performance is better because, if the first predicate combined with an AND does not evaluate to true, the remaining predicates are skipped (not evaluated). With an OR, both (or all) predicates frequently have to be evaluated, which slows down the query: ORed predicates are checked until one of them evaluates to true.
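
If you want to keep a single NOT EXISTS, one option (not from the answers above, just a sketch of the same idea) is to expand the OR manually into a UNION ALL, which mirrors the OR-expansion the optimizer would otherwise have to do itself; whether this helps depends on your Oracle version and has not been tested against the original data:

select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
   -- branch 1: match on Surname
   select * from person p2 where p1.Surname = p2.Surname
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
   union all
   -- branch 2: match on Altname
   select * from person p2 where p1.Surname = p2.Altname
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL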


I would consider testing the query rewritten as below... Do a direct join from one table to the other on the criteria that qualify what IS considered a match... Then, in the WHERE clause, throw the row out if it doesn't come up with a match.

select 
      p1.Prename, 
      p1.Surname
   from 
      person p1 
         join person p2
            on p1.ID <> p2.ID
            and (  p1.Surname = p2.Surname
                or p1.SurName = p2.AltName )
            and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
   where
          p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

Per your comments, and from what it appears you were looking for... NO, do NOT do a left outer join... If you are looking for names that are ALIKE and that you want to PURGE out (however you'll handle that), you only want to PREQUALIFY those records that DO HAVE A MATCH via the self-join (hence the normal join). If a name DOES NOT have a similar name, you probably want to leave it alone... thus it is automatically left OUT of the result set.

Now the WHERE clause kicks in... You have a valid person on the left that HAS a person on the right... these ARE the duplicates. Once you have the match, throwing in the "p2.ID IS NULL" check creates the same result as the NOT EXISTS, giving the final results.

I put my query back to a normal "join".
