开发者

Strange delete query, is it written correctly?

We recently had some issues where a sql script that would remove duplicate entries from the table would not use the most recent entry as the one to keep. I think this line is the issue

delete from vaccine_patient_details 
where vacc_pat_guid <> 
    (Select top 1 vac开发者_StackOverflow社区c_pat_guid 
     from vaccine_patient_details as v 
     where v.patient_guid = patient_guid and 
           v.vaccine_guid = vaccine_guid 
     order by date_given desc)

Is that correct syntax? I found another version of the script working on a different table. (names changed to match the first example)

delete from vaccine_patient_details 
where vacc_pat_guid <> 
    (Select top 1 vacc_pat_guid 
     from vaccine_patient_details as v 
     where v.patient_guid = vaccine_patient_details.patient_guid and 
           v.vaccine_guid = vaccine_patient_details.vaccine_guid 
     order by date_given desc)

This one uses the the table name of the deleted table in the inner where clause, can that be causing a problem in my first version?

Details about the Table:

  1. Any columns that end in guid are a datatype of uniqueidentifier
  2. vacc_pat_guid is the primary key and is unique.
  3. date_given is a datetime that could be null. If there is a duplicate where one is null and one is not null it should prefer the not null one.


Whithout any aliases on the first table, the query is equivalent to :

delete from vaccine_patient_details 
where vacc_pat_guid <> 
    (Select top 1 vacc_pat_guid 
     from vaccine_patient_details as v 
     where v.patient_guid = v.patient_guid and 
           v.vaccine_guid = v.vaccine_guid 
     order by date_given desc)

And a good one would be

delete v1 from vaccine_patient_details as v1
where v1.vacc_pat_guid <> 
    (Select top 1 v.vacc_pat_guid 
     from vaccine_patient_details as v 
     where v.patient_guid = v1.patient_guid and 
           v.vaccine_guid = v1.vaccine_guid 
     order by v.date_given desc)

By specifiying the table name in the second query you show us, the optimizer understand that he have to join with the first table, because the seconde table is named 'v', and the first is so 'vaccine_patient_details', and he is not confuse.

He is confused in the first because he doesn't know if patient_guid is the field in the first table or in the second one. So it takes the closer, so the second one.

Edit :

From http://dev.mysql.com/doc/refman/5.0/en/delete.html

If you declare an alias for a table, you must use the alias when referring to the table:

DELETE t1 FROM test AS t1, test2 WHERE ...


The relevant section of your code is (as you noted) this...

 where v.patient_guid = patient_guid and 
       v.vaccine_guid = vaccine_guid

The right hand side of the equality operators have no table specified. The optimiser will check the most local scope first for a matching table. In this case, the table in the sub-query has those fields, the optimised does not even check the tables in the outer query.

The second version of the code is explicit about which table to reference, which happens to be the table in the outer query.

So, in short, yes; The problem is that the first version implicitly references the inner query's instance of the table, and it should instead explicitly reference the outer query's instance of the table.

Note : I don't agree that this self-join is a problem.


delete a from vaccine_patient_details a, vaccine_patient_details b
where a.patient_guid = b.patient_guid
   and a.vaccine_guid = b.vaccine_guid
   and a.date_given < b.date_given


Whether it is legal or not to join with the same table, it is not a good idea. You would do better to extract the ids of the records you think you should be deleting into a separate table ( then you can verify that they are right ), and use these to run the deletes.

I think trying to do deletes based on a complex query like this is asking for trouble at some point.


Give this a try (in your dev environment, of course)

delete vaccine_patient_details 
  from vaccine_patient_details V
 where vacc_pat_guid <> 
    (Select top 1 vacc_pat_guid 
       from vaccine_patient_details 
      where V.patient_guid = patient_guid and 
            V.vaccine_guid = vaccine_guid 
      order by date_given desc)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜