20 hour query from hell
I am running this query on a table that has half a million records with about 7 fields:
delete from qvalues where rid not in
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1)
and rid not in (select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)=1);
Why is it taking SO LONG?
What can I do to optimize it?
I'm running SQL Server 2008.
Your best bet is to look at the execution plan and see what's taking the longest. I'd start by reducing the two NOT IN subqueries to one:
delete
from qvalues
where rid not in
(
select min(rid)
from qvalues
group by name, compound, rowid
having count(1) >= 1
)
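To see where the time actually goes, you can turn on the runtime statistics before running the statement (or just the subquery on its own). A minimal sketch, assuming you're running it from SSMS:

-- Report I/O and CPU timings for each statement in the batch (shown in the Messages tab)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- You can also toggle "Include Actual Execution Plan" in SSMS before executing the DELETE.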
You might also want to look into putting an index on name, compound and rowid.
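One possible shape for that index, as a sketch (the index name is made up, and including rid assumes rid is not already your clustered key):

-- Composite index covering the GROUP BY columns; INCLUDE (rid) lets MIN(rid)
-- be answered from the index without touching the base table
CREATE NONCLUSTERED INDEX IX_qvalues_name_compound_rowid
ON qvalues (name, compound, rowid)
INCLUDE (rid);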
As well as considering batching and indexing, you can also rewrite the query itself to remove the subqueries and be more efficient.
-- Number the rows in each (name, compound, rowid) group; keep the lowest rid, delete the rest
;WITH cte As
(
SELECT ROW_NUMBER() OVER (PARTITION BY name, compound, rowid ORDER BY rid) AS RN
FROM qvalues
)
DELETE FROM cte WHERE RN > 1
How many duplicates per group are there likely to be? If many, it might be quicker to insert the records you want to keep into a new table and then do a drop and rename.
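A rough sketch of that approach (untested against your schema; you'd need to recreate any indexes and constraints on the new table afterwards):

-- Copy only the survivors (one row per name/compound/rowid group) into a new table
SELECT q.*
INTO qvalues_keep
FROM qvalues q
JOIN (SELECT MIN(rid) AS rid
      FROM qvalues
      GROUP BY name, compound, rowid) k ON k.rid = q.rid;

-- Swap the tables
DROP TABLE qvalues;
EXEC sp_rename 'qvalues_keep', 'qvalues';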
Not knowing the actual data involved, I can only give some general advice: run each of the subqueries individually and see how long each one takes.
Also, am I reading this wrong, or are you deleting all but 2 entries from this table (if rid is unique)?
1 - Use batching. This lets you resume, and gives you an idea of progress. As an example:
DECLARE @Msg varchar(max)

WHILE 1 = 1
BEGIN
-- Delete in chunks so the transaction log stays small and the job can be stopped and resumed
DELETE TOP (100000) qvalues
FROM qvalues WITH (TABLOCKX)
<logic here>

-- @@ROWCOUNT comes from the DELETE above; a short final chunk ends the loop
IF @@ROWCOUNT < 100000 BREAK

SET @Msg = 'Deleted another 100,000 rows'
SET @Msg = @Msg + ' ' + CONVERT(varchar(20), GETDATE(), 101) + ' ' + CONVERT(varchar(20), GETDATE(), 108)
RAISERROR(@Msg, 0, 1) WITH NOWAIT  -- severity 0 plus NOWAIT prints progress immediately
END
Note that I also added a WITH (TABLOCKX) hint, which puts a table lock on and eliminates row-level locking. It'll cause issues with concurrent reads, but hopefully you don't have anything else querying that table while you are deleting.
2 - Fix your logic. This is impossible to write for you without a better idea of your table structure, but some options are:
- Materialize a table with the values you want to compare against and do a join (see the sketch after this list). If the delete is big enough, you can put a clustered index on the temp table on the join field. I've used this a lot with great success.
- If you expect to delete a large portion of the records, SELECT INTO a new table and drop the old one. This is a minimally logged operation and runs really quickly on SQL Server 2008 compared to a delete, which needs to log the values for each row.
- Drop all your indexes except the ones you are using for selection and your clustered index. Keeping a clustered index is normally fine for a delete of this type if the cluster is relevant to the query.
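A rough sketch of the materialize-and-join option, assuming rid uniquely identifies rows and you keep the lowest rid per group (names are illustrative and untested):

-- Materialize the rids to keep once, instead of re-running a grouped subquery
SELECT MIN(rid) AS rid
INTO #keep
FROM qvalues
GROUP BY name, compound, rowid;

-- Clustered index on the join field makes the anti-join below cheap
CREATE CLUSTERED INDEX IX_keep_rid ON #keep (rid);

-- Delete everything that is not in the keep list
DELETE q
FROM qvalues q
LEFT JOIN #keep k ON k.rid = q.rid
WHERE k.rid IS NULL;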
first thought:
delete from qvalues where rid not in
(
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)>1
UNION
select min(rid) from qvalues
group by name,compound,rowid
having COUNT(*)=1);
Maybe it's also a good idea to ensure that SQL Server knows you are doing an "uncorrelated subselect" (because correlated subselects take much longer):
delete a from qvalues a where a.rid not in
(
select min(b.rid) from qvalues b
group by b.name,b.compound,b.rowid
having COUNT(*)>1
UNION
select min(c.rid) from qvalues c
group by c.name,c.compound,c.rowid
having COUNT(*)=1);
and of course you should consider using indexes (especially on rid, but also on name, compound and rowid)
My SQLs are not tested - I hope you get the idea of what I was trying to show.
PS: your SQL requires a lot of calculation (especially the HAVING clauses); could you try to find another solution for your problem?
What is your machine's setup? Do you have enough memory? Where do you see the most utilisation while the query runs (CPU, memory, disk IO)?
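If you're not sure which resource is the bottleneck, one quick check from a second connection while the delete is running is to look at the standard DMVs (SQL Server 2008); just a sketch:

-- What is the delete session doing or waiting on right now?
SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time,
       r.cpu_time, r.logical_reads, r.writes
FROM sys.dm_exec_requests AS r
WHERE r.session_id <> @@SPID;  -- exclude this monitoring query itself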