Optimizing a SQL query to remove a cursor
I am trying to write a query that will go through a table and apply any credits on an account to the oldest balance. I could not figure out a way to do this without using a cursor, and since I know cursors should be avoided whenever possible, I am coming here for help.
-- split the source table into positive balances and negative credits
select * into #balances from [IDAT_AR_BALANCES] where amount > 0
select * into #credits from [IDAT_AR_BALANCES] where amount < 0

create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

declare credit_cursor cursor for
    select [CLIENT_ID], amount, cvtGuid from #credits

open credit_cursor

declare @client_id varchar(11)
declare @credit money
declare @balance money
declare @cvtGuidBalance uniqueidentifier
declare @cvtGuidCredit uniqueidentifier

fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit

while @@fetch_status = 0
begin
    -- apply this credit to the client's oldest outstanding balances
    while (@credit < 0 and (select count(*) from #balances where @client_id = CLIENT_ID and amount <> 0) > 0)
    begin
        -- oldest balance that still has an amount left
        select top 1 @balance = amount, @cvtGuidBalance = cvtGuid
        from #balances
        where @client_id = CLIENT_ID and amount <> 0
        order by AGING_DATE

        set @credit = @balance + @credit

        if (@credit > 0)
        begin
            -- credit exhausted: the balance keeps the remainder
            update #balances set amount = @credit where cvtGuid = @cvtGuidBalance
            set @credit = 0
        end
        else
        begin
            -- balance fully consumed: move on to the next one
            update #balances set amount = 0 where cvtGuid = @cvtGuidBalance
        end
    end

    -- record whatever is left of the credit
    update #credits set amount = @credit where cvtGuid = @cvtGuidCredit

    fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
end

close credit_cursor
deallocate credit_cursor

-- remove fully settled rows and rebuild the source table
delete #balances where AMOUNT = 0
delete #credits where AMOUNT = 0

truncate table [IDAT_AR_BALANCES]
insert [IDAT_AR_BALANCES] select * from #balances
insert [IDAT_AR_BALANCES] select * from #credits

drop table #balances
drop table #credits
In my test cases with 10,000 records and 1,000 clients it takes 26 seconds to run; by adding the two indexes on CLIENT_ID I was able to bring that down to 14 seconds. However, this is still too slow for what I need: the final data set could have as many as 10,000 clients and over 4,000,000 records, so the run time could easily climb into double-digit minutes.
Any recommendations on how I can restructure this to remove the cursor would be greatly appreciated.
Example (updated to show that you could have multiple credits after it has been run):
before
cvtGuid client_id amount AGING_DATE
xxxxxx 1 20.00 1/1/2011
xxxxxx 1 30.00 1/2/2011
xxxxxx 1 -10.00 1/3/2011
xxxxxx 1 5.00 1/4/2011
xxxxxx 2 20.00 1/1/2011
xxxxxx 2 15.00 1/2/2011
xxxxxx 2 -40.00 1/3/2011
xxxxxx 2 5.00 1/4/2011
xxxxxx 3 10.00 1/1/2011
xxxxxx 3 -20.00 1/2/2011
xxxxxx 3 5.00 1/3/2011
xxxxxx 3 -8.00 1/4/2011
after
cvtGuid client_id amount AGING_DATE
xxxxxx 1 10.00 1/1/2011
xxxxxx 1 30.00 1/2/2011
xxxxxx 1 5.00 1/4/2011
xxxxxx 3 -5.00 1/2/2011
xxxxxx 3 -8.00 1/4/2011
So it will apply the negative credit to the oldest positive balance (client 1 in the example). If there are no remaining positive balances after it is done, it leaves the remaining negative amount (client 3). If they cancel out perfectly (the case 90% of the time with the real data), it removes the records entirely (client 2).
It is possible to solve this with the help of a recursive CTE.
The basic idea is this:
1) Get the totals of positive and negative values separately for every account (client_id).
2) Iterate over every account and 'pinch off' an amount from one of the two totals, depending on the sign and absolute value of amount (i.e. never 'pinch off' more of the corresponding total than its current value). The same value should be added to/subtracted from amount.
3) After the update, delete those rows where amount has become 0.
For my solution I've borrowed Lieven's table variable definition (thank you!), adding one column (cvtGuid, declared as int for the purpose of the demonstration) and one row (the last one from the original example, which was missing from Lieven's script).
/* preparing the demonstration data */
DECLARE @IDAT_AR_BALANCES TABLE (
cvtGuid int IDENTITY,
client_id INTEGER
, amount FLOAT
, date DATE
);
INSERT INTO @IDAT_AR_BALANCES
SELECT 1, 20.00, '1/1/2011'
UNION ALL SELECT 1, 30.00, '1/2/2011'
UNION ALL SELECT 1, -10.00, '1/3/2011'
UNION ALL SELECT 1, 5.00, '1/4/2011'
UNION ALL SELECT 2, 20.00, '1/1/2011'
UNION ALL SELECT 2, 15.00, '1/2/2011'
UNION ALL SELECT 2, -40.00, '1/3/2011'
UNION ALL SELECT 2, 5.00, '1/4/2011'
UNION ALL SELECT 3, 10.00, '1/1/2011'
UNION ALL SELECT 3, -20.00, '1/2/2011'
UNION ALL SELECT 3, 5.00, '1/3/2011'
UNION ALL SELECT 3, -8.00, '1/4/2011';
/* checking the original contents */
SELECT * FROM @IDAT_AR_BALANCES;
/* getting on with the job: */
WITH totals AS (
SELECT
/* 1) preparing the totals */
client_id,
/* COALESCE guards accounts that have only positive or only negative rows */
total_pos = COALESCE(SUM(CASE WHEN amount > 0 THEN amount END), 0),
total_neg = COALESCE(SUM(CASE WHEN amount < 0 THEN amount END), 0)
FROM @IDAT_AR_BALANCES
GROUP BY client_id
),
refined AS (
/* 2) refining the original data with auxiliary columns:
* rownum - row numbers (unique within accounts);
* amount_to_discard_pos - the amount to discard `amount` completely if it's negative;
* amount_to_discard_neg - the amount to discard `amount` completely if it's positive
*/
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date),
amount_to_discard_pos = CAST(CASE WHEN amount < 0 THEN -amount ELSE 0 END AS float),
amount_to_discard_neg = CAST(CASE WHEN amount > 0 THEN -amount ELSE 0 END AS float)
FROM @IDAT_AR_BALANCES
),
prepared AS (
/* 3) preparing the final table (using a recursive CTE) */
SELECT
cvtGuid = CAST(NULL AS int),
client_id,
amount = CAST(NULL AS float),
date = CAST(NULL AS date),
amount_update = CAST(NULL AS float),
running_balance_pos = total_pos,
running_balance_neg = total_neg,
rownum = CAST(0 AS bigint)
FROM totals
UNION ALL
SELECT
n.cvtGuid,
n.client_id,
n.amount,
n.date,
amount_update = CAST(
CASE
WHEN n.amount_to_discard_pos < p.running_balance_pos
THEN n.amount_to_discard_pos
ELSE p.running_balance_pos
END
+
CASE
WHEN n.amount_to_discard_neg > p.running_balance_neg
THEN n.amount_to_discard_neg
ELSE p.running_balance_neg
END
AS float),
running_balance_pos = CAST(p.running_balance_pos -
CASE
WHEN n.amount_to_discard_pos < p.running_balance_pos
THEN n.amount_to_discard_pos
ELSE p.running_balance_pos
END
AS float),
running_balance_neg = CAST(p.running_balance_neg -
CASE
WHEN n.amount_to_discard_neg > p.running_balance_neg
THEN n.amount_to_discard_neg
ELSE p.running_balance_neg
END
AS float),
n.rownum
FROM refined n
INNER JOIN prepared p ON n.client_id = p.client_id AND n.rownum = p.rownum + 1
)
/* -- some junk that I've forgotten to clean up,
SELECT * -- which you might actually want to use
FROM prepared -- to view the final prepared result set
WHERE rownum > 0 -- before actually running the update
ORDER BY client_id, rownum
*/
/* performing the update */
UPDATE t
SET amount = t.amount + u.amount_update
FROM @IDAT_AR_BALANCES t INNER JOIN prepared u ON t.cvtGuid = u.cvtGuid
OPTION (MAXRECURSION 0);
/* checking the contents after UPDATE */
SELECT * FROM @IDAT_AR_BALANCES;
/* deleting the eliminated amounts */
DELETE FROM @IDAT_AR_BALANCES WHERE amount = 0;
/* checking the contents after DELETE */
SELECT * FROM @IDAT_AR_BALANCES;
UPDATE
As Lieven has correctly suggested (thank you again!), you can first delete all the rows from the accounts whose amounts add up to 0, then update the remaining rows. That will improve the overall performance, since, as you say, the majority of the accounts have amounts that add up to 0.
Here's a variation on Lieven's solution for deleting 'zero accounts':
DELETE FROM @IDAT_AR_BALANCES
WHERE client_id IN (
SELECT client_id
FROM @IDAT_AR_BALANCES
GROUP BY client_id
HAVING SUM(amount) = 0
)
Keep in mind, though, that the DELETE after the update will still be needed, because the update may reset some of the amount values to 0. If I were you, I might consider creating a FOR UPDATE trigger that automatically deletes the rows where amount = 0. Such a solution is not always acceptable, but sometimes it is fine. It depends on what else you might possibly do with your data, and it may also depend on whether the project is solely yours or there are other maintainers as well (who won't like rows 'magically' and unexpectedly disappearing).
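For illustration, here is a minimal sketch of such a trigger. The trigger name is made up, it assumes the real IDAT_AR_BALANCES table rather than the demo table variable (triggers cannot be attached to table variables), and it assumes cvtGuid uniquely identifies a row:
-- Hypothetical cleanup trigger: after every UPDATE, delete the rows
-- whose amount has just become 0 (FOR UPDATE and AFTER UPDATE are synonyms)
CREATE TRIGGER trg_IDAT_AR_BALANCES_DeleteZeroAmounts
ON IDAT_AR_BALANCES
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    DELETE b
    FROM IDAT_AR_BALANCES AS b
    INNER JOIN inserted AS i ON i.cvtGuid = b.cvtGuid
    WHERE i.amount = 0;
END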
I put together something very similar recently. I didn't find a really simple solution; it ended up requiring a couple hundred lines. But I can offer a couple of points.
You can put your credits into a table with a serial number for each client:
CREATE TABLE #CreditsInSequence
(
    Client_ID INT NOT NULL,
    Sequence INT NOT NULL,
    PRIMARY KEY (Client_ID, Sequence),
    Date DATE NOT NULL,
    Amount DECIMAL(19, 4) NOT NULL
)
INSERT INTO #CreditsInSequence (Client_ID, Sequence, Date, Amount)
SELECT
    CLIENT_ID,
    ROW_NUMBER() OVER (PARTITION BY CLIENT_ID ORDER BY AGING_DATE) AS Sequence,
    AGING_DATE,
    amount
FROM
    #credits
If a client has just one credit, they'll have one row in the table, with Sequence = 1. If another client has three credits, they will have three rows, with sequence numbers 1, 2, and 3. You can now loop through this temp table, and you'll only need a number of iterations equal to the most credits that any single client has.
DECLARE @MaxSeq INT = (SELECT MAX(Sequence) FROM #CreditsInSequence)
DECLARE @Seq INT = 1
WHILE @Seq <= @MaxSeq
BEGIN
-- Do something with this set of credits
SELECT
Client_ID, Date, Amount
FROM
#CreditsInSequence
WHERE
Sequence = @Seq
SET @Seq += 1 -- Don't forget to increment the loop!
END
Just as with your cursor, this lets you operate sequentially, fully handling the first credit for each client before moving on to the second. As a bonus, in my experience this kind of "pretend FOR loop" is often faster than a cursor.
To identify the right balance to which to apply each credit, I'd start with something like this:
SELECT
    B.client_id,
    MIN(B.AGING_DATE) AS Date,
    SUM(B.amount + COALESCE(AC.Amount, 0.00)) AS MaxAmountCreditable
FROM
    #balances AS B
    LEFT JOIN #AllocatedCredits AS AC ON B.cvtGuid = AC.BalanceID
WHERE
    B.amount + COALESCE(AC.Amount, 0.00) > 0.00
GROUP BY
    B.client_id
You'll need to extend this last query to get the actual balance ID (cvtGuid, if I'm reading your table right) from that date, record these allocations in #AllocatedCredits, handle cases where the credit is enough to pay off multiple balances, etc.
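A minimal sketch of that extension, run once per iteration of the loop, might look like this. It assumes #AllocatedCredits(BalanceID, Amount) is the hypothetical bookkeeping table from the query above, that allocations are stored as negative amounts just like the credits themselves, and that each credit is applied to only one balance per pass (a credit large enough to span several balances would need additional passes):
-- Hypothetical allocation step: apply the @Seq-th credit of every
-- client to that client's oldest still-open balance
;WITH Allocated AS (
    -- total already allocated against each balance
    SELECT BalanceID, SUM(Amount) AS Amount
    FROM #AllocatedCredits
    GROUP BY BalanceID
),
OldestOpen AS (
    -- every client's open balances, oldest first
    SELECT
        B.cvtGuid,
        B.client_id,
        B.amount + COALESCE(A.Amount, 0.00) AS Remaining,
        ROW_NUMBER() OVER (PARTITION BY B.client_id ORDER BY B.AGING_DATE) AS rn
    FROM #balances AS B
    LEFT JOIN Allocated AS A ON A.BalanceID = B.cvtGuid
    WHERE B.amount + COALESCE(A.Amount, 0.00) > 0.00
)
INSERT INTO #AllocatedCredits (BalanceID, Amount)
SELECT
    o.cvtGuid,
    -- never allocate more than the balance still carries
    CASE WHEN -c.Amount <= o.Remaining THEN c.Amount ELSE -o.Remaining END
FROM OldestOpen AS o
INNER JOIN #CreditsInSequence AS c ON c.Client_ID = o.client_id
WHERE o.rn = 1
  AND c.Sequence = @Seq;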
Good luck, and don't hesitate to come back to SO if you need any help!
You'll have to verify whether it is actually faster, but this is done with (mostly) set-based operations instead of cursor-based ones.
Test data
DECLARE @IDAT_AR_BALANCES TABLE (
client_id INTEGER
, amount FLOAT
, date DATE
)
INSERT INTO @IDAT_AR_BALANCES
SELECT 1, 20.00, '1/1/2011'
UNION ALL SELECT 1, 30.00, '1/2/2011'
UNION ALL SELECT 1, -10.00, '1/3/2011'
UNION ALL SELECT 1, 5.00, '1/4/2011'
UNION ALL SELECT 2, 20.00, '1/1/2011'
UNION ALL SELECT 2, 15.00, '1/2/2011'
UNION ALL SELECT 2, -40.00, '1/3/2011'
UNION ALL SELECT 2, 5.00, '1/4/2011'
UNION ALL SELECT 3, 10.00, '1/1/2011'
UNION ALL SELECT 3, -20.00, '1/2/2011'
UNION ALL SELECT 3, 5.00, '1/3/2011'
Delete all that adds up to 0 (90% of data)
DELETE FROM @IDAT_AR_BALANCES
FROM @IDAT_AR_BALANCES b
INNER JOIN (
SELECT client_id
FROM @IDAT_AR_BALANCES
GROUP BY
client_id
HAVING SUM(amount) = 0
) bd ON bd.client_id = b.client_id
Remaining records
DECLARE @Oldest TABLE (
client_id INTEGER PRIMARY KEY CLUSTERED
, date DATE
)
DECLARE @Negative TABLE (
client_id INTEGER PRIMARY KEY CLUSTERED
, amount FLOAT
)
WHILE EXISTS ( SELECT b.client_id
, MIN(b.amount)
FROM @IDAT_AR_BALANCES b
INNER JOIN (
SELECT client_id
FROM @IDAT_AR_BALANCES
GROUP BY
client_id
HAVING COUNT(*) > 1
) r ON r.client_id = b.client_id
WHERE b.amount < 0
GROUP BY
b.client_id
HAVING COUNT(*) > 0
)
BEGIN
DELETE FROM @Oldest
DELETE FROM @Negative
INSERT INTO @Oldest
SELECT client_id
, date = MIN(date)
FROM @IDAT_AR_BALANCES
WHERE amount > 0
GROUP BY
client_id
INSERT INTO @Negative
SELECT b.client_id
, amount = SUM(amount)
FROM @IDAT_AR_BALANCES b
LEFT OUTER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
WHERE amount < 0
AND o.client_id IS NULL
GROUP BY
b.client_id
UPDATE b
SET amount = b.amount + n.amount
FROM @IDAT_AR_BALANCES b
INNER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
INNER JOIN @Negative n ON n.client_id = b.client_id
DELETE FROM @IDAT_AR_BALANCES
FROM @IDAT_AR_BALANCES b
LEFT OUTER JOIN @Oldest o ON o.client_id = b.client_id AND o.date = b.date
INNER JOIN (
SELECT client_id
FROM @IDAT_AR_BALANCES
GROUP BY
client_id
HAVING COUNT(*) > 1
) r ON r.client_id = b.client_id
WHERE amount < 0
AND o.client_id IS NULL
END
DELETE FROM @IDAT_AR_BALANCES
WHERE amount = 0
SELECT *
FROM @IDAT_AR_BALANCES
First, as you state, you should only be dealing with those clients that actually have outstanding balances.
Second, you can simulate a cursor's functionality with a WHILE loop.
Here are the modifications to the code. I left out the guts of the calculations as they are not the issue... if you would like me to complete the code, let me know.
--first, only deal with those clients with balances
select CLIENT_ID into #ToDoList
from [IDAT_AR_BALANCES]
group by CLIENT_ID
having sum(amount)!=0
--next, get the temp debit and credit tables just for the clients you are working on
select * into #balances from [IDAT_AR_BALANCES] where amount > 0 and CLIENT_ID IN (SELECT CLIENT_ID FROM #ToDoList)
select * into #credits from [IDAT_AR_BALANCES] where amount < 0 and CLIENT_ID IN (SELECT CLIENT_ID FROM #ToDoList)
--fine
create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)
--simulate a cursor... but much less resource intensive
declare @client_id varchar(11)
-- now loop through each client and perform their aging
while exists (select * from #ToDoList)
begin
select top 1 @client_id = CLIENT_ID from #ToDoList
--perform your debit to credit matching and account aging here, per client
delete from #ToDoList where CLIENT_ID = @client_id
end
--clean up.. drop temp tables, etc
One last thought... I actually wrote this very code for a large pest-control CRM I developed a few years ago, and I found that the most effective solution for this problem was... a .NET CLR stored proc.
While I usually avoid CLR procs at all costs, there ARE times when they outperform SQL. In this case, procedural (row-by-row) queries with mathematical computations can be much faster in a CLR proc.
In my case, it was significantly faster than SQL.
FYI
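For reference, the T-SQL side of wiring up such a CLR proc is short. Here is a minimal sketch in which the assembly path, assembly name, class, and method names are all hypothetical; the actual row-by-row credit-application logic would live in the .NET code, and CLR integration must first be enabled via sp_configure 'clr enabled':
-- Hypothetical registration of a CLR stored procedure
CREATE ASSEMBLY ArCreditAllocator
FROM 'C:\clr\ArCreditAllocator.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE PROCEDURE dbo.usp_ApplyCreditsClr
AS EXTERNAL NAME ArCreditAllocator.[StoredProcedures].ApplyCredits;
GO

EXEC dbo.usp_ApplyCreditsClr;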