
Lengthy SQL stored procedure with tables storing more and more columns

This question applies to stored procedures written in SQL-92 dialects, e.g. Oracle PL/SQL, SQL Server T-SQL, or DB2 SQL.

I'm maintaining an 11,000-line stored procedure.

I find that by the end of the stored procedure I need to report out 80 columns of data.

There are 3 distinct phases in this stored procedure.

  1. Data Gathering (copying data from the live tables into the stored procedure's intermediate tables)

    I need to do data gathering for consistency, because the LIVE data (e.g. in the member table) at line 30 might have changed by the time the stored procedure execution gets to line 10,000. A sketch of this step follows the list below.

    Commit state atomicity is maintained here (no commit until all the data needed is copied over)

  2. Calculation (lots of SQL, complicated enough so that Cursors or Views will not do the job)
  3. Writing back to permanent tables (invoices, AR, payments)

    Commit state atomicity is maintained here (no commit until all the data is written back)
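A minimal T-SQL sketch of the data-gathering step in item 1, referenced above; the table and column names (member, #member_snapshot) are assumptions for illustration, not the actual schema:

    -- Copy the live data under one transaction so the calculation phase
    -- works from a consistent snapshot (hypothetical names throughout).
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- or SNAPSHOT, if enabled
    BEGIN TRANSACTION;

    SELECT member_id, member_name, balance
    INTO #member_snapshot                -- intermediate table for this run
    FROM dbo.member
    WHERE status = 'ACTIVE';

    -- ... repeat for every live table the calculation phase reads ...

    COMMIT TRANSACTION;  -- nothing commits until all needed data is copied over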

The "intermediate" tables are only used in the stored procedure.

They are indexed for joins down the line but do not have

PK/FK referential integrity constraints or unique indexes

as these would slow down execution considerably in addition

to pointing back to LIVE data (that is in flux)
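To make that concrete, an intermediate table along these lines might be declared as below; the names are assumptions, and the point is the non-unique join index and the absence of any PK/FK constraints:

    -- Hypothetical intermediate table: indexed for joins, no constraints.
    CREATE TABLE #invoice_work (
        member_id  INT            NOT NULL,  -- no FK back to the live member table
        period_id  INT            NOT NULL,
        amount     DECIMAL(18,2)  NULL
    );

    -- Non-unique index to support the joins further down the procedure.
    CREATE INDEX ix_invoice_work ON #invoice_work (member_id, period_id);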

When you get to 80 columns of data that you need to report by the end of a stored procedure, you run up against RDBMS limitations (index limits, memory limits, SQL join COST limits, out-of-control paging, and data going to virtual memory/swap when the DB decides it should use HASH joins instead of NESTED LOOPS).
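When the optimizer's switch to HASH joins is what sends the run into swap, a join hint is one way to pin the plan; a hedged T-SQL illustration using the hypothetical tables sketched above:

    -- Force a nested-loops join where a hash join would spill to disk.
    SELECT m.member_id, w.amount
    FROM #member_snapshot AS m
    INNER JOIN #invoice_work AS w
        ON w.member_id = m.member_id
    OPTION (LOOP JOIN);  -- the Oracle analogue is a /*+ USE_NL(w) */ hint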

I have normalized LIVE data (that gets written to and read from 24/7 by data entry users)

It occurred to me that the way to optimize the space taken up by the intermediate tables used in the stored procedure (at step 2) would be to find composite primary keys and assign each a unique id (a surrogate PK), thereby referencing n columns with 1 column. I would then reconstitute this data at the end of step 2 and have it ready to write back at the start of step 3. This would add more processing to step 2, but less data would get copied around. Debugging would also take more steps (tracing ids back to the actual data in the intermediate tables after execution completes).
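A minimal sketch of that surrogate-key idea in T-SQL (all names are illustrative): build a key map with an IDENTITY column, carry the single id through step 2, then join back to the map to reconstitute the composite key for step 3.

    -- Key map: one IDENTITY surrogate per distinct composite key.
    CREATE TABLE #key_map (
        row_id    INT IDENTITY(1,1) NOT NULL,  -- the surrogate PK
        member_id INT NOT NULL,
        period_id INT NOT NULL,
        branch_id INT NOT NULL
    );

    INSERT INTO #key_map (member_id, period_id, branch_id)
    SELECT DISTINCT member_id, period_id, branch_id
    FROM #invoice_work;

    -- Step 2 carries row_id instead of the three key columns; at the end
    -- of step 2 the composite key is reconstituted for the write-back:
    SELECT k.member_id, k.period_id, k.branch_id, c.calculated_amount
    FROM #calc_results AS c
    JOIN #key_map      AS k ON k.row_id = c.row_id;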

Has anyone run into this scenario with lengthy stored procedures?

Has anyone created a surrogate key (replacing a compound PK with a one-column PK) in intermediate tables that are only used in stored procedures?

Has that paid off in terms of execution time and memory/space used during execution?


And I thought my 1400 line procs were long!

I can see where a surrogate key would be faster in the joins than the composite PK. But with anything this complex, I think you would simply have to try both ways.

Can you reduce the 80 columns at all? I guess I'm asking whether you used SELECT * with joins where the join fields would be repeated in the query and could be dispensed with.
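To illustrate that point: SELECT * over a join returns the join key once from each side, while an explicit column list keeps a single copy (table names are the hypothetical ones from the question's sketches):

    -- SELECT * returns member_id twice, once from each joined table.
    SELECT *
    FROM #member_snapshot AS m
    JOIN #invoice_work    AS w ON w.member_id = m.member_id;

    -- An explicit column list drops the duplicated join field.
    SELECT m.member_id, m.member_name, w.period_id, w.amount
    FROM #member_snapshot AS m
    JOIN #invoice_work    AS w ON w.member_id = m.member_id;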


I've built a few lengthy sprocs and I've always gone for an IDENTITY-column surrogate key. Is it possible to rethink what's being done and create separate temp tables for each intermediate step?

I've had to do this in the past. At the end, I "stitched" all the separate temp tables into my final output.
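A rough sketch of that stitching approach in T-SQL (step and table names are made up for illustration): one temp table per intermediate step, joined back together at the end.

    -- Each intermediate step writes to its own temp table.
    SELECT member_id, balance
    INTO #step1_gather
    FROM dbo.member;

    SELECT member_id, balance * 0.05 AS interest   -- illustrative calculation
    INTO #step2_calc
    FROM #step1_gather;

    -- The final output "stitches" the step tables together.
    SELECT g.member_id, g.balance, c.interest
    FROM #step1_gather AS g
    JOIN #step2_calc   AS c ON c.member_id = g.member_id;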


Why don't you try writing SSIS packages? Most of your calculations would run in SSIS memory without touching the database, other than writing to temp tables.

You can break the package down into as many pieces as you want, and the process will be more maintainable.

BTW, an 11K-line sproc is insane... Can't help it, had to say that :-)


"I need to do data gathering for consistency because the LIVE data (i.e. in the member table) at line 30 might have changed by the time the stored procedure execution gets to line 10,000"

In Oracle you could look at DBMS_FLASHBACK (or the SERIALIZABLE isolation level) for this level of consistency. Flashback queries might save you from needing to copy all the data over.
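A minimal Oracle sketch of the flashback idea (the member table and the five-minute interval are just placeholders): a flashback query reads the data as of a fixed point in time, so every read in the procedure sees the same snapshot without copying anything.

    -- Read the member table as it was at a fixed point in time.
    SELECT member_id, balance
    FROM member
    AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '5' MINUTE);

    -- Or pin the whole session to that point in time:
    -- EXEC DBMS_FLASHBACK.ENABLE_AT_TIME(SYSTIMESTAMP - INTERVAL '5' MINUTE);
    -- ... run the procedure's reads ...
    -- EXEC DBMS_FLASHBACK.DISABLE;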

I've done a similar exercise for a data migration - lots of temporary tables. One factor to check is that statistics are gathered at appropriate points in time for the temporary tables. If those tables are normally empty, then the stale stats might be screwing things up at the end.
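For example, in Oracle you might regather statistics immediately after loading each intermediate table, so later joins are costed against real row counts (the table name is an assumption):

    -- Refresh optimizer statistics after the table is populated.
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,
        tabname => 'INVOICE_WORK');
    END;
    /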

