Query Performance help
I have a long-running job. The records to be processed are in a table with around 100K records.
Throughout the job, every query runs against those 100K records, and after each record is processed its status is updated in the same table.
I want to know whether it would be better to add a second table where I track record status, and keep deleting records from it as they are processed, so that as the job moves forward the number of records left to query keeps shrinking and query performance improves.
EDIT: The master table is used only for this load. I receive a flat file, which I upload as-is before processing. After validating the data in this table, I pick one record at a time and move it to the appropriate system tables.
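For illustration, a minimal sketch of the proposed approach in T-SQL; every name here (dbo.LoadMaster, dbo.LoadWorkQueue, RecordId, Status) is a placeholder, not the actual schema:

```sql
-- Hypothetical names throughout (dbo.LoadMaster, dbo.LoadWorkQueue, RecordId, Status).

-- 1. Copy the unprocessed rows into a working table once, up front.
SELECT RecordId, Status
INTO dbo.LoadWorkQueue
FROM dbo.LoadMaster;

-- 2. As each record finishes, mark it in the master table and delete it
--    from the working table so later queries scan fewer rows.
DECLARE @CurrentRecordId INT = 42;   -- id of the record just processed

UPDATE dbo.LoadMaster
   SET Status = 'PROCESSED'
 WHERE RecordId = @CurrentRecordId;

DELETE FROM dbo.LoadWorkQueue
 WHERE RecordId = @CurrentRecordId;
```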
I had a similar performance problem: a table with a few million rows where I only need to process what has changed since the start of my last execution. My target table has an IDENTITY column, so when my batch process begins I select the rows whose IDs are greater than the ID recorded for the previous run and note the highest IDENTITY value in that set. Upon successful completion of the batch job, I add a record to a separate table indicating that highest successfully processed IDENTITY value, and it becomes the starting point for the next batch invocation. (I'll also add that my bookmark table is general purpose, so multiple jobs use it, each under a unique job name.)
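Roughly, the pattern looks like the sketch below; the table and column names (dbo.Target, Id, dbo.JobBookmark) are placeholders, not my actual schema:

```sql
-- Hypothetical general-purpose bookmark table shared by several jobs.
CREATE TABLE dbo.JobBookmark (
    JobName     VARCHAR(100) NOT NULL,
    LastId      INT          NOT NULL,
    CompletedAt DATETIME     NOT NULL DEFAULT (GETDATE())
);

DECLARE @JobName VARCHAR(100) = 'MyBatchJob';

-- Where did the previous successful run stop?
DECLARE @LastId INT =
    ISNULL((SELECT MAX(LastId)
            FROM dbo.JobBookmark
            WHERE JobName = @JobName), 0);

-- Highest IDENTITY value in the set about to be processed.
DECLARE @HighId INT =
    (SELECT MAX(Id) FROM dbo.Target WHERE Id > @LastId);

-- Process only the rows added since the last run.
SELECT *
FROM dbo.Target
WHERE Id > @LastId AND Id <= @HighId;

-- ... processing happens here ...

-- On success, record the new high-water mark for the next invocation.
INSERT INTO dbo.JobBookmark (JobName, LastId)
VALUES (@JobName, @HighId);
```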
If you are experiencing locking issues because your per-record processing takes a long time, you can use the approach above but break the set into chunks of 1,000 rows (or whatever chunk size your system can process in a timely fashion), so you are only locking small sets at any given time.
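A minimal sketch of that chunked loop, again with placeholder names and a 1,000-row batch size:

```sql
-- Hypothetical names; the idea is to touch at most ~1,000 rows per statement
-- so locks are held briefly and only on small ranges.
DECLARE @BatchSize INT = 1000;
DECLARE @FromId    INT = 0;     -- or the bookmark value from the last run
DECLARE @ToId      INT;

WHILE 1 = 1
BEGIN
    -- Upper bound of the next chunk of at most @BatchSize rows.
    SELECT @ToId = MAX(Id)
    FROM (SELECT TOP (@BatchSize) Id
          FROM dbo.Target
          WHERE Id > @FromId
          ORDER BY Id) AS chunk;

    IF @ToId IS NULL BREAK;     -- nothing left to process

    -- Work on just this slice: shorter transactions, smaller locks.
    UPDATE dbo.Target
       SET Status = 'PROCESSED'
     WHERE Id > @FromId AND Id <= @ToId;

    SET @FromId = @ToId;
END;
```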
A few pointers (my two cents):
- Consider splitting that table, similarly to the "slowly changing dimension" technique, into a few intermediate tables depending on the destination system table; then bulk load your system tables instead of moving data record by record.
- Drop the "input" table before the bulk load and re-create it to get rid of indexes, etc.
- Do not put unnecessary keys or indexes on that table before the load.
- Consider switching the DB recovery model to BULK_LOGGED so bulk transactions are not fully logged (a sketch follows this list).
- Can you use an SSIS (ETL) task for loading, cleaning, and validating?
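A rough sketch of the recovery-model and bulk-load points, with made-up database, table, and file names; treat it as an outline, not a drop-in script:

```sql
-- Placeholder database, table, and file names; check the recovery-model
-- change with your DBA, since it affects point-in-time restore.
ALTER DATABASE MyLoadDb SET RECOVERY BULK_LOGGED;

-- Drop and re-create the input table with no keys or indexes,
-- then load the flat file in one bulk operation instead of row by row.
IF OBJECT_ID('dbo.InputStaging', 'U') IS NOT NULL
    DROP TABLE dbo.InputStaging;

CREATE TABLE dbo.InputStaging (
    Col1 VARCHAR(100),
    Col2 VARCHAR(100),
    Col3 VARCHAR(100)
);

BULK INSERT dbo.InputStaging
FROM 'C:\loads\input_file.txt'          -- placeholder path to the flat file
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Switch back once the load completes.
ALTER DATABASE MyLoadDb SET RECOVERY FULL;
```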
UPDATE:
Here is a typical ETL scenario -- well, it depends on who you talk to.

1. Extract to flat_file_1 (you already have this).
2. Clean: flat_file_1 --> SSIS --> flat_file_2 (you can validate here).
3. Conform: flat_file_2 --> SSIS --> flat_file_3 (apply all company standards).
4. Deliver: flat_file_3 --> SSIS (bulk) --> db.ETL.StagingTables (several, one per destination).
4B. insert into destination_table select * from db.ETL.StagingTable (bulk load your final destination).
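Step 4B in T-SQL, using the names from the scenario above; the WITH (TABLOCK) hint is my addition and is optional, but it can make the final insert minimally logged under the bulk-logged recovery model:

```sql
-- 4B: bulk load the final destination straight from the staging table.
INSERT INTO destination_table WITH (TABLOCK)
SELECT *
FROM db.ETL.StagingTable;
```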
This way, if any of steps 1-4 times out, you can always restart from the intermediate file. You can also inspect each stage and have SSIS create report files for each stage to control your data quality. Operations 1-3 are the essentially slow ones; here they happen outside of the database and can be done on a separate server. If you archive flat_file_1 through flat_file_3, you also have an audit trail of what's going on -- good for debugging too. :)