How to reducing impact on OLTP when performing ETL Process
I am new in designing a ETL process. Currently I have two database, one is the live database where the application use it for every day transaction. The other one is the data warehouse.
I have a table in the live database that regularly have new data insert into it. The goal is that every night the ETL Process will transfer the data in the live database to the data warehouse, follow by deleting the data in the live database.
Due to my lack of knowledge, the solution that I got is to implement something call a rolling table. Basically on the live database, I have two tables that have the same structure. I call them tblLive1 and tblLive2. I also has a synonym call tblLive. All insert is done on the synonym. The synonym would point at one of the table.
When I run the ETL process, I have a stored procedure that would drop and create a new synonym that point to tblLive2. This allow the ETL process to transform data from tblLive1 witho开发者_如何学Gout effecting the application. The assumption is that the ETL Process takes an hour to run, and I won't want the ETL process lock the table preventing the application insert new data to it.
This solution should theoretically work, but not elegant.
I am sure this problem is a common problem, are there any other solutions out there?
To add to Bob's answer (above), It is usual in DWH/BI applications, that all necessary tables are essentially copied into a "staging" database or a "staging" schema on your DWH database(depending on the number of tables / size etc). These would ordinarily be on a different server to your OLTP system - for a DWH implementation of any size that is)
To answer the question on performance impact, it depends on your server spec/io configuration.
Is data being inserted into the OLTP system 24hours/day? or are there downtimes? or low traffic times?
It might be worthwhile using database compression as IO is going to be your biggest enemy and this will help considerably.
Read the table into a staging area and process the staging table. You usually want to spend as little time on the production system as you have too. Especially if it is in use.
You may also want to look into using tables loaded by a trigger. Or Change Data Capture if you are on SQL 2008
精彩评论