SSIS ETL Design
SSIS ETL Design
I'm trying to build a ETL job. Below are my primary design considerations
- One master table to track data pull and st开发者_如何学Catus
Table Details
Table Name - JobStatus
- Columns - JobId, StartTime, EndTime, Status
Table Name - JobDetails
- Job_Detail_id, JobId, TableName, StartTime, EndTime, Status
Table Name - ErrorLog
- ErrorLogid, ErrorDesc, Time, Comments
How it Works
Whenever job fails, Job details table would help to figure out which is the last errored table, ETL job can start from that step
SSIS package would have DataFlow Tasks defined for each table between source and staging DB
Example of how it works
Put Entry in Job Status Table to start package execution
Each Dataflow task for individual table would put an entry in JobDetails table. After Loading data End Time and Status would be updated
On Error Status updated to Error and entry logged in Errorlog table
I would need your comments / feedback for this approach. Any bookmarks/blogposts for ETL design relevant to this example also would be helpful
I think that the requirements that you have mentioned about restarting the package execution from point of failure already exists in SSIS. You can use Checkpoint file to achieve that functionality. The in-built SSIS logging feature can help you to track the error messages and will help resolving the issue.
Click here to see an article that I have written on my website with respect to your question. I hope that might give you an idea about the functionality that already exist in SSIS and how you can make best use of it.
I think you can spend quality time on fine tuning your core business functionality than reinventing the process that already exists for the most part. The standard functionality might require some minor alterations based on our requirements but I don't think that you need to start from the ground up.
Hope that helps.
精彩评论