.NET ETL Process
First, some background: we are developing a data warehouse and researching which tools to use for our ETL process. The team is very developer-centric; everyone is knowledgeable in C#. So far I have looked at RhinoETL, Pentaho (Kettle), and Astera Centerprise. SSIS is out for a number of reasons that are outside the scope of this question.
At this time, I am leaning towards something more developer-oriented like RhinoETL because it seems like the path of least resistance for a group of devs. Do the other, more visual-designer-oriented products bring anything to the table that RhinoETL doesn't? Are there any specific things I should be paying attention to when evaluating these ETL tools? Are there any other tools that we should also investigate?
I know this is a late answer, but since I needed a proper ETL tool with all the SSIS features in a 100% .NET environment, I ended up developing my own.
- Github repo: https://github.com/paillave/Etl.Net
- Beginning of documentation: https://paillave.github.io/Etl.Net
Admittedly, performance is not as good as SSIS. I believe that if you need maximum performance to integrate and transform huge volumes, you should still use SSIS.
The main thing I really needed, which no other ETL-like tool such as RhinoETL provides, is a proper tracing system that captures a trace of every single detail and makes those traces easy to manipulate and record if necessary. I wrote a lot of out-of-the-box adapters for the file system, FTP, SFTP, XML, CSV, Entity Framework Core, and bulk load. I even built a visual tool to view the structure of the transformation process.
It has taken me 10 months so far, and I have open-sourced it. It still lacks a lot of documentation (a huge amount of work). I must also add a much bigger set of unit tests (also a huge amount of work) before I can decently release it as a beta. Even though it is still in alpha, it is the foundation of all the ETL processes at my company, and it works like hell!
Recently my coworker and I did some simple performance testing between RhinoETL and SSIS. It seems that for simple data flows SSIS always outperformed RhinoETL (it moved 2,000,000 records about 30% faster). If you are using source control (in our case TFS), you cannot easily view differences between versions of dtsx files (SSIS files), whereas developing with RhinoETL allows you to use TFS features.
Another advantage of RhinoETL shows up if you develop a user interface on top of your data warehouse: you can share code between the two programs.
Although several members of our SSIS team come from .NET backgrounds, our management decided to continue developing with SSIS (although they upgraded to SSIS 2008, which is another topic altogether) because they felt it was easier to have a developer learn SSIS than .NET.