How to merge 2 csv files by columns in spoon, pentaho, while managing data conversion?
I'm facing the following problem:
I ha开发者_JAVA百科ve two inputs:
1) I have a csv base file with 35 columns and their proper headers. 2) I have a variety of given files, not controlled by me, that may or may not contain the 35 columns and, even worse, they may be out of order.I have to match the columns from the second csv file to the columns in the first csv file. If the second csv file doesn't have all the 35 columns, I should create those in their proper order.
Once I have a proper csv file (one which the header looks like the first csv header) I would pass it to a script that manages the data referencing them by the columns headers.
One possible solution would be getting the existing field inputs inside the script, however, I'm not able to do that because the fields seem to be fixed referencing the existing column headers of the second csv file. Therefore, when I try to access a column that doesn't exist, I end up with an exception...
Any help would be greatly appreciated!
Term "fields in second csv out of order" could have several meaning
- Same source of csv file but order of fields different from time to time
- Position of field(column number) in csv file is different in files provided by different source.
First case really weird. Same source should provide same data, and if this is not true, then logic to make decision can be really complicated.
Second case looks more real. In this case u can make all sources to 35 fields wide. Then u need to identify fields. There are plenty tools available in kettle to detect data type, string manipulation, regexp and so on.
Actually it sounds like u need to auto-detection of fields.
But without real data, hard to see pattern. Since u implement such logic of field detection on database level then it is possible in kettle as well.
Anyway If logic really complex then use JavaStep, JavaScript.
精彩评论