Java Desktop Application to Upload CSV Files to Database
I'm working on desktop application (Extract Transform Load type of app) that requires uploading of large CSV files (about 6 GB) to a database using Java as front end and Oracle as a backend but I'm open to other databases and tools as well (i.e. SQLite, H2, HSQLDB, MySQL).
I tried using OpenSource Java libraries for parsing and uploading CSV files such as:
- OpenCSV
- Super CSV
The problem with these libraries is efficiency. They require too much time for uploading. For example, It'll take 6 hours to upload 4 Gb of CSV file.
Having this experience, I decided to try database utilities for uploading CSV files. One of them is Oracle SQL*Loader. It's faster and offers desirable result. It can remove whitespaces, load CSV files to various tables, use oracle functions suchs as decode, etc.
However, I have the following problems:
I couldn't find a way to get SQL*Loader's errors/logs in Java and display them to the user.
Since, SQL*Loader is proprietary, there's no way to embed it to my application installation. For example, the user needs to download and install Oracle Client separately before installing my application. Another thing is I need to check if Oracle Client is installed b开发者_开发问答efore installation process begins.
I looked for other probable solutions and found headless databases like SQLite and H2.
On SQLite, I use the .import command to upload CSV files. But there's no way I could call it in Java.
I turn to H2 database but I have hard time uploading since it does not support escaping the header row.
Please let me know your suggestions.
Thanks you.
If you have a bash shell available you could do this:
sqlite3 my_database_file <<< .import <(gawk 'NR>1 { a = $0; if(NR>2){print a;}}' my_csv_file) my_table_name
It sounds like you could use CloverETL for this. It is a Java based data integration and ETL tool. Along with its commercial editions it has a free, opensource ETL engine. It should be quite easy to embed to your application as a Java library. You also should not have any performance issues you mention above, unless the bottleneck is on the DB side. CloverETL is said to be fast.
If you need to know other technical details, you can either ask me or use CloverETL forum.
I am assuming that you are using the SQLITE command line management utility.
The easiest way to do this is to remove the header and footer lines from the import file before importing it into sqlite, using whatever editing feature you are most comfortable with on your system.
If the sample file you have posted is accurate, you will have to do this anyway since the final line, which contains just one column, will be rejected by the SQLITE .import command, causing the entire input to be aborted.
If you fix the footer problem, and insist on using SQLITE commands, here is a suggested technique:
Fix the footer
HEADER1, HEADER2, HEADER3
row1,row2,row3
row11,row22,row33
row111,row222,row33
FOOTER, FOOT1, FOOT2
Do the import
sqlite> select * from test;
HEADER1, HEADER2, HEADER3
row1,row2,row3
row11,row22,row33
row111,row222,row33
FOOTER, FOOT1, FOOT2
Remove the footer
sqlite> select count() from test;
5
sqlite> delete from test where rowid=5;
Remove the header
sqlite> delete from test where rowid=1;
sqlite> select * from test;
row1,row2,row3
row11,row22,row33
row111,row222,row33
精彩评论