Choosing a clever solution: SQL Server or file processing for bulk data?
We have a number of files generated from a test, with each file having almost 60,000 lines of data. Requirement is开发者_JAVA技巧 to calculate number of parameters with the help of data present in these files. There could be two ways of processing the data :
- Each file is read line-by-line and processed to obtain required parameters
- The file data is bulk copied into the database tables and required parameters are calculated with the help of aggregate functions in the stored procedure.
I was trying to figure out the overheads related to both the methods. As a database is meant to handle such situations, I am concerned with overheads which may be a problem when database grows larger.
Will it affect the retrieval rate from the tables, consequently making the calculations slower? Thus will file processing be a better solution taking into account the database size? Should database partitioning solve the problem for large database?
Did you consider using map-reduce (say under Hadoop maybe with HBase) to perform these tasks? If you're looking for high-throughput with big data volumes this is a very scaleable approach. Of course, not every problem can be addressed effectively using this paradigm and I don't know the details of your calculation.
If you set up indexes correctly you won't suffer performance issues. Additionally, there is nothing stopping you loading the files into a table and running the calculations and then moving the data into an archive table or deleting it altogether.
you can run a querty directly agianst the text file from SQL
SELECT * FROM OPENROWSET('MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
'SELECT * FROM [text.txt];')
- The distributed queries needs to be enabled to run this.
Or as you mentioned you can load the data data to a table (using SSIS, BCP, the query above ..). You did not mentioned what does it mean that the database will be larger. 60k of lines for a table is not so much (meaning that it will perform well).
精彩评论