开发者

Choosing a clever solution: SQL Server or file processing for bulk data?

We have a number of files generated from a test, with each file having almost 60,000 lines of data. Requirement is开发者_JAVA技巧 to calculate number of parameters with the help of data present in these files. There could be two ways of processing the data :

  1. Each file is read line-by-line and processed to obtain required parameters
  2. The file data is bulk copied into the database tables and required parameters are calculated with the help of aggregate functions in the stored procedure.

I was trying to figure out the overheads related to both the methods. As a database is meant to handle such situations, I am concerned with overheads which may be a problem when database grows larger.

Will it affect the retrieval rate from the tables, consequently making the calculations slower? Thus will file processing be a better solution taking into account the database size? Should database partitioning solve the problem for large database?


Did you consider using map-reduce (say under Hadoop maybe with HBase) to perform these tasks? If you're looking for high-throughput with big data volumes this is a very scaleable approach. Of course, not every problem can be addressed effectively using this paradigm and I don't know the details of your calculation.


If you set up indexes correctly you won't suffer performance issues. Additionally, there is nothing stopping you loading the files into a table and running the calculations and then moving the data into an archive table or deleting it altogether.


you can run a querty directly agianst the text file from SQL

SELECT * FROM   OPENROWSET('MSDASQL',  
   'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;', 
   'SELECT * FROM [text.txt];')
  • The distributed queries needs to be enabled to run this.

Or as you mentioned you can load the data data to a table (using SSIS, BCP, the query above ..). You did not mentioned what does it mean that the database will be larger. 60k of lines for a table is not so much (meaning that it will perform well).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜