Most efficient way of storing lots of string messages in SQL Server?
My app receives approximately 2000 string messages per second; each message is about 300 characters long.
I need to store all messages in a DB. I'm using SQL Express 2008, and .NET.
I'm thinking of holding all data in memory until it reaches a certain limit (10000 messages = 5 seconds, for example), and then writing it all at once.
This way the data will be written to the hard drive every 5 seconds, instead of every second.
Is my approach good enough? What approach should I use in order to achieve the following results?
- messages are not piling up in memory
- Hard drive won't commit suicide :)
Note: there is no need to parse the strings; the only requirement is to store them in the order they arrived.
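The buffering idea described in the question can be sketched roughly as follows. This is only an illustration in Python rather than .NET: the class name `MessageBuffer`, its parameters and the `flush` callback (which would wrap whatever bulk write is used) are all hypothetical, and the sketch assumes a single producer thread.

```python
import time
from typing import Callable, List, Optional


class MessageBuffer:
    """Collects messages in arrival order and hands them over in batches."""

    def __init__(self, flush: Callable[[List[str]], None],
                 max_messages: int = 10_000,
                 max_age_seconds: float = 5.0) -> None:
        self._flush = flush                    # hypothetical sink, e.g. a bulk write
        self._max_messages = max_messages
        self._max_age = max_age_seconds
        self._pending: List[str] = []
        self._oldest: Optional[float] = None   # arrival time of first pending message

    def add(self, message: str) -> None:
        if not self._pending:
            self._oldest = time.monotonic()
        self._pending.append(message)
        too_many = len(self._pending) >= self._max_messages
        too_old = time.monotonic() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Swap the list out before writing so the buffer is empty again
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush(batch)


# Usage: collect the batches in a list instead of writing them anywhere
batches: List[List[str]] = []
buffer = MessageBuffer(batches.append, max_messages=3)
for i in range(7):
    buffer.add(f"message {i}")
buffer.flush()  # write out whatever is left, e.g. on shutdown
```

Note that the age limit is only checked when a new message arrives, so a quiet period would need a timer calling `flush()` as well. Because each batch is handed over as soon as a limit is hit, memory use stays bounded as long as the sink keeps up.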
If you describe more thoroughly what you want to do with these massive amounts of data after you have stored them, it will be easier to make a clear suggestion on what to do with it.
On the face of it, it sounds like too much data for a relational database to handle. I would rather devise a solution based on plain text files if all you want is storage. If you want to be able to search the text files, you can slowly index them with a service or console application behind the scenes.
The index can be built with Lucene.NET and what you index can be kept to a minimum since I hope you don't need to be able to search for absolutely everything you store in these text files.
A quick calculation indicates that you may experience up to 50 GB of data per day. If there is no SQL specific processing to be done on this data then it doesn't seem feasible to store it in a database.
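The quick calculation can be reproduced in a few lines. The figures come from the question (2000 messages per second, about 300 characters each); one byte per character is an assumption.

```python
MESSAGES_PER_SECOND = 2000
BYTES_PER_MESSAGE = 300          # assumes one byte per character
SECONDS_PER_DAY = 60 * 60 * 24   # 86400

bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

print(f"{bytes_per_second / 1024**2:.2f} MiB per second")  # 0.57 MiB per second
print(f"{bytes_per_day / 1000**3:.1f} GB per day")         # 51.8 GB per day
print(f"{bytes_per_day / 1024**3:.1f} GiB per day")        # 48.3 GiB per day
```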
The next solution would be files on disk, and since you deal with simple text (not binary), a quick compression might also help. However, if each message were its own file, the files would be so small (300 bytes) that compression would not yield any sensible results. The data would need to be grouped in larger files, for instance one piece of data per line and one such file per day. Such a file would be large enough for compression to give satisfactory results if disk space becomes an issue.
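A minimal sketch of the one-line-per-message, one-file-per-day idea, with compression applied once the day's file is complete. The file naming (`messages-yyyymmdd.log`) and the helper names are made up for illustration, and the sketch assumes messages never contain newline characters.

```python
import gzip
import os
import shutil
import tempfile
from datetime import date
from typing import List


def daily_file_path(directory: str, day: date) -> str:
    # Hypothetical naming scheme: one file per calendar day
    return os.path.join(directory, f"messages-{day:%Y%m%d}.log")


def append_batch(path: str, messages: List[str]) -> None:
    # Appending keeps arrival order; assumes no newlines inside a message
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(m + "\n" for m in messages)


def compress_file(path: str) -> str:
    # Compress a finished daily file and remove the uncompressed original
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


workdir = tempfile.mkdtemp()
path = daily_file_path(workdir, date(2011, 4, 25))
append_batch(path, ["first message", "second message"])
append_batch(path, ["third message"])
gz_path = compress_file(path)

with gzip.open(gz_path, "rt", encoding="utf-8") as f:
    restored = f.read().splitlines()
```

Compressing only after the day has ended keeps the hot path to a plain append, which is about as cheap as a disk write gets.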
If space is not an issue and/or frequent processing of this data, or even simultaneous processing of data from different days, is to be expected, then one piece of data per file would be a better choice. This solution, in turn, brings the issue of having a very large number of files inside a folder, which will not only run into file system limitations but also create performance issues when working with these files, and these issues will affect the performance of the entire machine.
A better way to store and access a large number of files is to use partitioned folder storage. That is, each file has a unique name and is placed in a specific folder hierarchy according to its name. This approach has several advantages:
- keeps the number of files per folder manageable (when this number increases, one only needs to go one folder hierarchy deeper to increase the "storage availability" exponentially)
- easy to find a file's location or where to store a file based on the naming convention
Sample partitioning:
- file names follow this format: yyyymmddhhmm-<counter>.txt (e.g.: 201104252345-1.txt, 201104252345-2.txt, etc)
- folder structure follows the time parts: \yyyy\mm\dd\ or \yyyy\mm\dd\hh\ etc (as many levels as the solution would need to keep the number of files manageable)
- results in: 201104252345-1.txt being stored as 2011\04\25\201104252345-1.txt, etc
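The mapping from file name to folder can be sketched as a small function. The name `partitioned_path` and its `levels` parameter are hypothetical; the sketch assumes names that start with a timestamp like the 201104252345-1.txt example and uses Windows-style separators to match the paths shown.

```python
from pathlib import PureWindowsPath


def partitioned_path(file_name: str, levels: int = 3) -> PureWindowsPath:
    """Place a timestamp-named file under year\\month\\day[\\hour] folders."""
    stamp = file_name.split("-", 1)[0]
    # Year, month, day and hour slices of the timestamp prefix
    parts = [stamp[0:4], stamp[4:6], stamp[6:8], stamp[8:10]]
    if not 1 <= levels <= len(parts):
        raise ValueError("levels must be between 1 and 4")
    return PureWindowsPath(*parts[:levels], file_name)


print(partitioned_path("201104252345-1.txt"))
# 2011\04\25\201104252345-1.txt
print(partitioned_path("201104252345-2.txt", levels=4))
# 2011\04\25\23\201104252345-2.txt
```

Because the folder is derived purely from the name, no lookup table is needed either to store a file or to find it again.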
I wouldn't do that in your situation. Assuming:
(2000 * 300) / 1024 (KB) / 1024 (MB) = about 0.57 MB per second.
One day has 60 (sec) * 60 (min) * 24 (hours) = 86400 seconds.
0.57 * 86400 = about 49,000 MB per day.
If you store the text as nvarchar (two bytes per character) instead of varchar, the size doubles!
That means about 48 GB of growth per day. Your Express server won't survive, even if you run the insert query only every 5, 10 or 20 seconds. Consider also the index rebuilds needed for good query performance, the periodic database backups and the other database maintenance you have to take care of. Your database won't keep up with the requests.
I would recommend storing the strings in text files and caching them on the application server (this suits text that is rarely read by end users; otherwise I recommend using an index engine, Lucene maybe). Store only the paths of those files in the database.
Note: this is only my own solution, based on some facts and experience.
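A rough sketch of the "files on disk, paths in the database" layout. It uses `sqlite3` from the Python standard library purely as a stand-in for SQL Server, and the table name `message_files`, its columns and the helper functions are all invented for the example.

```python
import os
import sqlite3
import tempfile
from typing import List


def write_batch(directory: str, name: str, messages: List[str]) -> str:
    # One message per line; assumes messages contain no newlines
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(m + "\n" for m in messages)
    return path


def register_file(conn: sqlite3.Connection, path: str, count: int) -> None:
    # The database stores only where the text lives, not the text itself
    conn.execute(
        "INSERT INTO message_files (path, message_count) VALUES (?, ?)",
        (path, count),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE message_files ("
    "id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
    "message_count INTEGER NOT NULL)"
)

workdir = tempfile.mkdtemp()
batch = ["first message", "second message"]
path = write_batch(workdir, "batch-0001.txt", batch)
register_file(conn, path, len(batch))

rows = conn.execute(
    "SELECT path, message_count FROM message_files ORDER BY id"
).fetchall()
```

With this split the database grows by one small row per file instead of one row per message, which keeps it far below the data volume estimated above.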
EDIT
Handling this in the application gives you more control over your data: you can send files via HTTP to another server, compress files, etc.