In C#, what is the smallest memory footprint way of storing database rows in memory for processing?
I am somewhat new to C# programming and need some advice on how to tackle a problem. I need to process tens of thousands records stored on a SQL Server database, and the processing should be as fast as possible.
To maximize the performance, I am fetching the rows from the database o开发者_如何学Cn a background thread when the application starts, because it need to wait for some user input before starting the processing. This approach saves 20% of time of the entire process, but it is very resource hungry in terms of memory footprint (the process is taking 200MB of RAM, and I estimate that the database rows has less than 10MB of raw data).
I'm using a class with members storing data from the database columns, and using ArrayList to store the rows.
Is there another approach to store the data in memory to minimize the consumed memory?
You should be aware of the fact, that the memory usage indicated in the taskmanager is not necessary the memory used by the data. The program grabs more memory than it needs at the moment, to be able to scale well. If you want to find out how much memory is exactly used, use a memory profiler.
What are the data types of the columns?
If there are a lot of strings then you may be suffering from the string overhead. .NET strings are UTF-16 (2 bytes per character) and (I think) have 16-18 bytes of overhead per string. If you really need to save memory, and the data is ASCII, you could consider combining several string columns into a byte array using Encoding.UTF8.
// Occupies 64 bytes of memory
string col1 = "Me", col2 = "You", col3 = "Us";
StringBuilder sb = new StringBuilder(col1);
// only works if you are sure the columns have no nulls
sb.Append('\0');
sb.Append(col2);
sb.Append('\0');
sb.Append(col3);
// Occupies 24 bytes of memory
byte[] array = Encoding.UTF8.GetBytes(sb.ToString());
Of course, this would slow down the program and you'd have to write code to unpack the byte array when you need to get the strings out, but you might save a lot of memory.
Some basic things to check without knowing more details about your app:
- Are you only storing stuff in memory that you need?
- Are you creating stuff on the Large Object Heap? That likely won't be collected.
- Can you process data in batches, and reduce the results of each batch into another intermediate memory/disk store? Essentially, can you use some form of map-reduce?
- Use WinDBG to look at your heap and see rooted objects. It'll give you a better idea of what's in the 200MB.
"I'm using a class with members" could be Your problem. The primitive data types like bool, int etc should roughly require the same space as they do in Your DB. But when You create a new instance of a class, additional data has to be reserved on the heap. Now this shouldn't account for 200MB when only processing "tens of thousands" of rows, but You could try to use a value type instead (eg change Your class to a struct).
Also, if Your DB contains strings of roughly the same length each, You could use char-arrays to store them in order to "minimize the consumed memory".
精彩评论