Reducing memory of similar objects
I'm looking at reducing the memory consumption of a table like collection object.
Given a class structure like
Class Cell
{
public property int Data;
public property string Format;
}
Class Table
{
public property Dictionary<Position, Cell> Cells;
}
When there are a large number of cells the Data property of the Cell class may be var开发者_Go百科iable but the Format property may be repeated many times, e.g. the header cells may have an empty format string for titles and the data cells may all be "0.00".
One idea is to something like the following
Class Cell
{
public property int Data;
public property int FormatId;
}
Class Table
{
public property Dictionary<Position, Cell> Cells;
private property Dictionary<Position, string> Formats;
public string GetCellFormat(Position);
}
This would save memory on strings however the FormatId integer value would still be repeated many times.
Is there a better implementation than this? I've looked at the flyweight pattern but am unsure if it matches this.
A more complex implementation I am considering is removing the Format property from the Cell class altogether and instead storing the Formats in a dictionary that groups adjacent cells together
e.g. there may be 2 entries like this<item rowFrom=1 rowTo=1 format="" /
>
<item romFrom=2 rowTo=1000 format="0.00" /
>For strings, you could perhaps look at interning; either with the inbuilt interner, or (preferably) a custom interner - basically a Dictionary<string,string>
. What this means is that each identical string uses the same reference - and the duplicates can be collected.
Don't do anything with the int; that is already optimal.
For example:
using System;
using System.Collections.Generic;
class StringInterner {
private readonly Dictionary<string, string> lookup
= new Dictionary<string, string>();
public string this[string value] {
get {
if(value == null) return null;
if(value == "") return string.Empty;
string result;
lock (lookup) { // remove if not needed to be thread-safe
if (!lookup.TryGetValue(value, out result)) {
lookup.Add(value, value);
result = value;
}
}
return result;
}
}
public void Clear() {
lock (lookup) { lookup.Clear(); }
}
}
static class Program {
static void Main() {
// this line is to defeat the inbuilt compiler interner
char[] test = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
string a = new string(test), b = new string(test);
Console.WriteLine(ReferenceEquals(a, b)); // false
StringInterner cache = new StringInterner();
string c = cache[a], d = cache[b];
Console.WriteLine(ReferenceEquals(c, d)); // true
}
}
You could take this further with WeakReference
if desired.
Note importantly that you don't need to change your design - you just change the code that populates the object to use the interner/cache.
Have you actually determined whether or not this is actually a problem? The CLR does a lot of string interning on your behalf so it is possible (depending on CLR version and how your code was compiled) that you are not using as much memory as you think you are.
I would highly recommend that you validate your suspicions about memory utilization before you change your design.
As others have mentioned you first want to see if this is a problem before changing your design etc. If it is a problem and / or you dealing with a lot of sparse data then a sparse data structure might be more applicable for the problem. I will post a very simple naive implementation later (as I am unable to at the moment) but a two dimensional sparse matrix would do what you desire.
The concept is that a single format (either string or class) will be stored for a given cell range e.g. 1-1000. To benefit from this you have to make some design changes... The format property will need to be removed from the cell class. Instead the format should be registered either in the table class or preferably with another class. For example
public class CellFormats
{ ....
public void Register(int start, int finish, string format);
}
The cell formats class will contain the sparse matrix which will contain a format for a range.
The Table class will then utilise the CellFormats class. Instead of having a method called GetCellFormat it would have a method with the following signature
void ApplyCellFormat(Position cellPosition)
This would retrieve the cell format from the CellFormats class (sparse matrix) and apply it to the cell.
This technique can dramatically reduce memory usage if the data is common and applied to large ranges. But as I have already said, you need to ensure this is cause of the problem before making your design more complex by adding code like this.
精彩评论