开发者

.NET should I store references or values?

I need to store in my object which values hav开发者_如何转开发e already been handlerd, I am doubting what would cost more performance, should I create an array that stores:

  • The instance references (they are not structs, only ref classes)
  • The hashcode of the items
  • The name of the name of the properties (string) that have been handled

Update

my aim is that the collection of the data on the handled references should cost the less of memory the possible since I am going to have a tone of the parent instance type.

I care less about the retrieval time (i.e. collection.Contains(reference)).

So my question is then what of the above array what cost less memory.


Storing references to the object seems like the easiest and lowest memory cost option.

If you're using this for a "has this been handled" check, the best option (for fastest checking) is probably to implement Object.Equals and Object.GetHashCode on your class, and then use a HashSet<T>. HashSet<T> is nice for this because it provides an O(1) Contains() method.

If you can't change the class to allow for hashing, you can alternatively implement IEqualityComparer for the object.


.NET style hashcodes aren't an option unless the possible range of different values for your objects is less than 2^32, as otherwise you will get false-positives (and given the birthday paradox, this can happen more often than you might think even with a great hash function). Hashcodes give a quick link to a bucket of zero-or-more items, which are then checked for equality. Hence a hashcode-base solution would require you to store a reference to each object as well anyway, and hence can't be smaller in memory to storing just the references.

If the objects couldn't be garbage collected (i.e. they are still "alive" to another part of the application) then the cost of storing the reference will be 4 or 8 bytes depending on architecture. If they could be GC'd then the cost is dependant on the size of the graph of that object.

Now, if you can create your own lossless hash object of the objects that is smaller than that, you can get a memory saving. E.g.:

public class ObjectOfInterest
{// all fields public for sake of simplicity in example
    public int ID; // this is important diff id - diff object.
    public int ParID; // this is unimportant, as same for all objects processed here.
    public ParentType Parent; // this is just memoised based on _parID;
    public decimal Val; // this is important.
    public string Name; // unimportant for our purposes.
    public RelatedType Stuff; // memoised based on _id
}

Then we can produce a related:

public struct HashObject
{
    private readonly int _id;
    private readonly decimal _val;
    public HashObject(ObjectOfInterest ooi)
    {
        _id = ooi.ID;
        _val = ooi.Val;
    }
    public bool Matches(ObjectOfInterest ooi)
    {
        return _id == ooi.ID && _val == ooi.Val;
    }
    // because one of the options as to how to store *this* is hashing
    public bool Equals(HashObject ho)
    {
        return _id == ho._id && _val == ooi._val;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as HashObject);
    }
    public int GetHashCode()
    {
        unchecked
        {
            return _val.GetHashCode() ^ (_id << 16) ^ (_id >> 16);
        }
    }
}

Now, we store the HashObjects and use these to note what we've done with. In this case we're going to take up at least 20 bytes storing this struct, plus overhead of whatever means we have to store it. Smaller if the ObjectOfInterest can now be GC'd, pointless if they are still in memory anyway.

There's a hash and equality approach (knowledge of likely values can improve how good the hash is) if you decided to store these in a HashSet themselves. HashSet won't be the most memory efficient collection, though it might be that given the extra strain you are putting on this in all of these comparisons, that you do need the faster look up. This is area for experimentation over theory (esp. since the details change according to your objects). If you can take the look-up time complexity of constantly scanning arrays, then that's your best bet.

If there isn't a possible smaller object than you original type that allows for a full relevantly-equal comparison, then this approach can't work.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜