开发者

Structs versus classes

I'm about to create 100,000 objects in code. They are small ones, only with 2 or 3 properties. I'll put them in a generic开发者_StackOverflow list and when they are, I'll loop them and check value a and maybe update value b.

Is it faster/better to create these objects as class or as struct?

EDIT

a. The properties are value types (except the string i think?)

b. They might (we're not sure yet) have a validate method

EDIT 2

I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work different?


Is it faster to create these objects as class or as struct?

You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.

Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.

Is it better to create these objects as class or as struct?

Maybe class, maybe struct. As a rule of thumb: If the object is :
1. Small
2. Logically an immutable value
3. There's a lot of them
Then I'd consider making it a struct. Otherwise I'd stick with a reference type.

If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.

Are objects on the heap and the stack processed equally by the garbage collector?

No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendents collected aggressively, as soon as the register holding onto it is not going to be read again.)

But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.

Is that clear?


Sometimes with struct you don't need to call the new() constructor, and directly assign the fields making it much faster that usual.

Example:

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i].id = i;
    list[i].isValid = true;
}

is about 2 to 3 times faster than

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i] = new Value(i, true);
}

where Value is a struct with two fields (id and isValid).

struct Value
{
    int id;
    bool isValid;

    public Value(int i, bool isValid)
    {
        this.i = i;
        this.isValid = isValid;
    }
}

On the other hand is the items needs to be moved or selected value types all that copying is going to slow you down. To get the exact answer I suspect you have to profile your code and test it out.


Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.

In this case, as you are placing them in a List<> (and a List<> is backed onto an array) it would be more efficient, memory-wise to use structs.

(Beware though, that large arrays will find their way on the Large Object Heap where, if their lifetime is long, may have an adverse affect on your process's memory management. Remember, also, that memory is not the only consideration.)


Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.

When you call the New operator on a class, it will be allocated on the heap. However, when you instantiate a struct, it gets created on the stack. This will yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.

More here:

http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx


If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.

As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.

Edit:

I'm beginning to consider the term evil to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea though, but mostly it just doesn't coincide with value semantics so it just doesn't make sense to use a struct in the given case.

There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.

Really, I think "it mutates so it's a bad stuct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so I think.


The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.


A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.

I think much of the struct-versus-class confusion stems from the fact that structures have two very different usage cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16 byte limit" should probably be more like 24-32. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, ToString override, and IEquatable(itsType).Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.

For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.

A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:

class ExposedHolder<T>
{
  public T Value;
  ExposedHolder() { }
  ExposedHolder(T val) { Value = T; }
}

If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.

One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.


Use classes.

On a general note. Why not update value b as you create them?


From a c++ perspective I agree that it will be slower modifying a structs properties compared to a class. But I do think that they will be faster to read from due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than from the stack.


Well, if you go with struct afterall, then get rid of string and use fixed size char or byte buffer.

That's re: performance.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜