Would C# benefit from aggregate structs/classes? [closed]
Foreword
tl;wr: This is a discussion.
I am aware that this "question" is more of a discussion, which is why I will mark it as community wiki. However, according to the How to Ask page, it could belong here, as it is specifically programming related, not discussed anywhere on the web after an hour of research, specific, relevant to most C# programmers, and on-topic. Moreover, the question itself is meant to obtain an answer, about which I will stay open-minded regardless of my bias: would C# really benefit from aggregate structs? Notwithstanding this foreword, I would understand if this were closed, but would appreciate it if users with the authority and intention to close it redirected me to an appropriate place on the Web for this discussion.
Introduction
Lack of struct immutability
Structs are flexible but debated types in C#. They offer the organizational paradigm of a stack-allocated value type, but not the immutability of other value types.
Some say structs should represent values, and values do not change (e.g. in int i = 5; the value 5 is immutable), while others perceive them as OOP layouts with subfields.
The debate on struct immutability (1, 2, 3), for which the current solution seems to be having the programmer enforce immutability, also remains unresolved.
For instance, the C# compiler will detect possible data loss when structs are accessed as a reference (bottom of this page) and restrict assignment. Moreover, since struct constructors, properties and methods can perform arbitrary operations (constructors are only required to assign all fields before returning control), structs cannot be declared as const, which would be a correct declaration if they were limited to data representation.
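For comparison, in current C# a user-defined struct cannot be const at all: const is limited to the built-in numeric types, char, bool, decimal, string, enums and null references. The usual substitute is a static readonly field, which is assigned once at type initialization rather than baked in at compile time. A minimal sketch, using a hypothetical Size struct (not System.Drawing.Size):

```csharp
using System;

// Hypothetical Size struct for illustration (not System.Drawing.Size).
public struct Size
{
    public readonly int Width;
    public readonly int Height;

    public Size(int width, int height)
    {
        Width = width;
        Height = height;
    }
}

public static class Foo
{
    // const Size Big = new Size(200, 100); // does not compile: struct types cannot be const

    // Assigned once during type initialization and never reassignable afterwards,
    // but evaluated at run time, unlike a true const.
    public static readonly Size Big = new Size(200, 100);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine($"{Foo.Big.Width}x{Foo.Big.Height}"); // prints 200x100
    }
}
```

The readonly fields also make Foo.Big.Width = 1 a compile error, which is about as close as today's language gets to the "constant aggregate" idea.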
Immutable subset of structs, aggregates
Aggregate classes (Wikipedia) are strict data structures with limited functionality, intended to offer syntactic sugar in exchange for their lack of flexibility. In C++, they have "no user-declared constructors, no private or protected non-static data members, no base classes, and no virtual functions". The theoretical specifics of such types in C# are open for debate here, although the core concept remains the same.
Since aggregate structs are strictly data holders with labeled accessors, their immutability (in a possible C# context) would be ensured. Aggregates also could not be null unless the nullable modifier (?) is specified, as for other pure value types. For this reason, many operations that are currently illegal on structs would become possible, as well as some syntactic sugar.
Uses
- Aggregates could be declared as const, since their constructor would be restricted to assigning the fields.
- Aggregates could be used as default values for method parameters.
- Aggregates could be implicitly Sequential, facilitating interaction with native libraries.
- Aggregates would be immutable, preventing data loss on reference access. Compiler detection of such subfield modifications could lead to a complete, implicit reassignment.
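For context on the second point: today a struct parameter may only default to default(T) or new T(), so something like Vector velocity = new Vector(0, 1, 0) is rejected as not being a compile-time constant. A common workaround, sketched here with a hypothetical Vector type and SetVelocity method, is a nullable parameter that is replaced with the intended default inside the method:

```csharp
using System;

// Hypothetical immutable Vector used for illustration.
public readonly struct Vector
{
    public readonly double X, Y, Z;
    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public static class Physics
{
    // static Vector SetVelocity(Vector velocity = new Vector(0, 1, 0)) // does not compile:
    // a struct parameter may only default to default(Vector) or new Vector().

    // Workaround: accept a nullable Vector and substitute the intended default.
    public static Vector SetVelocity(Vector? velocity = null)
    {
        return velocity ?? new Vector(0, 1, 0);
    }
}

public static class Program
{
    public static void Main()
    {
        Vector a = Physics.SetVelocity();                    // falls back to (0, 1, 0)
        Vector b = Physics.SetVelocity(new Vector(2, 0, 0)); // uses the argument
        Console.WriteLine($"{a.Y} {b.X}");                   // prints 1 2
    }
}
```

The cost of this pattern is that the default lives in the method body rather than in the signature, so callers cannot see it, which is part of what the proposal would address.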
Hypothetical Syntax
Taking from the C++ syntax, we could imagine something along the lines of the following (remember, this is a community wiki; improvements are welcome and encouraged):
aggregate Size
{
    int Width;
    int Height;
}

aggregate Vector
{
    // Default values for constructor.
    double X = 0, Y = 0, Z = 0;
}

aggregate Color
{
    byte R, G, B, A = 255;
}

aggregate Bar
{
    int X;
    Qux Qux;
}

aggregate Qux
{
    int X, Y;
}

static class Foo
{
    // Constant is possible.
    const Size Big = new Size(200, 100);

    // Inline constructor.
    const Vector Gravity = { 0, -9.8, 0 };

    // Default value / labeled parameter.
    const Color Fuchsia = { 255, 0, 255 };
    const Vector Up = { Y: 1 };

    // Sub-aggregate initialization.
    const Bar Test = { 20, { 4, 3 } };

    static void SetVelocity(Vector velocity = { 0, 1, 0 }) { ... }
    static void SetGravity(Vector gravity = Foo.Gravity) { ... }

    static void Main()
    {
        Vector v = { 1, 2, 3 };
        double y = v.Y; // Valid.
        v.Y = 5;        // Invalid, immutable.
    }
}
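For comparison only: C# 10 and later (long after this question was asked) can express part of this with a positional readonly record struct. Its default parameter values and named arguments resemble the proposed defaults and labeled construction, and its with expression produces a modified copy instead of mutating. It still allows neither const declarations nor brace initializers. A sketch, reusing the hypothetical Vector name:

```csharp
using System;

// C# 10+: a positional readonly record struct. The default parameter values stand in
// for the proposed "double X = 0, Y = 0, Z = 0;" field defaults.
public readonly record struct Vector(double X = 0, double Y = 0, double Z = 0);

public static class Program
{
    public static void Main()
    {
        var up = new Vector(Y: 1);    // named argument, similar to a labeled construction
        var v = new Vector(1, 2, 3);  // positional construction
        // v.Y = 5;                   // does not compile: Y is an init-only property
        var w = v with { Y = 5 };     // non-destructive copy; v itself is unchanged
        Console.WriteLine($"{up.X} {up.Y} {v.Y} {w.Y}"); // prints 0 1 2 5
    }
}
```

Record structs also get compiler-generated value equality, so new Vector(1, 2, 3) == new Vector(1, 2, 3) is true.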
Implicit (re)Assignment
As of today, assigning a subfield of a struct in C# 4.0 is valid:
Vector v = new Vector(1, 2, 3);
v.Z = 5; // Legal in current C#.
However, the compiler can sometimes detect when structs are mistakenly accessed as references, and will forbid changing subfields. For example (example question):
//(in a Windows.Forms context)
control.Size.Width = 20; // Illegal in current C#.
As Size is a property and struct Size a value type, we would be editing a copy/clone of the actual property, which would be useless in such a case. As C# users, we tend to assume most things are accessed by reference, especially in OOP designs, which would make us think that such a call is legitimate (and it would be, if struct Size were a class).
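The error reported for that line is CS1612 ("Cannot modify the return value ... because it is not a variable"), and the manual fix is the get, modify, set sequence that the proposal would generate implicitly. A sketch using hypothetical stand-ins for System.Drawing.Size and a Windows Forms Control, so it runs without Windows Forms:

```csharp
using System;

// Hypothetical stand-ins for System.Drawing.Size and a Windows Forms Control.
public struct Size
{
    public int Width;
    public int Height;
    public Size(int width, int height) { Width = width; Height = height; }
}

public class Control
{
    public Size Size { get; set; } = new Size(100, 50);
}

public static class Program
{
    public static Control ResizeWidth(int width)
    {
        var control = new Control();
        // control.Size.Width = width; // error CS1612: the getter returns a copy, not a variable
        Size size = control.Size;      // 1. read a copy through the getter
        size.Width = width;            // 2. modify the local copy
        control.Size = size;           // 3. write it back through the setter
        return control;
    }

    public static void Main()
    {
        Control c = ResizeWidth(20);
        Console.WriteLine($"{c.Size.Width}x{c.Size.Height}"); // prints 20x50
    }
}
```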
Moreover, when accessing collections, the compiler also forbids us from modifying a struct subfield: (example question)
List<Vector> vectors = ... // Imagine populated data.
vectors[4].Y = 10; // Illegal in current C#.
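The same restriction applies because List<T>'s indexer is a method call that returns a copy. Arrays behave differently: an array element is a variable, so array[4].Y = 10 compiles and changes the stored element. A sketch with a hypothetical mutable Vector:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable Vector, as in the question's examples.
public struct Vector
{
    public double X, Y, Z;
    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public static class Program
{
    public static List<Vector> MakeVectors()
    {
        var vectors = new List<Vector>();
        for (int i = 0; i < 5; i++) vectors.Add(new Vector(i, i, i));
        return vectors;
    }

    public static List<Vector> UpdateList()
    {
        List<Vector> vectors = MakeVectors();
        // vectors[4].Y = 10;  // error CS1612: the indexer returns a copy
        Vector tmp = vectors[4];
        tmp.Y = 10;
        vectors[4] = tmp;      // explicit write-back through the indexer's setter
        return vectors;
    }

    public static Vector[] UpdateArray()
    {
        Vector[] array = MakeVectors().ToArray();
        array[4].Y = 10;       // legal: an array element is a variable, not a copy
        return array;
    }

    public static void Main()
    {
        Console.WriteLine($"{UpdateList()[4].Y} {UpdateArray()[4].Y}"); // prints 10 10
    }
}
```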
The good news about these unfortunate restrictions is that the compiler already does half of the work of a possible aggregate solution for such cases: it detects when they occur. The other half would be to implicitly reassign a new aggregate with the changed value:
- When in local scope, simply reassign the variable.
- When in external scope, locate the get accessor and, if a matching set accessor is accessible, reassign through it.
For this to be done, and in order to avoid confusion, the aggregate must be marked as implicit:
implicit aggregate Vector { ... }
implicit aggregate Size { ... }

// Example 1
{
    Vector v = new Vector(1, 2, 3);
    v.Z = 5; // Legal with implicit aggregates.
    // What is implicitly done:
    v = new Vector(v.X, v.Y, 5); // Local variable, simply reassign.
}

// Example 2
{
    //(in a Windows.Forms context)
    control.Size.Width = 20; // Legal with implicit aggregates.
    // What is implicitly done:
    Size old = control.Size.__get(); // External, the compiler detects a get.
    // If the compiler can find a matching, accessible __set:
    control.Size.__set({ 20, old.Height });
}

// Example 3
{
    List<Vector> vectors = ... // Imagine populated data.
    vectors[4].Y = 10; // Legal with implicit aggregates.
    // What is implicitly done:
    Vector old = vectors[4].__get(); // External, the compiler detects a get.
    // If the compiler can find a matching, accessible __set:
    vectors[4].__set({ old.X, 10, old.Z });
}

// Example 4
{
    Vector The5thVector(List<Vector> vectors) { return vectors[4]; }
    ...
    List<Vector> vectors = ...;
    The5thVector(vectors).Y = 10; // Illegal with implicit aggregates.
    // This is illegal because the compiler cannot find an implicit
    // "set" to match, as it is a function return, not a property or
    // indexer.
}
Of course, this last implicit reassignment is only a syntactic simplification, which may or may not be adopted. I simply propose it because the compiler seems to be able to detect such reference access to structs, and could easily convert the code for the programmer if the type were an aggregate.
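Without language support, the "locate a get, then a matching set" rewrite can be approximated explicitly with delegates. The Reassign.Update helper below is hypothetical (not a framework API), as are the Size and Control stand-ins; it simply reads through a getter, transforms the copy, and writes the result back through the matching setter, mirroring Examples 2 and 3:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, as in the Windows Forms example.
public struct Size
{
    public int Width, Height;
    public Size(int width, int height) { Width = width; Height = height; }
}

public class Control
{
    public Size Size { get; set; } = new Size(100, 50);
}

public static class Reassign
{
    // Hypothetical helper: read through a getter, transform the copy,
    // then write the new value back through the matching setter.
    public static void Update<T>(Func<T> get, Action<T> set, Func<T, T> change) where T : struct
    {
        set(change(get()));
    }
}

public static class Program
{
    // Equivalent of Example 2: control.Size.Width = 20;
    public static Control ResizeControl()
    {
        var control = new Control();
        Reassign.Update(() => control.Size, s => control.Size = s, s => new Size(20, s.Height));
        return control;
    }

    // Equivalent of Example 3 (using Size to keep the sketch short): sizes[1].Height = 10;
    public static List<Size> UpdateSizes()
    {
        var sizes = new List<Size> { new Size(1, 1), new Size(2, 2) };
        Reassign.Update(() => sizes[1], s => sizes[1] = s, s => new Size(s.Width, 10));
        return sizes;
    }

    public static void Main()
    {
        Control c = ResizeControl();
        Console.WriteLine($"{c.Size.Width}x{c.Size.Height} {UpdateSizes()[1].Height}"); // prints 20x50 10
    }
}
```

Example 4 has no counterpart here for the same reason the proposal rejects it: a method return value offers no setter to pass in.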
Summary
- Aggregates can have fields;
- Aggregates are value types;
- Aggregates are immutable;
- Aggregates are allocated on the stack;
- Aggregates cannot inherit;
- Aggregates have a sequential layout;
- Aggregates have a sequential default constructor;
- Aggregates cannot have user defined constructors;
- Aggregates can have default values and labeled constructions;
- Aggregates can be defined inline;
- Aggregates can be declared as constant;
- Aggregates can be used as default parameters;
- Aggregates are non-nullable unless specified (?);
Possibly:
- Aggregates (could) be implicitly reassigned; See Marcelo Cantos' reply and comment.
- Aggregates (could) have interfaces;
- Aggregates (could) have methods;
Cons
As aggregates would not replace structs but rather be another organizational scheme, I cannot find many cons, but I hope that the C# veterans of S/O will be able to populate this CW section. On a last note, please answer the question directly, as well as discussing it: would C# benefit from aggregate classes as described in this post? I am no C# expert in any way, only an enthusiast of the C# language, and I miss this feature, which seems crucial to me. I'm seeking advice and comments from experienced programmers regarding this case. I am aware that numerous workarounds exist, and I actively use them every day; I simply think that they are too common to be ignored.
I wish that structs had been defined with something like your proposed semantics in the first place.
However, we're stuck with what we've got now and I think it is unlikely that we'll ever get a whole new "kind of type" into the CLR. Introducing a new kind of type means introducing it to every .NET language, not just C#, and that's a big change.
I think what is more likely -- and remember, when I talk about hypothetical language features for hypothetical, unannounced future products that don't exist and may never exist, I'm doing so for entertainment purposes only -- is that we'll find some way to make better immutability annotations and enforcements on both classes and structs. The compiler could do a better job of both enforcing immutability and making it easier to program in an immutable style, regardless of whether the type in question is a value type or a reference type. And the compiler or CLR could also potentially do a better job of optimizing code that works on multi-core machines if it had more immutability guarantees known at compile time or JIT time.
While you are noodling away at your proposal, an interesting question you might want to consider is: if aggregate types have methods, is "this" a value or a variable? For example:
aggregate Vector
{
    int x, y, z;
    public void M(Action action)
    {
        Console.WriteLine(this.x);
        action();
        Console.WriteLine(this.x);
    }
}
...
Vector v = new Vector(1, 2, 3);
Action action = ()=>{ v = new Vector(4, 5, 6); };
v.M(action);
What happens? Does "this" get passed to M by value, in which case it writes out "1" twice, or does it get passed as a reference to the variable, in which case your so-called "immutable" type is observed to mutate? (Because what is mutating is the variable; by definition variables are allowed to mutate, that's why they're called "vary-able".)
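Today's C# already answers this for ordinary structs: inside an instance method, "this" is a reference to the receiving variable. The sketch below uses a hypothetical Vector whose fields are all readonly, with M returning the two observed values instead of printing them. Because the lambda captures v, the call v.M(action) passes a reference to that captured variable, so the second read sees the reassignment even though no field is ever assigned outside the constructor:

```csharp
using System;

// Hypothetical Vector: every field is readonly, yet "this" can still be seen to change.
public struct Vector
{
    public readonly int X, Y, Z;
    public Vector(int x, int y, int z) { X = x; Y = y; Z = z; }

    // Reads X before and after running the callback.
    public (int Before, int After) M(Action action)
    {
        int before = this.X;
        action();
        int after = this.X;
        return (before, after);
    }
}

public static class Program
{
    public static (int Before, int After) Run()
    {
        Vector v = new Vector(1, 2, 3);
        Action action = () => { v = new Vector(4, 5, 6); };
        return v.M(action);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine($"{before} {after}"); // prints 1 4
    }
}
```

In other words, readonly fields alone make the type immutable but not the variable holding it, which is exactly the distinction drawn above.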
What would this do?
List<Vector> vectors = ...;
Vector v = vectors[4];
v.Y = 10;
or this?
Vector The5thVector(List<Vector> vectors) { return vectors[4]; }
...
List<Vector> vectors = ...;
The5thVector(vectors).Y = 10;
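For reference, here is what current C# does with both snippets when Vector is an ordinary mutable struct (hypothetical definition below). The first compiles and changes only the local copy, leaving the list untouched; the second is rejected with CS1612 because a method's return value is not a variable. An implicit-reassignment rule would have to decide whether the first case should silently write back into the list:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable Vector, as in the question.
public struct Vector
{
    public double X, Y, Z;
    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public static class Program
{
    public static Vector The5thVector(List<Vector> vectors) { return vectors[4]; }

    public static (double LocalY, double ListY) Run()
    {
        var vectors = new List<Vector>();
        for (int i = 0; i < 5; i++) vectors.Add(new Vector(i, i, i));

        Vector v = vectors[4]; // copies element 4 into a new local variable
        v.Y = 10;              // changes only that copy
        // The5thVector(vectors).Y = 10; // error CS1612: a return value is not a variable
        return (v.Y, vectors[4].Y);
    }

    public static void Main()
    {
        var (localY, listY) = Run();
        Console.WriteLine($"{localY} {listY}"); // prints 10 4
    }
}
```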
Replacement of diagnostics with implicit assignment won't get you very far. There's a reason mutable structs are so problematic, and simply declaring a new concept, aggregates, won't fix any of these problems.
The best solution would have been to disallow mutable structs in the language in the first place. The second best solution is to behave as if they were disallowed. Structs are supposed to be small and self-contained, which eliminates any disadvantages to making them immutable.
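Behaving as if mutable structs were disallowed usually means get-only members plus methods that return modified copies. In this sketch, the readonly struct Vector and its WithY method are hypothetical names; the point is that the write-back to the collection becomes explicit, so there is no ambiguity about what gets updated:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable Vector: no setters; "changes" produce new values.
public readonly struct Vector
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }

    // Returns a copy with Y replaced; this instance is left untouched.
    public Vector WithY(double y) => new Vector(X, y, Z);
}

public static class Program
{
    public static List<Vector> Run()
    {
        var vectors = new List<Vector> { new Vector(1, 2, 3) };
        // vectors[0].Y = 10;              // does not compile: Y has no setter
        vectors[0] = vectors[0].WithY(10); // the write-back is explicit
        return vectors;
    }

    public static void Main()
    {
        Console.WriteLine(Run()[0].Y); // prints 10
    }
}
```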
No, it would not benefit. Structs are better as mutable types anyway.
First of all... "Immutability with implicit reassignment" is really just "inefficient mutability".
Given a "Point" structure, if you intend to change only the value of X, why force a rewrite of the entire memory structure? Just overwriting X alone is more efficient than overwriting X with a new value and pointlessly overwriting Y with its current value. There would be no benefit to such a scheme.
Honestly, the whole topic of mutability is a matter of perspective. It really only makes sense to talk about mutability when referring to a complex object as a whole, and asking whether its individual pieces change value while maintaining references to the object as a whole.
For example, it makes sense to call a string immutable, because you refer to it as a particular block of memory representing a collection of characters, in which the characters don't change value from the perspective of anything that has a reference to it. An int struct, on the other hand, is mutable, because its value can be changed by a simple assignment, and any references (pointers) to the int struct will see those changes.
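The distinction can be illustrated in C# 7 and later, using a ref local as the "reference (pointer)" to an int variable; the method names below are hypothetical. Assigning to the int is observed through the alias, while string.Replace returns a new string and leaves the original untouched:

```csharp
using System;

public static class Program
{
    // An int variable: assignment replaces its value in place,
    // and a ref alias to the same variable observes the new value.
    public static int AliasAfterAssignment()
    {
        int i = 5;
        ref int alias = ref i;
        i = 7;
        return alias;
    }

    // A string: Replace returns a new string and leaves the original intact.
    public static (string Original, string Replaced) ReplaceDemo()
    {
        string s = "abc";
        string t = s.Replace('a', 'x');
        return (s, t);
    }

    public static void Main()
    {
        Console.WriteLine(AliasAfterAssignment()); // prints 7
        var (s, t) = ReplaceDemo();
        Console.WriteLine($"{s} {t}");             // prints abc xbc
    }
}
```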
As for "this" in struct or aggregate methods, of course it should refer to the struct/aggregate's memory location on the stack at all times, so updates via anonymous methods and delegates that change the struct's value should be reflected and seen as mutable. To summarize, mutability is a good idea at a fundamental variable level, and immutability is best handled at a higher level, where complex objects are represented and the "immutable" behavior is explicitly coded.