Why do we need boxing and unboxing in C#?
I know what boxing and unboxing are, but I can't comprehend the real use of them. Why and where should I use them?
short s = 25;
object objshort = s; //Boxing
short anothershort = (short)objshort; //Unboxing
Why
To have a unified type system and allow value types to have a completely different representation of their underlying data from the way that reference types represent their underlying data (e.g., an int is just a bucket of thirty-two bits, which is completely different from a reference type).
Think of it like this. You have a variable o of type object. And now you have an int and you want to put it into o. o is a reference to something somewhere, and the int is emphatically not a reference to something somewhere (after all, it's just a number). So, what you do is this: you make a new object that can store the int and then you assign a reference to that object to o. We call this process "boxing."
So, if you don't care about having a unified type system (i.e., reference types and value types have very different representations and you don't want a common way to "represent" the two) then you don't need boxing. If you don't care about having ints represent their underlying value (i.e., instead have ints be reference types too and just store a reference to their underlying value) then you don't need boxing.
Where should I use it?
For example, the old collection type ArrayList only eats objects. That is, it only stores references to somethings that live somewhere. Without boxing you cannot put an int into such a collection. But with boxing, you can.
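A minimal sketch of the ArrayList case, assuming a modern .NET SDK with top-level statements: every int added is boxed on the way in, and reading it back requires an explicit unboxing cast.

```csharp
using System;
using System.Collections;

// ArrayList stores every element as object, so each int added to it is boxed.
var numbers = new ArrayList();
for (int i = 1; i <= 3; i++)
{
    numbers.Add(i); // boxing: a new heap object is allocated to hold a copy of i
}

// Reading the elements back requires an explicit unboxing cast.
int total = 0;
foreach (object item in numbers)
{
    total += (int)item; // unboxing: copy the int back out of its box
}

Console.WriteLine(total); // 6
```

With List<int> instead, neither the boxing on Add nor the cast on read would be needed.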
Now, in the days of generics you don't really need this and can generally go merrily along without thinking about the issue. But there are a few caveats to be aware of:
This is correct:
double e = 2.718281828459045;
int ee = (int)e;
This is not:
double e = 2.718281828459045;
object o = e; // box
int ee = (int)o; // runtime exception: InvalidCastException
Instead you must do this:
double e = 2.718281828459045;
object o = e; // box
int ee = (int)(double)o;
First we have to explicitly unbox the double ((double)o) and then cast that to an int.
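To see both casts side by side, here is a small sketch (top-level statements, nothing hypothetical): the direct (int)o throws InvalidCastException, while unboxing to double first works. The pattern-matching form o is double d is shown as one more way to unbox only when the boxed type matches.

```csharp
using System;

double e = 2.718281828459045;
object o = e; // box a double

// Unboxing must name the exact value type that was boxed.
bool threw = false;
try
{
    _ = (int)o; // throws: the box holds a double, not an int
}
catch (InvalidCastException)
{
    threw = true;
}

// Correct: unbox to double first, then convert double -> int (truncates toward zero).
int ee = (int)(double)o;

// Pattern matching unboxes only when the boxed type matches.
int viaPattern = o is double d ? (int)d : -1;

Console.WriteLine($"{threw} {ee} {viaPattern}"); // True 2 2
```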
What is the result of the following:
double e = 2.718281828459045;
double d = e;
object o1 = d;
object o2 = e;
Console.WriteLine(d == e);
Console.WriteLine(o1 == o2);
Think about it for a second before going on to the next sentence.
If you said True and False, great! Wait, what? That's because == on reference types uses reference equality, which checks whether the references are equal, not whether the underlying values are equal. This is a dangerously easy mistake to make. Perhaps even more subtle:
double e = 2.718281828459045;
object o1 = e;
object o2 = e;
Console.WriteLine(o1 == o2);
will also print False!
Better to say:
Console.WriteLine(o1.Equals(o2));
which will then, thankfully, print True.
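The equality trap can be checked directly with a sketch like this one, which also shows object.ReferenceEquals (the explicit spelling of what == does on two object references) and the static object.Equals (which compares values and tolerates null).

```csharp
using System;

double e = 2.718281828459045;
object o1 = e; // first box
object o2 = e; // a second, separate box holding the same value

bool sameReference = o1 == o2;                   // == on object compares references
bool sameObject = object.ReferenceEquals(o1, o2); // the explicit form of the same check
bool sameValue = o1.Equals(o2);                  // virtual Equals compares the boxed doubles
bool staticEquals = object.Equals(o1, o2);       // static Equals also compares values (and handles null)

Console.WriteLine($"{sameReference} {sameObject} {sameValue} {staticEquals}"); // False False True True
```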
One last subtlety:
[struct|class] Point {
public int x, y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
Point p = new Point(1, 1);
object o = p;
p.x = 2;
Console.WriteLine(((Point)o).x);
What is the output? It depends! If Point is a struct then the output is 1, but if Point is a class then the output is 2! A boxing conversion makes a copy of the value being boxed, explaining the difference in behavior.
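The Point code above is pseudocode, since one type cannot be both a struct and a class. To keep this sketch to a single file of top-level statements, it swaps in two built-in stand-ins (an assumption, not the original types): a named ValueTuple (int x, int y), which is a mutable struct, plays struct Point, and a two-element int[], which is a reference type, plays class Point.

```csharp
using System;

// Stand-in for "struct Point": a named ValueTuple is a mutable value type.
(int x, int y) sp = (1, 1);
object so = sp;   // boxing copies the current value into a new heap object
sp.x = 2;         // changes the local variable only, not the boxed copy
int structX = ((ValueTuple<int, int>)so).Item1;

// Stand-in for "class Point": an int[] is a reference type, so nothing is boxed or copied.
int[] cp = { 1, 1 };
object co = cp;   // copies the reference; both variables refer to the same array
cp[0] = 2;        // visible through every reference to that array
int classX = ((int[])co)[0];

Console.WriteLine($"{structX} {classX}"); // 1 2
```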
In the .NET framework, there are two species of types: value types and reference types. This is relatively common in OO languages.
One of the important features of object oriented languages is the ability to handle instances in a type-agnostic manner. This is referred to as polymorphism. Since we want to take advantage of polymorphism, but we have two different species of types, there has to be some way to bring them together so we can handle one or the other the same way.
Now, back in the olden days (version 1.0 of Microsoft .NET), there wasn't this newfangled generics hullabaloo. You couldn't write a method that had a single argument that could service both a value type and a reference type. That's a violation of polymorphism. So boxing was adopted as a means to coerce a value type into an object.
If this weren't possible, the framework would be littered with methods and classes whose only purpose was to accept the other species of type. Not only that, but since value types don't truly share a common type ancestor, you'd have to have a different method overload for each value type (bool, byte, Int16, Int32, etc.).
Boxing prevented this from happening. And that's why the British celebrate Boxing Day.
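In C# terms, the single argument that can service both species is a parameter of type object. A minimal sketch, with Describe as a hypothetical helper:

```csharp
using System;

// One parameter of type object accepts any value: reference types are passed
// as-is, and value types are boxed automatically at the call site.
string Describe(object value) => $"{value.GetType().Name}: {value}";

string a = Describe(42);      // int is boxed
string c = Describe(true);    // bool is boxed
string d = Describe("hello"); // string is already a reference type, no boxing

Console.WriteLine(a); // Int32: 42
```

One method covers every type, at the cost of a heap allocation for each value-type argument.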
The best way to understand this is to look at lower-level programming languages C# builds on.
In the lowest-level languages like C, local variables go in one place: the Stack. Each time you declare a local variable it goes on the Stack. Typically these are primitive values, like a bool, a byte, a 32-bit int, a 32-bit uint, etc. The Stack is both simple and fast. As variables are added they just go one on top of another, so the first you declare sits at, say, 0x00, the next right after it, and so on in RAM. In addition, their positions within a stack frame are usually fixed at compile time, so the compiler knows where each one lives before you even run the program.
Languages like C and C++ also have a second memory structure called the Heap. You still mostly live in the Stack, but special variables called Pointers can be added to the Stack; they store the memory address of the first byte of an Object, and that Object lives in the Heap (allocated with malloc in C or new in C++). The Heap is kind of a mess and somewhat expensive to maintain, because unlike Stack variables, Heap allocations don't pile linearly up and then down as a program executes. They can come and go in no particular sequence, and they can grow and shrink.
Dealing with pointers is hard. They're the cause of memory leaks, buffer overruns, and frustration. C# to the rescue.
At a higher level, in C#, you don't need to think about pointers: the .NET runtime (largely written in C++) thinks about these for you and presents them to you as References to Objects, and for performance, lets you store simpler values like bools, bytes and ints as Value Types. Under the hood, Objects (instances of Classes) go on the more expensive, garbage-collected Heap, while Value Types held in local variables typically go in that same kind of Stack you had in low-level C - super-fast. (A Value Type that is a field of an Object lives inside that Object on the Heap.)
For the sake of keeping the interaction between these 2 fundamentally different concepts of memory (and strategies for storage) simple from a coder's perspective, Value Types can be Boxed at any time. Boxing causes the value to be copied from the Stack, put in an Object, and placed on the Heap - more expensive, but, fluid interaction with the Reference world. As other answers point out, this will occur when you for example say:
bool b = false; // Cheap, on Stack
object o = b; // Legal, easy to code, but complex - Boxing!
bool b2 = (bool)o; // Unboxing!
A strong illustration of the advantage of Boxing is a check for null:
if (b == null) // Always false: a bool can never be null (the compiler warns, CS0472)
if (o == null) // Compiles; here always false, because o refers to a boxed bool
Our object o is technically an address in the Stack that points to a copy of our bool b, which has been copied to the Heap. We can check o for null because the bool's been Boxed and put there.
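A small sketch of both points: the boxed copy can be tested against null like any reference, and because boxing copies the value, later changes to b do not reach o.

```csharp
using System;

bool b = false;
object o = b;            // boxing: o refers to a heap copy of b's current value

bool isNull = o == null; // a reference can be tested for null; this box is not null
b = true;                // changing b does not touch the boxed copy
bool boxed = (bool)o;    // unboxing yields the value captured at boxing time

Console.WriteLine($"{isNull} {boxed}"); // False False
```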
In general you should avoid Boxing unless you need it, for example to pass an int/bool/whatever as an object to an argument. There are some basic structures in .Net that still demand passing Value Types as object (and so require Boxing), but for the most part you should never need to Box.
A non-exhaustive list of historical C# structures that require Boxing, that you should avoid:
The Event system turns out to have a Race Condition in naive use of it, and it doesn't support async. Add in the Boxing problem and it should probably be avoided. (You could replace it for example with an async event system that uses Generics.)
The old Threading and Timer models forced a Box on their parameters but have been replaced by async/await which are far cleaner and more efficient.
The .Net 1.1 Collections relied entirely on Boxing, because they came before Generics. These are still kicking around in System.Collections. In any new code you should be using the Collections from System.Collections.Generic, which in addition to avoiding Boxing also provide you with stronger type-safety.
You should avoid declaring or passing your Value Types as objects, unless you have to deal with the above historical problems that force Boxing, and you want to avoid the performance hit of Boxing it later when you know it's going to be Boxed anyway.
Per Mikael's suggestion below:
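Do This
using System.Collections.Generic;
var employeeCount = 5;
var list = new List<int>(10);
list.Add(employeeCount); // stored directly as an int, no boxing
Not This
using System.Collections;
Int32 employeeCount = 5;
var list = new ArrayList(10);
list.Add(employeeCount); // boxed into an object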
Update
This answer originally suggested that Int32, Boolean, etc. cause boxing, when in fact they are simply the underlying .NET type names. That is, .NET has types like Boolean, Int32, and String, and C# aliases them as bool, int, and string, without any functional difference.
Boxing isn't really something that you use - it is something the runtime uses so that you can handle reference and value types in the same way when necessary. For example, if you used an ArrayList to hold a list of integers, the integers got boxed to fit in the object-type slots in the ArrayList.
Using generic collections now, this pretty much goes away. If you create a List<int>, there is no boxing done; the List<int> can hold the integers directly.
Boxing and unboxing are specifically used to treat value-type objects as reference types: moving their actual value to the managed heap and accessing it by reference.
Without boxing and unboxing you could never refer to a value type through an object reference, and that means you could not pass value types as instances of Object.
The last place I had to unbox something was when writing some code that retrieved some data from a database (I wasn't using LINQ to SQL, just plain old ADO.NET):
int myIntValue = (int)reader["MyIntValue"];
Basically, if you're working with older APIs before generics, you'll encounter boxing. Other than that, it isn't that common.
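Without a real database, the reader can be imitated. In this sketch a Dictionary<string, object> is a hypothetical stand-in for one row of a data reader (an assumption; it is not the ADO.NET API), since both hand back column values typed as object. The DBNull check reflects the fact that ADO.NET returns DBNull.Value, not null, for a NULL column, and unboxing that to int would throw.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a data reader row: like reader["..."], the indexer returns object.
var row = new Dictionary<string, object>
{
    ["MyIntValue"] = 42,            // stored boxed
    ["MissingValue"] = DBNull.Value // a database NULL arrives as DBNull, not as null
};

int myIntValue = (int)row["MyIntValue"]; // unboxing, as in the reader example above

// A NULL column cannot be unboxed to int, so check for DBNull first.
int? missing = row["MissingValue"] is DBNull ? (int?)null : (int)row["MissingValue"];

Console.WriteLine(myIntValue); // 42
```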
Boxing is required when we have a function that needs an object as a parameter but we have different value types that need to be passed. In that case we first need to convert the value types to object before passing them to the function.
I don't think that is true, try this instead:
using System;

class Program
{
    static void Main(string[] args)
    {
        int x = 4;
        test(x);
    }

    static void test(object o)
    {
        Console.WriteLine(o.ToString());
    }
}
That runs just fine, I didn't use boxing/unboxing. (Unless the compiler does that behind the scenes?)
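The compiler does box behind the scenes here: because test takes an object, it emits a box instruction for x at the call site. A sketch with a hypothetical Inspect helper shows that the object received is a boxed System.Int32, which still reports itself as a value type:

```csharp
using System;

// The parameter type is object, so the compiler boxes any value-type argument.
string Inspect(object o) => $"{o.GetType().IsValueType} {o.GetType().Name}";

int x = 4;
string info = Inspect(x); // x is boxed here; the method receives a reference to the box

Console.WriteLine(info); // True Int32
```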
In .NET, every instance of Object, or any type derived therefrom, includes a data structure which contains information about its type. "Real" value types in .NET do not contain any such information. To allow data in value types to be manipulated by routines that expect to receive types derived from object, the system automatically defines for each value type a corresponding class type with the same members and fields. Boxing creates a new instance of this class type, copying the fields from a value type instance. Unboxing copies the fields from an instance of the class type to an instance of the value type. All of the class types which are created from value types are derived from the ironically named class ValueType (which, despite its name, is actually a reference type).
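The claims about ValueType can be checked with a few lines of reflection:

```csharp
using System;

bool valueTypeIsValueType = typeof(ValueType).IsValueType;                // System.ValueType itself is not a value type
bool intDerivesFromValueType = typeof(int).BaseType == typeof(ValueType); // int's base type is ValueType
object boxed = 7;                                                         // boxing an int
bool boxIsValueType = boxed is ValueType;                                 // the box can be referenced as ValueType

Console.WriteLine($"{valueTypeIsValueType} {intDerivesFromValueType} {boxIsValueType}"); // False True True
```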
When a method only takes a reference type as a parameter (say, a generic method constrained to reference types via the class constraint), you will not be able to pass a value type to it directly and will have to box it.
This is also true for any method that takes object as a parameter: a value type passed to it has to be boxed into a reference type.
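A sketch of the constrained case, with Show as a hypothetical helper: passing an int directly is rejected at compile time, but boxing it first lets T be inferred as object.

```csharp
using System;

// A generic method that only accepts reference types (the class constraint).
string Show<T>(T item) where T : class => item.ToString() ?? "";

int n = 42;
// Show(n);                 // does not compile: int is not a reference type
object boxed = n;           // box it explicitly...
string shown = Show(boxed); // ...and T is inferred as object

Console.WriteLine(shown); // 42
```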
In general, you typically will want to avoid boxing your value types.
However, there are rare occurrences where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.
There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.
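As a small reflection example, this sketch calls Math.Max(int, int) through MethodInfo.Invoke: the arguments travel in an object[] (so 3 and 7 are boxed) and the return value comes back as a boxed int that must be unboxed.

```csharp
using System;
using System.Reflection;

// Reflection works in terms of object: arguments go in an object[] and the result comes back as object.
MethodInfo max = typeof(Math).GetMethod("Max", new[] { typeof(int), typeof(int) })!;

object result = max.Invoke(null, new object[] { 3, 7 })!; // 3 and 7 are boxed; the int result is boxed too
int value = (int)result;                                  // unbox to use it as an int

Console.WriteLine($"{result.GetType().Name} {value}"); // Int32 7
```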
Boxing is the conversion of a value to a reference type with the data at some offset in an object on the heap.
As for what boxing actually does, here are some examples:
Mono (C)
void* mono_object_unbox (MonoObject *obj)
{
MONO_EXTERNAL_ONLY_GC_UNSAFE (void*, mono_object_unbox_internal (obj));
}
#define MONO_EXTERNAL_ONLY_GC_UNSAFE(t, expr) \
t result; \
MONO_ENTER_GC_UNSAFE; \
result = expr; \
MONO_EXIT_GC_UNSAFE; \
return result;
static inline gpointer
mono_object_unbox_internal (MonoObject *obj)
{
/* add assert for valuetypes? */
g_assert (m_class_is_valuetype (mono_object_class (obj)));
return mono_object_get_data (obj);
}
static inline gpointer
mono_object_get_data (MonoObject *o)
{
return (guint8*)o + MONO_ABI_SIZEOF (MonoObject);
}
#define MONO_ABI_SIZEOF(type) (MONO_STRUCT_SIZE (type))
#define MONO_STRUCT_SIZE(struct) MONO_SIZEOF_ ## struct
#define MONO_SIZEOF_MonoObject (2 * MONO_SIZEOF_gpointer)
typedef struct {
MonoVTable *vtable;
MonoThreadsSync *synchronisation;
} MonoObject;
Unboxing an object in Mono amounts to returning a pointer at an offset of 2 gpointers into the object (e.g., 16 bytes on a 64-bit platform). A gpointer is a void*. This makes sense when looking at the definition of MonoObject, as it's clearly just a header for the data.
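Conceptually the same thing can be observed from C#, assuming .NET 5 or later where System.Runtime.CompilerServices.Unsafe is available: Unsafe.Unbox<T> returns a ref to the value stored inside the box, much like mono_object_get_data returns a pointer past the header. Mutating through that ref changes the boxed value in place, which is exactly why the API lives in Unsafe and should be used with care.

```csharp
using System;
using System.Runtime.CompilerServices;

object box = 41; // boxing copies 41 into a new heap object

// Unsafe.Unbox<T> hands back a ref to the int stored inside the box,
// so no copy is made on the way out.
ref int inside = ref Unsafe.Unbox<int>(box);
inside++;        // mutates the boxed value in place

Console.WriteLine((int)box); // 42
```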
C++
To box a value in C++ you could do something like:
#include <iostream>
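using Object = void*; // a type-erased pointer stands in for "object"
template<class T> Object box(T j){
    return new T(j); // copy the value into a new heap allocation
}
template<class T> T unbox(Object j){
    T* p = static_cast<T*>(j);
    T temp = *p; // copy the value back out
    delete p;    // delete through the real type: deleting a void* is undefined behavior
    return temp;
}
int main() {
    int j = 2;
    Object o = box(j);
    int k = unbox<int>(o);
    std::cout << k << '\n';
}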
Boxing happens when a value type is passed to a variable or parameter with a type of object. Since it happens automatically, the question is not when you should use boxing, but rather when you should use the type object.
The type object should only be used when it is absolutely necessary, since it circumvents the type safety which is otherwise a major benefit of a statically typed language like C#. But it may be necessary in cases where it is not possible to know the type of a value at compile time.
For example, when reading a database field value through the ADO.NET framework, the returned value could be an integer, a string, or something else, so the type has to be object, and the client code has to perform the appropriate casting. To avoid this problem, ORM frameworks like LINQ to SQL or EF Core use statically typed entities instead, so the use of object is avoided.
Before the introduction of generics, collections like ArrayList had object as their item type. This meant you could store anything in a list, and you could add a string to a list of numbers without the type system complaining. Generics solve this problem and make boxing unnecessary when using collections of value types.
So typing something as object is rarely needed, and you want to avoid it. Generics are typically a better solution in cases where code needs to handle both value types and reference types.
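As a sketch of the generic alternative, with Larger as a hypothetical helper, one method serves both an int and a string without typing anything as object. For value-type arguments the runtime typically compiles a specialized version of the method, so the constrained CompareTo call does not box.

```csharp
using System;

// Works for value and reference types alike; no parameter is typed as object.
T Larger<T>(T a, T b) where T : IComparable<T> => a.CompareTo(b) >= 0 ? a : b;

int larger = Larger(3, 7);                  // T = int: compared without boxing
string later = Larger("apple", "banana");   // T = string: already a reference type

Console.WriteLine($"{larger} {later}"); // 7 banana
```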