Why .NET String is immutable? [duplicate]
As we all know, String is immutable. What are the reasons for String being immutable and the introduction of StringBuilder class as mutable?
- Instances of immutable types are inherently thread-safe, since no thread can modify it, the risk of a thread modifying it in a way that interferes with another is removed (the reference itself is a different matter).
- Similarly, the fact that aliasing can't produce changes (if x and y both refer to the same object a change to x entails a change to y) allows for considerable compiler optimisations.
- Memory-saving optimisations are also possible. Interning and atomising being the most obvious examples, though we can do other versions of the same principle. I once produced a memory saving of about half a GB by comparing immutable objects and replacing references to duplicates so that they all pointed to the same instance (time-consuming, but a minute's extra start-up to save a massive amount of memory was a performance win in the case in question). With mutable objects that can't be done.
- No side-effects can come from passing an immutable type as a method to a parameter unless it is
out
orref
(since that changes the reference, not the object). A programmer therefore knows that ifstring x = "abc"
at the start of a method, and that doesn't change in the body of the method, thenx == "abc"
at the end of the method. - Conceptually, the semantics are more like value types; in particular equality is based on state rather than identity. This means that
"abc" == "ab" + "c"
. While this doesn't require immutability, the fact that a reference to such a string will always equal "abc" throughout its lifetime (which does require immutability) makes uses as keys where maintaining equality to previous values is vital, much easier to ensure correctness of (strings are indeed commonly used as keys). - Conceptually, it can make more sense to be immutable. If we add a month onto Christmas, we haven't changed Christmas, we have produced a new date in late January. It makes sense therefore that
Christmas.AddMonths(1)
produces a newDateTime
rather than changing a mutable one. (Another example, if I as a mutable object change my name, what has changed is which name I am using, "Jon" remains immutable and other Jons will be unaffected. - Copying is fast and simple, to create a clone just
return this
. Since the copy can't be changed anyway, pretending something is its own copy is safe. - [Edit, I'd forgotten this one]. Internal state can be safely shared between objects. For example, if you were implementing list which was backed by an array, a start index and a count, then the most expensive part of creating a sub-range would be copying the objects. However, if it was immutable then the sub-range object could reference the same array, with only the start index and count having to change, with a very considerable change to construction time.
In all, for objects which don't have undergoing change as part of their purpose, there can be many advantages in being immutable. The main disadvantage is in requiring extra constructions, though even here it's often overstated (remember, you have to do several appends before StringBuilder becomes more efficient than the equivalent series of concatenations, with their inherent construction).
It would be a disadvantage if mutability was part of the purpose of an object (who'd want to be modeled by an Employee object whose salary could never ever change) though sometimes even then it can be useful (in a many web and other stateless applications, code doing read operations is separate from that doing updates, and using different objects may be natural - I wouldn't make an object immutable and then force that pattern, but if I already had that pattern I might make my "read" objects immutable for the performance and correctness-guarantee gain).
Copy-on-write is a middle ground. Here the "real" class holds a reference to a "state" class. State classes are shared on copy operations, but if you change the state, a new copy of the state class is created. This is more often used with C++ than C#, which is why it's std:string enjoys some, but not all, of the advantages of immutable types, while remaining mutable.
Making strings immutable has many advantages. It provides automatic thread safety, and makes strings behave like an intrinsic type in a simple, effective manner. It also allows for extra efficiencies at runtime (such as allowing effective string interning to reduce resource usage), and has huge security advantages, since it's impossible for an third party API call to change your strings.
StringBuilder was added in order to address the one major disadvantage of immutable strings - runtime construction of immutable types causes a lot of GC pressure and is inherently slow. By making an explicit, mutable class to handle this, this issue is addressed without adding unneeded complication to the string class.
Strings are not really immutable. They are just publicly immutable. It means you cannot modify them from their public interface. But in the inside the are actually mutable.
If you don't believe me look at the String.Concat
definition using reflector.
The last lines are...
int length = str0.Length;
string dest = FastAllocateString(length + str1.Length);
FillStringChecked(dest, 0, str0);
FillStringChecked(dest, length, str1);
return dest;
As you can see the FastAllocateString
returns an empty but allocated string and then it is modified by FillStringChecked
Actually the FastAllocateString
is an extern method and the FillStringChecked
is unsafe so it uses pointers to copy the bytes.
Maybe there are better examples but this is the one I have found so far.
string management is an expensive process. keeping strings immutable allows repeated strings to be reused, rather than re-created.
Why are string types immutable in C#
String is a reference type, so it is never copied, but passed by reference. Compare this to the C++ std::string object (which is not immutable), which is passed by value. This means that if you want to use a String as a key in a Hashtable, you're fine in C++, because C++ will copy the string to store the key in the hashtable (actually std::hash_map, but still) for later comparison. So even if you later modify the std::string instance, you're fine. But in .Net, when you use a String in a Hashtable, it will store a reference to that instance. Now assume for a moment that strings aren't immutable, and see what happens: 1. Somebody inserts a value x with key "hello" into a Hashtable. 2. The Hashtable computes the hash value for the String, and places a reference to the string and the value x in the appropriate bucket. 3. The user modifies the String instance to be "bye". 4. Now somebody wants the value in the hashtable associated with "hello". It ends up looking in the correct bucket, but when comparing the strings it says "bye"!="hello", so no value is returned. 5. Maybe somebody wants the value "bye"? "bye" probably has a different hash, so the hashtable would look in a different bucket. No "bye" keys in that bucket, so our entry still isn't found.
Making strings immutable means that step 3 is impossible. If somebody modifies the string he's creating a new string object, leaving the old one alone. Which means the key in the hashtable is still "hello", and thus still correct.
So, probably among other things, immutable strings are a way to enable strings that are passed by reference to be used as keys in a hashtable or similar dictionary object.
Just to throw this in, an often forgotten view is of security, picture this scenario if strings were mutable:
string dir = "C:\SomePlainFolder";
//Kick off another thread
GetDirectoryContents(dir);
void GetDirectoryContents(string directory)
{
if(HasAccess(directory) {
//Here the other thread changed the string to "C:\AllYourPasswords\"
return Contents(directory);
}
return null;
}
You see how it could be very, very bad if you were allowed to mutate strings once they were passed.
You never have to defensively copy immutable data. Despite the fact that you need to copy it to mutate it, often the ability to freely alias and never have to worry about unintended consequences of this aliasing can lead to better performance because of the lack of defensive copying.
Strings are passed as reference types in .NET.
Reference types place a pointer on the stack, to the actual instance that resides on the managed heap. This is different to Value types, who hold their entire instance on the stack.
When a value type is passed as a parameter, the runtime creates a copy of the value on the stack and passes that value into a method. This is why integers must be passed with a 'ref' keyword to return an updated value.
When a reference type is passed, the runtime creates a copy of the pointer on the stack. That copied pointer still points to the original instance of the reference type.
The string type has an overloaded = operator which creates a copy of itself, instead of a copy of the pointer - making it behave more like a value type. However, if only the pointer was copied, a second string operation could accidently overwrite the value of a private member of another class causing some pretty nasty results.
As other posts have mentioned, the StringBuilder class allows for the creation of strings without the GC overhead.
Strings and other concrete objects are typically expressed as immutable objects to improve readability and runtime efficiency. Security is another, a process can't change your string and inject code into the string
Imagine you pass a mutable string to a function but don't expect it to be changed. Then what if the function changes that string? In C++, for instance, you could simply do call-by-value (difference between std::string
and std::string&
parameter), but in C# it's all about references so if you passed mutable strings around every function could change it and trigger unexpected side effects.
This is just one of various reasons. Performance is another one (interned strings, for example).
There are five common ways by which a class data store data that cannot be modified outside the storing class' control:
- As value-type primitives
- By holding a freely-shareable reference to class object whose properties of interest are all immutable
- By holding a reference to a mutable class object that will never be exposed to anything that might mutate any properties of interest
- As a struct, whether "mutable" or "immutable", all of whose fields are of types #1-#4 (not #5).
- By holding the only extant copy of a reference to an object whose properties can only be mutated via that reference.
Because strings are of variable length, they cannot be value-type primitives, nor can their character data be stored in a struct. Among the remaining choices, the only one which wouldn't require that strings' character data be stored in some kind of immutable object would be #5. While it would be possible to design a framework around option #5, that choice would require that any code which wanted a copy of a string that couldn't be changed outside its control would have to make a private copy for itself. While it hardly be impossible to do that, the amount of extra code required to do that, and the amount of extra run-time processing necessary to make defensive copies of everything, would far outweigh the slight benefits that could come from having string
be mutable, especially given that there is a mutable string type (System.Text.StringBuilder
) which accomplishes 99% of what could be accomplished with a mutable string
.
Immutable Strings also prevent concurrency-related issues.
Imagine being an OS working with a string that some other thread was modifying behind your back. How could you validate anything without making a copy?
精彩评论