Why does .NET IL always create new string objects even when the higher level code references existing ones?

Background: We have an XML document containing thousands of pseudocode functions. I've written a utility to parse this document and generate C# code from it. Here's a greatly simplified snippet of the code that gets generated:

public class SomeClass
{
    public string Func1() { return "Some Value"; }
    public string Func2() { return "Some Other Value"; }
    public string Func3() { return "Some Value"; }
    public string Func4() { return "Some Other Value"; }
    // ...
}

The important takeaway is each string value may get returned by multiple methods. I assumed that by making a minor change so that the methods would instead return references to static member strings, this would both cut down on the assembly size and reduce the memory footprint of the program. For example:

public class SomeClass
{
    private const string _SOME_VALUE = "Some Value";
    private const string _SOME_OTHER_VALUE = "Some Other Value";
    // ...

    public string Func1() { return _SOME_VALUE; }
    public string Func2() { return _SOME_OTHER_VALUE; }
    public string Func3() { return _SOME_VALUE; }
    public string Func4() { return _SOME_OTHER_VALUE; }
    // ...
}

But to my surprise, inspection using the .NET ildasm.exe utility shows that in both cases the IL for the functions is identical. Here it is for one of them. Either way, a hard-coded value gets used with ldstr:

.method public hidebysig instance string
        Func1() cil managed
{
  // Code size       6 (0x6)
  .maxstack  8
  IL_0000:  ldstr      "Some Value"
  IL_0005:  ret
} // end of method SomeClass::Func1

In fact, the "optimized" version is slightly worse because it includes the static string members in the assembly. When I repeat this experiment using some other object type besides string, I see the difference that I expect. Note that the assemblies are generated with optimization enabled.

Question: Why does .NET apparently always create a new string object regardless of whether the code references an existing one?


  IL_0000:  ldstr      "Some Value"
  IL_0005:  ret

The disassembler is being a little too helpful here, and that hides what is really going on. You can tell from the IL addresses: the ldstr instruction takes only 5 bytes (a 1-byte opcode plus a 4-byte metadata token), far too few to store the string itself. Use View + Show token values to see what it really looks like. You'll then also see that identical strings use the same token value. This is called 'interning'.

The token value still doesn't show you where the string is really stored after the program is jitted. String literals go into the 'loader heap', a heap distinct from the garbage collected heap. It is the heap where static items are stored. Or to put it another way: string literals are highly optimized and very cheap. You cannot do better yourself.


See http://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.ldstr(v=vs.71).aspx

The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning").
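
The guarantee quoted above can be checked with a small program. This is only a sketch: SomeClass is cut down to two of the methods from the question, and InterningDemo is a hypothetical name. Under that guarantee the two literals should come back as one object, while a string built at run time from the same characters is a separate object that merely compares equal by value.

```csharp
using System;

public class SomeClass
{
    // Both methods return the same literal, as in the question.
    public string Func1() { return "Some Value"; }
    public string Func3() { return "Some Value"; }
}

// Hypothetical driver class, not part of the original code.
public static class InterningDemo
{
    public static void Main()
    {
        var c = new SomeClass();
        string a = c.Func1();
        string b = c.Func3();

        // Reference comparison of the two literals loaded by ldstr.
        Console.WriteLine(object.ReferenceEquals(a, b));

        // A string assembled at run time is a distinct object,
        // even though its characters are identical.
        string built = new string("Some Value".ToCharArray());
        Console.WriteLine(object.ReferenceEquals(a, built));

        // Value comparison, which ignores object identity.
        Console.WriteLine(a == built);
    }
}
```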


I don't have Visual Studio in front of me right now so I can't give the concise answer I would like. The MSIL you are showing makes it appear as if the strings are not being interned. Try using object.ReferenceEquals(...) to see if this is really the case or even open the compiled library in a text editor. If the strings are not being interned there may be a project setting to enable interning (again no VS in front of me to give you an exact reference).

Your other option is to change your string definitions to be static readonly which should make the methods return a reference to the static instance. Note that using this method creates an implicit static constructor that will create the string instances the first time the class is referenced.


String literals found in IL code are always interned, so no new strings are being constructed. You can verify this with this code:

    string str = "123";
    // IsInterned returns the interned instance, or null if there is none.
    string isinterned = string.IsInterned(str);
    Console.WriteLine(ReferenceEquals(str, isinterned));

Constants of any type, not just strings, are meant to be inlined as literals wherever they are used in the IL. If that is not what you want (I know of some valid cases for this, like picking up updated 'constant values' from a newer version of an assembly), try static readonly like this instead.

public static readonly string _SOME_VALUE = "Some Value";
public static readonly string _SOME_OTHER_VALUE = "Some Other Value";
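
To make the difference concrete, here is a minimal sketch that puts both declarations side by side. The names ConstVsReadonly, ConstValue, ReadonlyValue, FromConst and FromReadonly are made up for illustration. The const version compiles to the ldstr shown in the question, while the static readonly version loads the field (ldsfld). Because the field is initialized from a literal, both methods would still be expected to hand back the same interned object.

```csharp
using System;

// Hypothetical class for illustration only.
public class ConstVsReadonly
{
    // Inlined by the compiler: every use becomes ldstr "Some Value".
    private const string ConstValue = "Some Value";

    // A real static field, assigned once in the type initializer.
    private static readonly string ReadonlyValue = "Some Value";

    public string FromConst() { return ConstValue; }       // ldstr
    public string FromReadonly() { return ReadonlyValue; } // ldsfld

    public static void Main()
    {
        var c = new ConstVsReadonly();

        // Compare the references returned by the two access paths.
        Console.WriteLine(object.ReferenceEquals(c.FromConst(), c.FromReadonly()));
    }
}
```

Either way, the choice between the two is about versioning and how the value is looked up, not about how many string objects exist.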


.NET emulates String objects as a primitive type, despite it being a Char array. A primitive type is always cloned when it is passed to a function. So in this case, .NET will always clone String values when any manipulation or passing is performed.
