开发者

Search cost of string interning and declaration of literal strings

Two Questions.

  1. When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning (method intern of class String)?

  2. In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool. And if there are many strings in the pool, it may be high cost. (maybe tradeoff of searching cost and memory?) On this point of view, it might be dangerous to declare mant literal strings. How significant is this searching cost and why java is designed in this way(searching pool when literal strings are declared).

Following is what I referred to understand background.


The 开发者_StackOverflow社区JavaDoc for the java.lang.String class states:

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.

http://www.janeg.ca/scjp/lang/strLiteral.html comments:

In other words, because the compiler knows the strings original value cannot be changed once it's created it can safely use existing data and avoid cluttering up memory with duplicates.


You're confusing compile time complexity with runtime complexity.

When the class is loaded, yes it does a search to see if each literal already exists (although I imagine it would use a hashmap for O(1) lookup instead of your proposal).

When the code runs, it has the reference to the string in memory so there is no additional cost than a non-literal.

So yes, literals are interned. According to the Javadoc for String,

A pool of strings, initially empty, is maintained privately by the class String.

You can invoke intern() on a String to add it to this pool. It logically follows that if a.equals(b) then a.intern() == b.intern(), since .intern() guarantees to return from the a unique pool.

Example:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

Results when run:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

I should point out that the assumption will never be true. The Java language itself has many Strings that would be interned before our Strings ever got the chance to see the light of day. However, assuming everything is loaded sequentially, if you consider only the delta of Strings being interned, and assume no collisions with existing interns (we all know interns can be fussy and full of drama, right? snicker) then the numbers do indeed indicate the delta of the size of the string pool.


1 - When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning(method intern of class String)?

Yes. This process is called interning. However, it happens just once ... when the class containing the literal is loaded.

2 - In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool.

No it doesn't. The pool is a hash table.

... And if there are many strings in the pool, it may be high cost.

No it won't. The cost of a lookup in the string pool hash table is O(1).

... On this point of view, it might be dangerous to declare many literal strings.

The cost is not significant compared with the other costs of loading and then JIT compiling a class file. There is no performance related "danger" in declaring lots of literal strings.

Obviously, the String objects corresponding to string literals occupy memory "permanently", and you generally don't want to waste memory unnecessarily. But if you need to use those constant strings, they have to be represented somehow. And other ways of representing them either use the memory in other ways, or involve other runtime costs; e.g. the costs of reading them from a file or retrieving them from database.

The benefit of interning string literals is that the heap does not get cluttered up with multiple copies of the same literal string. This is probably not significant for typical SE / EE applications, but for the ME platforms heap memory is at a premium, and it would be a bad thing to waste it.


@RENO asks about the number of times Strings get interned. There are two cases:

  • Explicit calls to String.intern() happen as many (or as few) times as the application chooses to make.

  • For String literals, the javac compiler will ensure that a given .class file does not contain multiple copies of any String literal in its constant pool. This means that a class that has a given literal in many places will only result in the literal being interned once when the class is loaded. However, if you have two classes with the same literal string in their respective source code, they will both have the string value in their respective constant pools, and both intern the string when the respective classes are loaded.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜