Improving performance of string concatenation in Java [duplicate]
Possible Duplicate:
java String concatenation
How to Improve performance of this chunk of code :
public static String concatStrings(Vector strings) {
String returnValue = "";
Iterator iter = strings.iterator();
while( iter.hasNext() ) {
returnValue += (String)iter.next();
}
return returnValue;
}
You might look at using StringBuilder, rather than doing += with the individual strings. Strings are immutable in Java, which means that once you create a String object, you cannot modify it. Using += on strings in a loop will result in the creation of many separate String instances, which may create performance problems. StringBuilder can concatenate Strings without having to create new instances, which may save some time, depending on the exact scenario.
public static String concatStrings(List<String> strings) {
StringBuilder sb = new StringBuilder();
for (String s : strings) {
sb.append(s);
}
return sb.toString();
}
Some remarks:
- Use
StringBuilder
whenever you need to build a string in a loop+
is fine for simple concatenation, but horrible for incremental build
- Whenever possible, use for-each for readability
java.util.Vector
issynchronized
; if you don't need this (costly) feature, just use anArrayList
.
Don't use raw types
JLS 4.8 Raw Types
The use of raw types is allowed only as a concession to compatibility of legacy code. The use of raw types in code written after the introduction of genericity into the Java programming language is strongly discouraged. It is possible that future versions of the Java programming language will disallow the use of raw types.
Effective Java 2nd Edition: Item 23: Don't use raw types in new code
If you use raw types, you lose all the safety and expressiveness benefits of generics.
See also
- Java string concatenation
- Java Tutorials - Generics
As suggested by other answers, using a StringBuilder
would probably be a better option.
The code given in the question will actually be compiled (with Sun's javac
) to something along the line of the following:
public static String concatStrings(Vector strings) {
String returnValue = "";
Iterator iter = strings.iterator();
while( iter.hasNext() ) {
String str = (String)iter.next();
StringBuilder sb = new StringBuilder(returnValue);
sb.append(str);
returnValue = sb.toString();
}
return returnValue;
}
The compiler will change the +=
string concatenation with one that uses a StringBuilder
. However, the compiler will probably rewrite the code inside the loop so a new StringBuilder
instance will be created on each iteration, which is not very performance-friendly.
Therefore, in this case, it would probably be a better idea to create a StringBuilder
outside the loop on our own, and perform manual string concatenation:
public static String concatStrings(Vector strings) {
StringBuidler returnValueBuilder;
Iterator iter = strings.iterator();
while( iter.hasNext() ) {
returnValueBuilder.append((String)iter.next());
}
return returnValueBuilder.toString();
}
private static final int AVERAGE_STRING_LENGTH = 10; // Using 10 is arbitrary
public static final String concatStrings(final Collection<String> strings) {
if (strings == null) return null;
final int size = strings.size();
if (size == 0) return "";
if (size == 1) return strings.get(0);
final StringBuilder returnValue =
new StringBuilder(AVERAGE_STRING_LENGTH * size);
for (String s : strings) {
returnValue.append(s);
}
return returnValue.toString();
}
Perhaps going a little overboard, here's every optimization I could think of for concatStrings()
- demonstrated above - some of which may not be applicable to your environment:
- Use
StringBuilder
- its far more efficient for these successive concatenations - Use
StringBuilder(int capacity)
to specify the likely necessary capacity, if there's any way to anticipate it (used average size above, but other methods may be more convenient) - Use the
Collection
parameter type to allow for a more efficient data structure thanVector
, which is synchronized - plus the caller has far greater flexibility (e.g. no need to copy aSet<String>
to aVector<String>
just to call this method) - Hardcode simple cases, if they are likely (e.g.
null
, size0
, and size1
cases above) - Use
final
to facilitate JIT inlining and optimization - Cache the size of
strings
, if its used multiple times. (e.g. used 3 times in the above code.)
Finally, if this operation is done very often over a large number of strings, look into Ropes for Java.
- Ropes for Java article - http://www.ibm.com/developerworks/java/library/j-ropes
- Ropes of Java implementation - http://ahmadsoft.org/ropes/
- Ropes (Wikipedia) - http://en.wikipedia.org/wiki/Rope_(computer_science)
Also if you are looking to make this faster, you could refactor the code to use ArrayList instead of Vector. ArrayList is not thread safe, so it is slightly faster than Vector (depends on the situation, may be 0% difference, may be 5% difference).
You are creating a string every time you call +=. For example
String theString = "1"; //Makes an immutable String object "1"
theString +="2"; //Makes a new immutable String object "12"
theString +="3"; //makes a new immutable String object "123"
Using a string builder avoids this problem.
StringBuilder sb = new StringBuilder("1"); //Makes a StringBuilder object holding 1
sb.append("2"); //The same StringBuilder object now has "12" in it.
sb.append("3"); //The same StringBuidler object now has "123" in it.
String theString = sb.toString(); //Creates a new String object with "123" in it
Note how in the first example we made all of those intermediate strings where in the second example we only created the StringBuilder and the final String (in both examples we created "1" "2" and "3" when we used them as arguments). You can see that there are fewer objects created in the first example, and if you're doing a lot of appending to the String, you can imagine how that adds up!
In addition to using a StringBuilder, you can walk the list of Strings beforehand and calculate the exact size of needed for the StringBuilder. Then pass this value into the StringBuilder constructor. Note that this would fall into the category of premature optimisation but you did ask for performance... (You should look at the code for growing the StringBuilder/StringBuffer buffers, its educational)
Apart from using ArrayList and StringBuilder, let's consider this.
In modern computer science paradigm, space can be almost always traded for time(perhaps, it's a subjective statement). For the given scenario, with the below code, an extra space of O(N) where N = no of strings(for the new buffer that holds list.toArray()) is used. This is better than using Iterator at least, (open AbstractList.iterator()). Importantly, the time-complexity is significantly better, by computing the concatenation of two strings at once, in one iteration, thus reducing the number of iterations by half! It's something like using Dynamic Programming approach(Remember, computing Fibonacci nos using Dynamic Programming)!!
StringBuilder sb = new StringBuilder();
Object[] o = list.toArray();
//For even no of Strings
if(o.length % 2 == 0){
concatFaster(sb, o);
} else {
//For odd no of Strings
concatFaster(sb, o);
sb.append(o[o.length-1]); // For the odd index
}
public static void concatFaster(StringBuilder sb, Object[] o) {
for (int i = 0; i < o.length - 1; i+=2) {
sb.append(o[i]).append(o[i+1]);
}
}
精彩评论