Fastest way to pad a number in Java to a certain number of digits
Am trying to create a well-optimised bit of code to create number of X-digits in length (where X is read from a runtime properties file), based on a DB-generated sequence number (Y), which is then used a folder-name when saving a file.
I've come up with three ideas so far, the fastest of which is the last one, but I'd appreciate any advice people may have on this...
1) Instantiate a StringBuilder with initial capacity X. Append Y. While length < X, insert a zero at pos zero.
2) Instantiate a StringBuilder with initial capacity X. While length < X, append a zero. Create a DecimalFormat based on StringBuilder value, and then format the number when it's needed.
3) Create a new int of Math.pow( 10, X ) and add Y. Use String.valueOf() on the new number and then substring(1) it.
The second one can obviously be split into outside-loop and inside-loop sections.
So, any tips? Using a for-loop of 10,000 iterations, I'm getting similar timings from the first two, and the third method is approximately ten-times faster. Does this seem correct?
Full test-method code below...
// Setup test variables
int numDigits = 9;
int testNumber = 724;
int numIterations = 10000;
String folderHolder = null;
DecimalFormat outputFormat = new DecimalFormat( "#,##0" );
// StringBuilder test
long before = System.nanoTime();
for ( int i = 0; i < numIterations; i++ )
{
StringBuilder sb = new StringBuilder( numDigits );
sb.append( testNumber );
while ( sb.length() < numDigits )
{
sb.insert( 0, 0 );
}
folderHolder = sb.toString();
}
long after = System.nanoTime();
System.out.println( "01: " + outputFormat.format( after - before ) + " nanoseconds" );
System.out.println( "Sanity check: Folder = \"" + folderHolder + "\"" );
// DecimalFormat test
before = System.nanoTime();
StringBuilder sb = new StringBuilder( numDigits );
while ( sb.length() < numDigits )
{
sb.append( 0 );
}
DecimalFormat formatter = new DecimalFormat( sb.toString() );
for ( int i = 0; i < numIterations; i++ )
{
folderHolder = formatter.format( testNumber );
}
after = System.nanoTime();
System.out.println( "02: " + outputFormat.format( after - before ) + " nanoseconds" );
System.out.println( "Sanity check: Folder = \"" + folderHolder + "\"" );
// Substring test
before = System.nanoTime();
int baseNum = (int)Math.pow( 10, numDigits );
for ( int i = 0; i < numIterations; i++ )
{
int newNum = baseNum + testNumber;
folderHolder = String.valueOf( newNum ).substring( 1 );
}
after = System.nanoTime();
System.out.println( "03: " + outputFormat.format( after - before ) + " nanoseconds" );
System.out.println( "Sanity check: Fo开发者_如何学运维lder = \"" + folderHolder + "\"" );
I would stop doing optimizations based on micro-benchmarks and go for something that looks elegant codewise, such as String.format("%0"+numDigits+"d", testNumber)
Use String.format("%0[length]d", i)
For length of 8 it would be
String out = String.format("%08d", i);
It's slower, but the time spent typing and debugging the more complex code will probably exceed the total extra time ever used during execution.
In fact, if you add up all the man-hours already spent discussing this, it most likely exceeds the execution time savings by a large factor.
Inserting padding characters one by one is obviously slow. If performance is really that big a concern, you could use predefined string constants of lengts 1..n-1 instead (where n is the biggest expected length), stored in an ArrayList at the corresponding indexes.
If n is very big, at least you could still insert in bigger chunks instead of single chars.
But overall, as others pointed out too, optimization is only feasible if you have profiled your application under real circumstances and found which specific piece of code is the bottleneck. Then you can focus on that (and of course profile again to verify that your changes actually improve performance).
Here is a solution that is basically the same thing as your StringBuilder with two optimizations:
- It directly writes to an array bypassing the StringBuilder overhead
- It does the operations in reverse instead of insert(0), which requries an arraycopy each time
It also makes the assumptions that numDigits will be >= to the actual characters required, but will properly handle negative numbers:
before = System.nanoTime();
String arrString=null;
for ( int j = 0; j < numIterations; j++ ){
char[] arrNum = new char[numDigits];
int i = numDigits-1;
boolean neg = testNumber<0;
for(int tmp = neg?-testNumber:testNumber;tmp>0;tmp/=10){
arrNum[i--] = (char)((tmp%10)+48);
}
while(i>=0){
arrNum[i--]='0';
}
if(neg)arrNum[0]='-';
arrString = new String(arrNum);
}
after = System.nanoTime();
System.out.println( "04: " + outputFormat.format( after - before ) + " nanoseconds" );
System.out.println( "Sanity check: Folder = \"" + arrString + "\"" );
This method well outperformed your samples on my machine for negatives and was comparable for positives:
01: 18,090,933 nanoseconds
Sanity check: Folder = "000000742"
02: 22,659,205 nanoseconds
Sanity check: Folder = "000000742"
03: 2,309,949 nanoseconds
Sanity check: Folder = "000000742"
04: 6,380,892 nanoseconds
Sanity check: Folder = "000000742"
01: 14,933,369 nanoseconds
Sanity check: Folder = "0000-2745"
02: 21,685,158 nanoseconds
Sanity check: Folder = "-000002745"
03: 3,213,270 nanoseconds
Sanity check: Folder = "99997255"
04: 1,255,660 nanoseconds
Sanity check: Folder = "-00002745"
Edit: I noticed your tests resued some of the objects within the iteration loop, which I had not done in mine (such as not recalculating baseNum in the substring version). When I altered the tests to be consistent (not resuing any objects / calculations my version performed better than yours:
01: 18,377,935 nanoseconds
Sanity check: Folder = "000000742"
02: 69,443,911 nanoseconds
Sanity check: Folder = "000000742"
03: 6,410,263 nanoseconds
Sanity check: Folder = "000000742"
04: 996,622 nanoseconds
Sanity check: Folder = "000000742"
Of course as others have mentioned micro benchmarking is incredibly difficult / "fudgy" with all of the optimization performed by the VM and the inability to control them.
This probably related link discusses many of the ways to do it. I would recommend the Apache option, StringUtils, it may or may not be the absolute fastest, but its usually one of the easiest to understand, and has had the )&##@ pounded out of it, so it probably won't break in some unforeseen edge case. ;)
精彩评论