Sieve of Eratosthenes in Java: A Puzzle and some optimization
I did a quick implementation of the SoE algo in Java (code at the end). The output on my dual core AMD processor is:
Allocation: 31 Meat: 10140 Listing: 10171 Preparing end: 10187
The "Meat" section consumes the Maximum amount of time, as expected.
One observation I had was that using
Math.pow(variable, 2)
is slower than(variable * variable)
. I think other than the function jump, there might be other overheads.Does Math.pow(x, 2) have optimizations for powers of 2, 3 etc? I ask because there are some user contributed Java libraries out there with much faster multiplication algos than Java's native ones.
Here are my questions:
What arithmetic optimizations can you suggest to the Meat section? Is there any way I can avoid the modulus operator altogether?
The function does not work when start == end. If I do sieve(4, 4), the returned array is of length 1: [4]. What am I doing wrong? It should return [] (basically new int(0)).
What are some of the fast number/maths related Java libraries you know?
Thanks for reading. Finally Here's the code I wrote: Not GangOfFour/TopCoder quality, but not too pathetic either (I hope!, and code formatting at SO is sort of....weird?):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class Sieve {
public static void main(String[] args) {
/* small test */
int[] primes = sieve(1, 1000000);
}
/**
* returns an array of prime integers
* in the given range
*
* @param start range start
* @param end range end
* @return
*/
private static int[] sieve(int start, int end) {
long startTime = System.currentTimeMillis();
/* some basic range checks */
if(end < start || start < 1 || end < 1) {
throw new ArithmeticException("Messed up input");
}
/* generate ints within range */
int[] naturals = new int[end-start+1];
for (int j = 0; j < end - start + 1; j++) {
naturals[j] = start + j;
}
System.out.println("Allocation: \t" + (System.currentTimeMillis() - startTime));
/* init running prime to start, and increment until
* running prime squared is greater than the end
*/
for (int runningPrime = (start == 1 ? 2: start); end > runningPrime*runningPrime; runningPrime++) {
for (int i = runningPrime; i < naturals.length; i++) {
if(-1 != naturals[i]) {
if(naturals[i] % runningPrime == 0) {
naturals[i] = -1;
}
}
}
}
System.out.println("Meat: \t\t" + (System.currentTimeMillis() - startTime));
if(naturals[0] == 1) {
naturals[0] = -1;
}
/* list primes */
List list = new ArrayList();
for (int i = 0; i < naturals.length; i++) {
if(-1 != naturals[i])
list.add(naturals[i]);
}
System.out.println("Listing: \t" + (System.currentTimeMillis() - startTime));
/* create the return int array */
int[] primes = new int[list.size()];
int k = 0;
for (Iterator iterator开发者_如何学C = list.iterator(); iterator.hasNext();) {
primes[k++] = ((Integer) iterator.next()).intValue();
}
System.out.println("Preparing end: \t" + (System.currentTimeMillis() - startTime));
return primes;
}
}
Thanks for all the feedback. This is the fixed version below (until someone manages to break it again :)
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class Sieve {
public static void main(String[] args) {
/* small test */
int[] primes = sieve(2, 5);
System.out.println("Number of primes: " + primes.length);
for (int i : primes) {
System.out.println(i);
}
}
/**
* returns an array of prime integers
* in the given range
*
* @param start range start
* @param end range end
* @return
*/
private static int[] sieve(int start, int end) {
long startTime = System.currentTimeMillis();
/* some basic range checks */
if(end < start || start < 1 || end < 1) {
throw new ArithmeticException("Messed up input");
}
/* generate ints within range */
int[] naturals = new int[(int)Math.floor((end-start+1) / 2) + 1];
int allocator = 0;
for (int j = 0; j < end - start + 1; j++) {
if(!((start + j) % 2 == 0)) {
naturals[allocator++] = start + j;
}
}
System.out.println("Allocation: \t" + (System.currentTimeMillis() - startTime));
/* init running prime to 2, and increment until
* running prime squared is greater than the end
*/
for (int runningPrime = 2; end >= runningPrime*runningPrime; runningPrime++) {
for (int i = 0; i < naturals.length; i++) {
if(-1 != naturals[i]) {
if(naturals[i] != runningPrime && naturals[i] % runningPrime == 0) {
naturals[i] = -1;
}
}
}
}
System.out.println("Meat: \t\t" + (System.currentTimeMillis() - startTime));
if(naturals[0] == 1) {
naturals[0] = -1;
}
/* list primes */
List list = new ArrayList();
for (int i = 0; i < naturals.length; i++) {
if(-1 != naturals[i])
list.add(naturals[i]);
}
System.out.println("Listing: \t" + (System.currentTimeMillis() - startTime));
/* create the return int array */
int size = list.size();
int k = 0;
/* tricky tricky :) */
if(start <= 2) {
size += 1;
k = 1;
}
int[] primes = new int[size];
if(start <= 2) {
primes[0] = 2;
}
for (Iterator iterator = list.iterator(); iterator.hasNext();) {
primes[k++] = ((Integer) iterator.next()).intValue();
}
System.out.println("Preparing end: \t" + (System.currentTimeMillis() - startTime));
return primes;
}
}
You can avoid modulo by re-writing the inner loop:
for (int i = runningPrime; i < naturals.length; i++) {
if(-1 != naturals[i]) {
if(naturals[i] % runningPrime == 0) {
naturals[i] = -1;
}
}
}
as
for (int i = runningPrime; i < naturals.length; i+=runningPrime) {
naturals[i] = -1;
}
I'm also slightly concerned that the inclusion of a start
parameter complicate things (considering the case of sieve(4, 10)
).
Assuming I didn't miss something I'd write:
for(int runningPrime = (start == 1 ? 2: start); end > runningPrime*runningPrime;
runningPrime++)
as
int limit = Math.sqrt(end);
for(int runningPrime = (start == 1 ? 2: start); runningPrime < limit;
runningPrime++)
to prevent the unnecessary multiplication each iteration. Also I would only fill the array with odd numbers, effectively halving its length.
Your solution is not the Sieve of Eratosthenes. That's obvious because you use the modulo
operator in your code; a proper Sieve of Eratosthenes uses only addition in the inner loop, not division or modulo. Here is a simple version of the Sieve of Eratosthenes, which imports BitSet
and LinkedList
from java.util
and returns a LinkedList of primes less than n:
public static LinkedList sieve(int n)
{
BitSet b = new BitSet(n);
LinkedList ps = new LinkedList();
b.set(0,n);
for (int p=2; p<n; p++)
{
if (b.get(p))
{
ps.add(p);
for (int i=p+p; i<n; i+=p)
{
b.clear(i);
}
}
}
return ps;
}
The basic idea is to create a sieve (the BitSet
b) with each item initially set to Prime
(represented as a set bit), iterate through the sieve looking for and reporting each successive prime, and when one is found striking all of its multiples from the sieve by marking it Composite
(represented as a cleared bit). The multiples are found by addition, not division, and the inner loop consists only of an addition, a bit-clearing operation, a comparison to look for the end of the sieve, and a jump back to the beginning of the loop, so it is very fast.
There are optimizations available that make the Sieve of Eratosthenes run even faster, but this should be enough to get you started. When you are ready for more, I modestly recommend this essay at my blog.
If you want primes in a range that doesn't start at zero, you need a segmented Sieve of Eratosthenes. I discussed the segmented Sieve of Eratosthenes previously on Stack Overflow, and also discuss it at my blog.
Filling with only odds (except 2), incrementing by runningPrime, and losing the divisibility check, already suggested, are probably the most important optimizations.
Java's Math.pow is for doubles! It does not have optimizations for squaring, mostly bc it immediately recasts 2 as a double.
In my opinion, before you start optimising, you should fix the two serious bugs.
I compiled your code as a Java program and then tried calculating
sieve(1, 9)
and
sieve(4,10);
The first case worked correctly except that 9 was deemed to be prime. The square root of 9 is a prime number, but your loop condition stops the sieving just before you get there.
In the second case, the alleged prime numbers are 4, 5, 6, 7, 8, 9, 10. This is because you skipped sieving by any of the primes below the start of the range. That, I'm afraid is an optimisation too far :-)
精彩评论