#### Optimizing the Sieve of Eratosthenes

by

, July 9th, 2012 at 06:29 PM (16613 Views)
The Sieve of Eratosthenes is one of the fastest methods for generating a list of prime numbers less than a given numbern. A general description of the algorithm is given below:

1. Initialize a list representing the numbers[2,n). This denotes whether a given number is prime or composite.

2. Start from the beginning of the list. If the number is denoted as prime, save it. Otherwise, move on.

3. For every multiple of the current prime, mark it as a composite number.

Below is code implementing the Sieve of Eratosthenes as naively as possible:

public static List<Integer> eratosthenes_naive(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); for (int i = 2; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(i); for (int j = i; j < is_composite.length; j += i) { is_composite[j] = true; } } } return results; }

While fairly quick, there are several optimizations which can be made. The first such optimization is to reduce the inner loop's span. It only needs to start fromj = i*ibecause all values smaller have been marked already.

public static List<Integer> eratosthenes_optimized1(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); for (int i = 2; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(i); for (long j = (long) i * i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }

Another optimization that can be made is that the external loop only needs to run for every odd number.

public static List<Integer> eratosthenes_optimized2(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>(); results.add(2); for (int i = 3; i < is_composite.length; i += 2) { if (!is_composite[i]) { results.add(i); for (long j = (long) i * i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }

A third optimization is to provide an upper bound for the list of prime numbers. An approximation of the upper bound of the prime counting function (the number of primes below a given number n) is1.25506 * n / ln(n).

public static List<Integer> eratosthenes_optimized3(int n) { boolean[] is_composite = new boolean[n]; List<Integer> results = new ArrayList<Integer>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 3; i < is_composite.length; i += 2) { if (!is_composite[i]) { results.add(i); for (long j = i * (long) i; j < is_composite.length; j += i) { is_composite[(int) j] = true; } } } return results; }

The last optimization is to completely remove all even numbers from theis_compositearray. This requires a transpose of all indices.

public static List<Integer> eratosthenes_optimized4(int n) { boolean[] is_composite = new boolean[n - 2 >> 1]; List<Integer> results = new ArrayList<>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 0; i < is_composite.length; ++i) { if (!is_composite[i]) { results.add(2 * i + 3); for (long j = 4L * i * i + 12L * i + 9; j < 2 * is_composite.length + 3; j += 4 * i + 6) { is_composite[(int) (j - 3L >> 1)] = true; } } } return results; }

Results

As with any attempts at optimizing code, it's important to benchmark the code. For this, I created a simple test which runs each method 10 times for n = 0x4000000 (~67 million). The results are given below. All times are in milliseconds.

naive

min: 2626

avg: 2928

max: 3121

optimized 1 (inner loop optimization)

min: 1607

avg: 1808

max: 1994

optimized 2 (outer loop optimization)

min: 1532

avg: 1658

max: 1835

optimized 3 (memory optimization)

min: 1516

avg: 1618

max: 1738

optimized 4 (structure optimization)

min: 845

avg: 930

max: 1040

I also tracked ideal memory use. Method 4 uses by far the least memory ~50 MB, method 3 is next at ~82 MB. The other methods are difficult to track due to the expanding nature of ArrayLists, but it's sufficient to say they will at best use at least ~82 MB of memory.

The test was also run with a larger data set with n = 0x3FFFFFFF (~1 billion).

naive

min: 57899

avg: 62960

max: 73454

optimized 1

min: 34069

avg: 35855

max: 38507

optimized 2

min: 26480

avg: 27870

max: 30047

optimized 3

min: 25290

avg: 27920

max: 30100

optimized 4

min: 17201

avg: 17324

max: 17914

Summary

Through careful testing and analysis it's possible to achieve ~2-3 times speedup over the naive Sieve of Eratosthenes. There are other programs which implement various features such as segmented sieves, wheel factorization, and multi-threading to achieve some very impressive results. The current leader as far as I know is primesieve. It can compute primes up 1 billion in ~0.08 seconds utilizing multi-threading, or in ~0.4 seconds with just a single thread.

Update

Rather than have a boolean (which takes ~ 1 byte) represent each number, I made a recent change which represents each number as a bit in a char. This drastically cuts memory use by ~50%. Performance is not drastically affected, possibly slightly faster.

public static List<Integer> eratosthenes_optimized5(int n) { if (n < 2) { return new ArrayList<Integer>(); } char[] is_composite = new char[(n - 2 >> 5) + 1]; final int limit_i = n - 2 >> 1, limit_j = 2 * limit_i + 3; // boolean[] is_composite = new boolean[n - 2 >> 1]; List<Integer> results = new ArrayList<>((int) Math.ceil(1.25506 * n / Math.log(n))); results.add(2); for (int i = 0; i < limit_i; ++i) { if ((is_composite[i >> 4] & 1 << (i & 0xF)) == 0) { results.add(2 * i + 3); for (long j = 4L * i * i + 12L * i + 9; j < limit_j; j += 4 * i + 6) { is_composite[(int) (j - 3L >> 5)] |= 1 << (j - 3L >> 1 & 0xF); } } } return results; }