Sieve of Eratosthenes: Memory & Performance
It appears to me that the Sieve of Eratosthenes, while easy to implement, is not at all scalable.
If you have to create a boolean array to store all of those values in up to the limit, that begins to use obscene amounts of memory as the limit is raised. When I tried to run my Sieve algorithm with a limit of Integer.MAX_VALUE, after about 20-30 secs of computation time (slow...) I got an OutOfMemoryError.
Is there any way to execute it more efficiently and with better performance?
I was playing around with it to figure out how to divide and conquer the algorithm so that the JVM (or just the system if you're using C) is able to clear used memory.
The problem is that while checking each position up to the limit, you do another walk up the tree to the limit within each iteration, so I have been unable to find a way to divide and conquer it without losing accuracy or needed records.
Does anyone know a good way to do this?
Re: Sieve of Eratosthenes: Memory & Performance
You're talking about a couple different things here.
Your OutOfMemoryError is probably caused by trying to initialize an array that large. That's before actually filling the array with anything, or stepping through it, or doing anything related to the algorithm. To see what I mean, try running a main method that simply initializes that large an array and then prints out the first index.
There are probably way to improve efficiency, but it really depends on what you want your output to be: if you want your output to be an array of numbers up to the limit all marked either prime or not prime, then you might be out of luck. But if you just want to output the primes, you can get rid of some of the overhead.
This smells a bit like homework though, so I'll leave you with this, which includes some suggestions: Sieve of Eratosthenes - Wikipedia, the free encyclopedia
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
KevinWorkman
You're talking about a couple different things here.
Your OutOfMemoryError is probably caused by trying to initialize an array that large. That's before actually filling the array with anything, or stepping through it, or doing anything related to the algorithm. To see what I mean, try running a main method that simply initializes that large an array and then prints out the first index.
There are probably way to improve efficiency, but it really depends on what you want your output to be: if you want your output to be an array of numbers up to the limit all marked either prime or not prime, then you might be out of luck. But if you just want to output the primes, you can get rid of some of the overhead.
This smells a bit like homework though, so I'll leave you with this, which includes some suggestions:
Sieve of Eratosthenes - Wikipedia, the free encyclopedia
No this isn't homework. Programming is a hobby/planned career of mine and I love doing it.
What I'm looking for is just randomly generating a prime number, and I would prefer not to fall back on the BigInteger class.
This is ultimately connected to an in-development app for Android market, and I'm sort of wasting time on this so I want to get it done. The only use for it right now is generating prime hash numbers used in a hasing algorithm for a particular object.
ADDITONAL:
The reason I don't want to use BigInteger.getProbablePrime is that I can't find a good way to constrain the results to be within a certain range.... like 2-60 or something. I am able to do that if I write an algorithm that generates using the Sieve.
Re: Sieve of Eratosthenes: Memory & Performance
Do you have to calculate them on the fly each time? Just hardcode a list to contain every prime, and choose one at random when you need it- boom, randomly generated prime number.
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
KevinWorkman
Do you have to calculate them on the fly each time? Just hardcode a list to contain every prime, and choose one at random when you need it- boom, randomly generated prime number.
But.... that's tedious. And we're talking every prime number up to Integer.MAX_VALUE. Not just from 2-100.
Oh and I just wanted to add I'm also self-taught with a year now of experience. Last year I spent three weeks or so doing nothing but learning Java all-day every day. And now here I am. I mean... who needs a teacher?
Re: Sieve of Eratosthenes: Memory & Performance
You're better off repeatedly generating a random odd integer and then checking to see if it's prime. The probability of picking a prime number is quite high. See: http://en.wikipedia.org/wiki/Prime_number.
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
bgroenks96
But.... that's tedious. And we're talking every prime number up to Integer.MAX_VALUE. Not just from 2-100.
I would wager you've spent more time thinking about this problem than it would take to find a listing of every prime up an arbitrarily high number (or to just use a dumb slow program that outputs them once), then write code that reads that file in. Up to you though.
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
KevinWorkman
I would wager you've spent more time thinking about this problem than it would take to find a listing of every prime up an arbitrarily high number (or to just use a dumb slow program that outputs them once), then write code that reads that file in. Up to you though.
But wouldn't it also take a lot of memory to store a hard coded array of prime numbers? Or if it was a file you would have to read the entire file each time, not to mention the size of the file....
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
bgroenks96
But wouldn't it also take a lot of memory to store a hard coded array of prime numbers? Or if it was a file you would have to read the entire file each time, not to mention the size of the file....
Even if you did it by generating the array on the fly each time, you still have to deal with those issues. Actually those issues are made worse because of the overhead of whatever algorithm you go with. Storing them in a file and reading them in has no algorithmic overhead, because you did that work ahead of time. With your way, you would still have to generate the array each time.
Reading in the file is done during loading, and I don't expect it to take very long- throw up a loading bar if it's a big deal. You mention the size of the file as a restriction- how big do you think it's going to be? How big is too big? Have you downloaded the file and tested your assumptions at all?
With the method I'm suggesting, you can even get around reading in the whole file. Figure out how many primes you need, generate that many random numbers, then read in the prime numbers stored in those locations in the file.
But all of this depends on your actual context. I've made my suggestion, but it's up to you. I'm not going to try to convince you to go one way or the other, but I would highly suggest actually testing your assumptions out before dismissing my suggestion. Try it out both ways and see which method suits your needs best.
Re: Sieve of Eratosthenes: Memory & Performance
Some key statistics:
To generate all prime numbers less than Integer.MAX_VALUE - 1, it takes ~2.4Gb of memory. On my computer (first gen i5) it takes ~2.5 minutes.
To store all prime numbers less than Integer.MAX_VALUE - 1, it takes ~400Mb. Note that this is in binary mode (assuming 32-bit integers), if you store numbers in text mode the size will be much bigger. I kind of gave up on the time it takes to read this file in after a few minutes.
To contrast these two methods, generating a random number and checking to see if it's prime takes on average 0.1 seconds using a naive primality checker, and 0.02 seconds using a Miller-Rabin probabilistic test. The memory consumption is around 100 bytes or less (it's O(1)).
edit: found an optimization for generating all primes less than Integer.MAX_VALUE, takes ~70 seconds to run now (memory use is unaffected).
Re: Sieve of Eratosthenes: Memory & Performance
I believe this is what you're looking for:
Code java:
//shows all prime numbers to a specified max
ArrayList<Long> sieve = new ArrayList<Long>();
public static ArrayList<Long> primesUpTo(int n) {
boolean[] isPrime = new boolean[n];
for (int i = 0; i < n; i++) isPrime[i] = true;
for (int j = 2; j < isPrime.length; j++) {
int prime = j;
if (!isPrime[prime]) continue;
sieve.add((long) prime);
for (int i = prime * 2; i < isPrime.length; i = i + prime) isPrime[i]=false;
}
return sieve;
}
Re: Sieve of Eratosthenes: Memory & Performance
Quote:
Originally Posted by
aesguitar
I believe this is what you're looking for:
Please explain how this helps answer the original poster's question.
Helloworld gave a darn good rundown of the options, so I'd recommend taking Kevin's advice and test them to see which one works in the context you have.
Re: Sieve of Eratosthenes: Memory & Performance
Good point copeg, is there any way to implement it into a concursive method?
Pseudocode for dual-core:
core0 performs the sieve up to half Integer.MAX_VALUE
core1 performs the sieve from half Integer.MAX_VALUE
combine the results
then print the results.
I guess what I'm saying is that is there any way to perform the sieve starting at a number other than 0 or 2? I've tried writing the algorithm like that, but I can't figure out how to start it from a certain point, I can only print out the primes from x to y. It still has to run through everything.
Re: Sieve of Eratosthenes: Memory & Performance
I've actually written a small program (it was for a homework assignment in a college Java class I took) that actually checked for Prime numbers, and did it fairly quickly.
In fact, I found a list of primes in the range of two to the max integer value (used a Google search). I was able to iterate over the values (starting at 2), evaluate whether or not it was prime, and if it was, write it to a file. That all happened in under a second.
Here's a hint:
Use a square root. If the number you're checking (say, 25) has a whole number, integer square root (square root of 25 is 5), then the number you're checking is not, by definition, prime. If the number you're checking has a decimal square root, to check to see if it is prime, all you have to do is iterate, dividing the number you're checking by the next number in line.
As an example, sqrt(101) = 10.0498756
So take 101, and divide it by 2, 3, 4, up to 10. If it divides by any of those evenly, then it's not prime.
There are other tricks, but I think this will help you get started.
Also, recall the the only even prime number is 2.