•

Check out our Community Blogs Register and join over 40,000 other developers!

### Popular Tags      # Finding Primes Faster (Sieve of Eratosthenes)

nested loop

16 replies to this topic

### #1 fkl

fkl

CC Devotee

• Senior Member
•      • 417 posts

Posted 17 July 2011 - 12:58 PM

Find all prime numbers less than 1 million

Generating large prime numbers or even testing huge numbers for being prime or not has been a daunting task.

To clarify the basics, a number is called prime if it can be divided by only 1 and itself and no other. This means a prime can't have more than two divisors. For e.g. 2,3,5,7 etc. are all primes.

A general or common approach to test if a number is prime or not starts with testing from 2 onwards up to half of the number trivially, though actually up to square root is sufficient.

This means that if given a number e.g. 9999 whose square root is roughly 99.99 (obviously not perfect otherwise it is already not a prime), we can test if any number between 2 and 99 divides it completely. If not than no other greater number would divide either and hence this would be a prime.

Now in practice if being asked to generate say the first 1000 prime numbers, it is going to be very in efficient that you start from 2 and keep going in a loop. Then for each value you take the number’s square root and check if it divides any number between 2 and square root. If not, it is a prime. This will be a O(n)^2 algorithm.

There has been a well known efficient algorithm for doing the above task with two constraints:

1. An upper limit needs to be known in advance i.e. 1, 10 million etc.
2. One should be able to allocate array of that size in a computer program.

The idea is basically that you create an array of the given size (1 million) initialized with all 0’s. Now start with number 2 at index 2 of array. Since the value at this index is 0, mark it as a prime. Then run a loop up to 1 million marking all multiples of 2 in array as non primes.

Then keep incrementing after 2 up to a million. If the current index is found to be 0 or unmarked, it means it is not divisible by any of the numbers we have encountered so far. Otherwise it would have been marked as non prime and wouldn’t be zero.

This is known as Sieve of Eratosthenes and is considered a very efficient algorithm for finding primes. Though it still runs in two nested loops, but the performance gains are highly significant and it’s run time complexity amounts to be O(n(logn)(loglogn)). For details see
Sieve of Eratosthenes - Wikipedia, the free encyclopedia

I have used a separate print function which takes a little fancy formatting for 1st 2nd and 3rd primes into account. The core function is gen_sieve_primes which runs two nested loops. The outer loop runs from 2 to max number. If value at current index is 0 meaning unmarked, it marks that as a prime by assigning 1. Then it proceeds to second loop in which it marks all multiples of current number (for 2, those would be 4,6,8, 10…. Up to 1 million) as non primes by assigning them -1.

```#define MAX_NUM 1000000 // 200

// array will be initialized to 0 being global
int primes[MAX_NUM];

void gen_sieve_primes(void)
{
for (int p=2; p<MAX_NUM; p++) // for all elements in array
{
if(primes[p] == 0) // it is not multiple of any other prime
primes[p] = 1; // mark it as prime

// mark all multiples of prime selected above as non primes
int c=2;
int mul = p * c;
for(; mul < MAX_NUM;)
{
primes[mul] = -1;
c++;
mul = p*c;
}
}
}

void print_all_primes()
{
int c = 0;
for(int i=0; i<MAX_NUM; i++)
{
if(primes[i] == 1)
{
c++;

if(c < 4)
{
switch(c){
case 1:
cout << c << "st prime is: " << i << endl;
break;
case 2:
cout << c << "nd prime is: " << i << endl;
break;
case 3:
cout << c << "rd prime is: " << i << endl;
break;

default:
break;
}
}

else
cout << c << "th prime is: " << i << endl;
}
}
}

int main()
{
gen_sieve_primes();

print_all_primes();

return 0;
}
```

Following is the run of above algorithms using 200 as the max value. Then I ran this on 1 million numbers size array using Visual Studio on Vista 32 bit machine. The reason for mentioning this is, you may or may not be able to create an array as large depending upon the available memory and type of OS. On a 64 bit machine, one might be able to go even further.

It took roughly a couple of minutes to show all primes less than a million. Following is the end of prints showing that there are some 78,498 primes in 1 million numbers. There can be a few optimizations to improve this algorithm as suggested in the link I referred above. For e.g. instead of incrementing by 1 while going from 2 to 1 million in the outer loop, we can treat 2 as a special case, start from 3 and do increments of 2. This is because we know no even number can be a prime and all of those would be marked while running the loop for 2.

From space perspective, instead of an integer which might be 32bit or even 64 bit on some machines, we could use a one byte variable i.e. int8_t for array, since we only need 3 states (unexplored, prime, and not prime) represented by 0, 1 and -1 in the current program. This is going to reduce the amount of memory required to between 1/4th to 1/8th.
• 0
Today is the first day of the rest of my life

### #2 Axel

Axel

Posted 30 July 2011 - 03:52 PM

(Tell me if this is a bump for you guys, I'm new to this forum so don't be angry please)
Wouldn't it be even faster if you would first of all ignore the numbers that end with '0', '2','5' ?
• 0

### #3 fkl

fkl

CC Devotee

• Senior Member
•      • 417 posts

Posted 31 July 2011 - 03:49 AM

Sure No problem! Please feel free to pass any opinion. That is the objective of this forum.

Towards the end of tutorial, an improvement is mentioned regarding iterating through odd numbers only. That cover those ending in 0 or 2. As soon as you mark 5 as a prime, all numbers that are multiples of 5 will already be marked as non primes according to the algorithm.

• 0
Today is the first day of the rest of my life

### #4 Axel

Axel

Posted 31 July 2011 - 03:52 AM

yes What is the biggest value you can get ? 12 million digits ? I would use long long but that still wouldn't beat the largest discovered prime number which is around 12m digits
• 0

### #5 fkl

fkl

CC Devotee

• Senior Member
•      • 417 posts

Posted 31 July 2011 - 04:01 AM

Certainly not 12 million digits. Point 2 in the start mentions the constraint of array size. I guess it works fine up to 10 million numbers (number of primes in that range would be significantly less).

This algo as stated on Wikipedia too is only regarding small primes i.e. those under 10 million.

The primary advantage of this algorithm is, say you have a program which randomly inputs numbers between 1 - 10 million. Now you have to compute whether each number is a prime or not.

It works faster than any other algorithm (probably) and is known for the same efficiency.
• 0
Today is the first day of the rest of my life

### #6 john coner

john coner

Posted 04 October 2011 - 05:16 AM

can any 1 give me the tutorials of c programing, complete i want please rply soon..... • 0

### #7 GordonBGoodness

GordonBGoodness
• Location:Thailand
• Programming Language:C, Java, C++, Objective-C, C#, (Visual) Basic, JavaScript, Delphi/Object Pascal, Visual Basic .NET, Pascal, Assembly, Fortran, VBScript, Others

Posted 28 May 2012 - 08:49 PM

Certainly not 12 million digits. Point 2 in the start mentions the constraint of array size. I guess it works fine up to 10 million numbers (number of primes in that range would be significantly less).

This algo as stated on Wikipedia too is only regarding small primes i.e. those under 10 million.

The primary advantage of this algorithm is, say you have a program which randomly inputs numbers between 1 - 10 million. Now you have to compute whether each number is a prime or not.

It works faster than any other algorithm (probably) and is known for the same efficiency.

Sorry to burst your bubble that this is a particularly fast algorithm as you have implemented it, "Programming Expert", but your C/C++ listing as above misses almost too many speed and usability enhancements to list, as follows:

1. It uses an array with one element for each potential prime even though the only even prime is "two" for a wastage of half the array size.

2. Uses a four-byte integer rather than a one-byte char per array element for an array size at least four times as large as necessary; actually, all that is needed is one bit per potential prime and those could be packed for a further improvement in utilization of memory by a factor of at least eight. This does not need to cost time in process as discussed in a later point and as implemented by "primesieve".

3. A packed bit array as per the above would save large amounts of time for pre-filling and for counting the total number of primes in that filling will be at least eight times faster and a counting algorithm can be written that counts all the bits in a byte using a look up table.

4. For large ranges, uses a huge array in main memory which is very inefficient as this will cause modern CPU's to frequently have cache "misses" at a cost of 100's of CPU clock cycles per miss; better algorithms are written to be paged or segmented with an array size which is less than the L1 data cache size of the CPU, usually about 16 to 32 and sometimes 64 KBytes, with this page swept across the whole range of potential primes and only the base primes up the the square root of the maximum number in the range needing to be kept in memory. For your current algorithm, maximum range will be limited by the amount of memory in the system; a paged algorithm is only limited by the ability to store a representation of the base primes. Paging also permits the idea of multi-processing in that one can divide the pages between the processor cores.

5. Your algorithm uses a multiplication in the culling loop where a true Sieve of Eratosthenes algorithm only requires repeated additions (by repeated adding the prime number itself to the index in the terms of your algorithm).

6. Culling can start at the square of the prime to be culled, as all previous multiples will have already been culled by previous culls (ie 5 times 3 has already been culled by 3 times 5).

7. Culling is only necessary up to the square root of the maximum number for a reason related to the above; numbers less than the square have already been culled and the square of any higher prime number is beyond the maximum.

8. One can pre cull the initialization array for all the base primes up to say 13 or 17 or 19 as a "wheel factorization" algorithm to save quite a large amount of culling.

9. One can write a culling algorithm eliminate repeat culling for factors of some number of base primes such as 3 and 5 or even higher (3, 5, and 7) for even more elimination of redundant culling one the above pre-screening has been done, as it is then known that there will be no multiples of those primes left in the array.

10. And probably more tricks. Using the techniques mentioned, one can calculate all the prime numbers up to the 32 bit number range (4,294,967,295) on a lowly 1 MHz 6502 such as in an old Apple II or Commodore 64 using only about 48 KBytes of RAM in about 20 hours (80,000 seconds) or the primes up to your million as here in about 20 seconds on those same machines using machine language (as efficient compilers never were developed for eight bit processors), which isn't all that much slower than your algorithm on a modern machine which is 1000's of times faster and with huge amounts more of RAM.

For advanced applications of the Sieve of Eratosthenese, see the following links:

Tomás Oliveira e Silva:

http://www.ieeta.pt/...rime_sieve.html

The primesieve algorithm runs the 32 bit number range (4294967295) in about 0.22 seconds as four threads with a 32 KByte specified cache using 64 bit code under Beta Windows 8 on a stock 3.5 GHz Intel i7-2700K and according to the author about twice that fast again on a over-clocked 3.9 GHz AMD X6 with six threads and I would expect something the same from new Intel six core processors. Unfortunately, the current AMD eight core Bulldozer parts are dogs for this algorithm due to excessive miss branch prediction chasing, and they run about three times this slow.

Isn't it interesting that we now have desktop computers that within the capabilities of those older desktops, are now able to compute the same answer over half a million times faster?
• 0

### #8 fkl

fkl

CC Devotee

• Senior Member
•      • 417 posts

Posted 28 May 2012 - 10:14 PM

Thank you for the response above.

I probably can discuss trade offs with quite many of the arguments. But more importantly, discuss things technically, not sarcastically which is of little learning use.

Most of the optimizations you mentioned above are correct. Do me a favor, code them all together and assemble them into a simple tutorial intended for CS students. At the very least i can say that some thing that took you 10 points and half page to write about, when coded, would be a lot complex to digest for any early CS students. Even if it is, the primary intension of tutorial is not writing the most efficient and optimzed code (no harm in discussing them AFTER the basic algo is clear though), rather it is to make the algorithm as simple as possible for students to understand and be able to expand on themselves.

The code we write here is correct and works to the point of demonstrating basic algorithm implementation. I myself mentioned that there can be many optimizations but they do not exactly fall into the scope of how sieve algorithm works. Perhaps all of above points are taken away if one writes in pseudo code.

This is not professional or production code. Otherwise, i can still add a whole list of further possible optmizations as well as error checkings.

Just as a side note i work in the embedded industry and write efficient code where needed. However, being a teacher too i have learned to sacrifice many optimizations in place of simplicty. Had this not been the case every algorithm book in the world won't use plain pseduo code for illustrating algos.

But thanks again for adding valueable insight. It certainly adds to the value of tutorial for some one who learns and uses it for his own needs. Keep contributing.

Cheers,
Fayyaz
• 0
Today is the first day of the rest of my life

### #9 GordonBGoodness

GordonBGoodness
• Location:Thailand
• Programming Language:C, Java, C++, Objective-C, C#, (Visual) Basic, JavaScript, Delphi/Object Pascal, Visual Basic .NET, Pascal, Assembly, Fortran, VBScript, Others

Posted 29 May 2012 - 12:30 AM

Most of the optimizations you mentioned above are correct. Do me a favor, code them all together and assemble them into a simple tutorial intended for CS students. At the very least i can say that some thing that took you 10 points and half page to write about, when coded, would be a lot complex to digest for any early CS students. Even if it is, the primary intension of tutorial is not writing the most efficient and optimzed code (no harm in discussing them AFTER the basic algo is clear though), rather it is to make the algorithm as simple as possible for students to understand and be able to expand on themselves.

The code we write here is correct and works to the point of demonstrating basic algorithm implementation. I myself mentioned that there can be many optimizations but they do not exactly fall into the scope of how sieve algorithm works. Perhaps all of above points are taken away if one writes in pseudo code.

But thanks again for adding valueable insight. It certainly adds to the value of tutorial for some one who learns and uses it for his own needs. Keep contributing.

Cheers,
Fayyaz

Fayyaz, perhaps I went overboard in criticism of your code given its purpose is for instructional purposes, but many of the above points are valid even for that purpose, as follows:

1. If you are going to teach something about the Sieve of Eratosthenes as per your Wikipedia link, then one should be true to the spirit of that algorithm and one of those points is that the culling process can be done by only using addition or by "counting up" by the span of the prime number being culled.

2. Having started with a very basic implementation of the SoE, one could then show how this could be optimized by starting at the square of the prime being culled, stopping at the square root of the maximum, and in eliminating the consideration of the odd primes as you suggest but don't give an example.

3. One could then move along to other optimizations as mentioned in the Wikipedia article such as paging and wheel factorization in order for your students to see the progression of program development and tuning.

The Wikipedia animation actually does show some of these things, including culling from the square of the prime and stopping at the square root of the maximum.

Moving up to the next step, do you really think that the following is any harder to follow than your code, and yet it follows the exact spirit of the Wikipedia article:
```#include<math.h> //for the sqrt() function used in define
#include<memory.h> //for the memset() function

#define MAX_NUM 1000000

#define NUM_PRE_ARRAY_PRIMES 1 //for the number 2 which is only even prime
#define FIRST_ODD_PRIME 3

#define MAX_NDX (MAX_NUM - FIRST_ODD_PRIME) / 2 //to eliminate numbers < 3 and only include odds
#define SQRT_LIM (unsigned int)(sqrt((double)MAX_NUM)) //index of square root of MAX_NUM rounded down

unsigned char primes[MAX_NDX + 1]; //make composite number array large enough to include maximum.

void gen_sieve_primes(void)
{
memset(primes, 1, MAX_NDX + 1); // marks all elements of the array as potential primes

for (unsigned int i = 0, lim = SQRT_LIM; i <= SQRT_LIM; ++i) // test for primes to cull composites
if (primes[i] != 0)
{
//you will find that the indexes of the odd composite numbers for a prime are exactly prime apart
unsigned int prime = i + i + FIRST_ODD_PRIME; //generate prime from the index without a multiply = 2 * i + FIRST_ODD_PRIME
//starting at the index of the square of the prime composites to be culled
for (unsigned int j = (prime * prime - 3) >> 1; j <= MAX_NDX; j += prime) //shift right is / 2 faster
primes[j] = 0; //cull all composites to maximum limit of array
}
}```

yet the above code is many times more efficient than yours. Note that the above code embodies my suggestions points 1, 2, 5, 6, and 7 or half of the suggestions. From there, you would move into further optimizations such as packing the primes array into bits for just a few additional lines of code (suggestion number 3), to paging the array with just another few lines of code (suggestion number 4), to pre-culling the initialized array (suggestion number 8)and so on until your students understood the more complex optimizations one can make. Only suggestion number 9 actually is all that complex especially when combined with extreme bit packing as per an optimization of suggestion number 3.

In this way, your students would be led up from what is excessively simple to something that could be moderately useful and learn some optimization techniques while they were at it.

Cheers, Gordon
• 0

### #10 fkl

fkl

CC Devotee

• Senior Member
•      • 417 posts

Posted 29 May 2012 - 02:02 AM

I believe i wasn't clear enough. Will try once more.

In my opinion, YES the above is a lot harder for a newbie to understand compared to my code. Not because i wrote simpler or easier code, but rather that i simply SKIPPED all of above optimizations. Every macro, every subtraction you did above has a reason separate from basic algorithm, that is why you wrote loads of comments explaining them.

My code is simply implementation of "Check if a number was not marked earlier in the array. If not, it is a prime. Then mark all multiples of it as non prime". However, i do agree that the reference i originally had in mind was of marking multiples and i insist that it is easier to understand for a beginner compared to your first odd prime step.

It is not necessary that every student reading this has learned bitwise operators before above (i have experienced this a lot on this forum). I don't like to make those a prerequisite for the sake of obtaining efficiency and speed specially when that is not a concern.

Algortihmic speed is always far superior to machine optimizations. And we are mainly concerned with algorithms here.

Lastly once again all of above is good, but i insist (is not part of basic algorithm). A student has to think about going up to only square root, leaving even indexes, using bitwise ops, curling and what not. To me all of this is intuitive follow up once you grasp the basic idea of algorithm. I would much prefer that 10 students learn and understand the basic working from above code and say 2 of them are keen enough to produce above optimizations on their own, rather than 8 of them leaving this page or not grasping the idea because of multiple things cluttered together on top of algo in the first place.

Writing professional or optimized code ONLY when needed is one of the things i have learned after many years of writing / maintaining software. It was in college that i used to enjoy writing A*=7 efficiently with a<<3-a and feel good about it. Not all but most of those (albeit not as simple as the example i just wrote) now are good candidates for obfuscated code contests.

Enough said.
Fayyaz
• 0
Today is the first day of the rest of my life

### #11 GordonBGoodness

GordonBGoodness
• Location:Thailand
• Programming Language:C, Java, C++, Objective-C, C#, (Visual) Basic, JavaScript, Delphi/Object Pascal, Visual Basic .NET, Pascal, Assembly, Fortran, VBScript, Others

Posted 29 May 2012 - 03:12 AM

In my opinion, YES the above is a lot harder for a newbie to understand compared to my code. Not because i wrote simpler or easier code, but rather that i simply SKIPPED all of above optimizations. Every macro, every subtraction you did above has a reason separate from basic algorithm, that is why you wrote loads of comments explaining them.

My code is simply implementation of "Check if a number was not marked earlier in the array. If not, it is a prime. Then mark all multiples of it as non prime". However, i do agree that the reference i originally had in mind was of marking multiples and i insist that it is easier to understand for a beginner compared to your first odd prime step.

It is not necessary that every student reading this has learned bitwise operators before above (i have experienced this a lot on this forum). I don't like to make those a prerequisite for the sake of obtaining efficiency and speed specially when that is not a concern.

Algortihmic speed is always far superior to machine optimizations. And we are mainly concerned with algorithms here.

Lastly once again all of above is good, but i insist (is not part of basic algorithm). A student has to think about going up to only square root, leaving even indexes, using bitwise ops, curling and what not. To me all of this is intuitive follow up once you grasp the basic idea of algorithm. I would much prefer that 10 students learn and understand the basic working from above code and say 2 of them are keen enough to produce above optimizations on their own, rather than 8 of them leaving this page or not grasping the idea because of multiple things cluttered together on top of algo in the first place.

Writing professional or optimized code ONLY when needed is one of the things i have learned after many years of writing / maintaining software. It was in college that i used to enjoy writing A*=7 efficiently with a<<3-a and feel good about it. Not all but most of those (albeit not as simple as the example i just wrote) now are good candidates for obfuscated code contests.

Enough said.
Fayyaz

First, I disagree that your implementation of "if it's prime then mark all multiples of that prime as not prime" where you determine those multiples by multiplication is the spirit of the Sieve of Eratosthenes, which was purposely formulated by Eratosthenes in order to avoid having to multiply, as in those days it was a lot more difficult than just writing p * c.

I agree that there was no need for me to use right shift for a divide by two, and in the single place I used it it wouldn't make any real difference anyway.

The above code was written to show it doesn't take much extra code to do the optimizations as suggested by the Wikipedia article, which actually states that Eratosthenes included the "sieve only for odd numbers" in his original algorithm; the slightly less efficient code that doesn't eliminate the even numbers could be written as follows:
```#define MAX_NUM 1000000

#define SQRT_LIM (unsigned int)(sqrt((double)MAX_NUM)) //square root of MAX_NUM rounded down

unsigned char primes[MAX_NUM + 1]; //make composite number array large enough to include maximum.

void gen_sieve_primes(void)
{
memset(primes, 1, MAX_NUM + 1); // marks all elements of the array as potential primes
primes = 0; primes = 0; //but zero and one are not prime

for (unsigned int prime = 2; prime <= SQRT_LIM; ++prime) // for all prime candidates starting at two to sqrt(max)
if (primes[prime] != 0) //if is still marked as prime
//you will find that the indexes of the composite numbers for a prime are exactly prime apart
//which Eratosthenese determined in order to avoid multiplication
//starting at the square of the prime composites to be culled
for (unsigned int composite = prime * prime; composite <= MAX_NUM; composite += prime)
primes[composite] = 0; //cull all prime composites to maximum limit of array
}
```

The comments are just so that newbies can follow the code and there aren't really any more than in your OP code other than to explain that multiplication to find prime composites is not necessary.

One comment on the size of the primes array is to correct an error in your code where the array is not big enough to enclose the maximum number and your OP code will be in error by one in the case where the maximum number to be tested is prime - it won't find it.

Now there aren't any index manipulations to eliminate the even numbers or "macro" subtractions and the code now says exactly as the Sieve of Eratosthenes states: "For a list (= array) of all numbers, starting at two cull all the multiples of that prime (= composites) from the array by indexing each by exactly prime counts starting at the square of the prime. Do this for every number still in the array, which will be prime, up to the square root of the maximum number."

I'll leave readers of your forum here to decide what version they like.

I'm out of here, Gordon
• 0

### #12 tpPacino

tpPacino
• Location:Sarajevo, Bosnia and Herzegovina
• Programming Language:C, Java, C++, PHP, (Visual) Basic, PL/SQL, Pascal, Others
• Learning:Objective-C, C#, JavaScript, Ruby, Haskell, Others

Posted 25 June 2012 - 03:40 PM

There are other simpler solutions... For example, the Sieve of Eratosthenes is good for numbers under 10 mil.
This piece of code calculates 1 mil. primes in 1.29 sec in a for loop.

```bool IsAPrime(int n)
{
for(int i(2); i<=sqrt(n); i++)
if(n%i == 0) return true;
return false;
}```

But this one does it under approximately 0.9 sec...

```int EratosthenesSieve (int upper, int array[])
{
/* the function takes the upper limit to which we test the numbers in the arguments, as the array in which we store the numbers,
but it returns the exact number of primes */
int size=0;
int i;
int sieve; // enough for 2000 primes....
memset(sieve,0,sizeof sieve);
int tmp;
for(int i(2); i<=upper; i++)
{
if(!sieve[i])
{
array[size++] = i;
tmp = i+i;
while(tmp<=upper)
{
sieve[tmp] = 1;
tmp+=i;
}
}
}
return size;
}```

Tested on Intel Dual Core 2.56 GHZ and 2GB RAM
There are better ways to do it, but these are the simplest...
• 0

### Also tagged with one or more of these keywords: nested loop

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download 