**Find all prime numbers less than 1 million**

Generating large prime numbers or even testing huge numbers for being prime or not has been a daunting task.

To clarify the basics, a number is called prime if it can be divided by only 1 and itself and no other. This means a prime can't have more than two divisors. For e.g. 2,3,5,7 etc. are all primes.

A general or common approach to test if a number is prime or not starts with testing from 2 onwards up to half of the number trivially, though actually up to square root is sufficient.

This means that if given a number e.g. 9999 whose square root is roughly 99.99 (obviously not perfect otherwise it is already not a prime), we can test if any number between 2 and 99 divides it completely. If not than no other greater number would divide either and hence this would be a prime.

Now in practice if being asked to generate say the first 1000 prime numbers, it is going to be very in efficient that you start from 2 and keep going in a loop. Then for each value you take the numberâ€™s square root and check if it divides any number between 2 and square root. If not, it is a prime. This will be a O(n)^2 algorithm.

There has been a well known efficient algorithm for doing the above task with two constraints:

1. An upper limit needs to be known in advance i.e. 1, 10 million etc.

2. One should be able to allocate array of that size in a computer program.

The idea is basically that you create an array of the given size (1 million) initialized with all 0â€™s. Now start with number 2 at index 2 of array. Since the value at this index is 0, mark it as a prime. Then run a loop up to 1 million marking all multiples of 2 in array as non primes.

Then keep incrementing after 2 up to a million. If the current index is found to be 0 or unmarked, it means it is not divisible by any of the numbers we have encountered so far. Otherwise it would have been marked as non prime and wouldnâ€™t be zero.

This is known as Sieve of Eratosthenes and is considered a very efficient algorithm for finding primes. Though it still runs in two nested loops, but the performance gains are highly significant and itâ€™s run time complexity amounts to be O(n(logn)(loglogn)). For details see

Sieve of Eratosthenes - Wikipedia, the free encyclopedia

I have used a separate print function which takes a little fancy formatting for 1st 2nd and 3rd primes into account. The core function is gen_sieve_primes which runs two nested loops. The outer loop runs from 2 to max number. If value at current index is 0 meaning unmarked, it marks that as a prime by assigning 1. Then it proceeds to second loop in which it marks all multiples of current number (for 2, those would be 4,6,8, 10â€¦. Up to 1 million) as non primes by assigning them -1.

#define MAX_NUM 1000000 // 200 // array will be initialized to 0 being global int primes[MAX_NUM]; void gen_sieve_primes(void) { for (int p=2; p<MAX_NUM; p++) // for all elements in array { if(primes[p] == 0) // it is not multiple of any other prime primes[p] = 1; // mark it as prime // mark all multiples of prime selected above as non primes int c=2; int mul = p * c; for(; mul < MAX_NUM;) { primes[mul] = -1; c++; mul = p*c; } } } void print_all_primes() { int c = 0; for(int i=0; i<MAX_NUM; i++) { if(primes[i] == 1) { c++; if(c < 4) { switch(c){ case 1: cout << c << "st prime is: " << i << endl; break; case 2: cout << c << "nd prime is: " << i << endl; break; case 3: cout << c << "rd prime is: " << i << endl; break; default: break; } } else cout << c << "th prime is: " << i << endl; } } } int main() { gen_sieve_primes(); print_all_primes(); return 0; }

Following is the run of above algorithms using 200 as the max value.

Then I ran this on 1 million numbers size array using Visual Studio on Vista 32 bit machine. The reason for mentioning this is, you may or may not be able to create an array as large depending upon the available memory and type of OS. On a 64 bit machine, one might be able to go even further.

It took roughly a couple of minutes to show all primes less than a million. Following is the end of prints showing that there are some 78,498 primes in 1 million numbers.

There can be a few optimizations to improve this algorithm as suggested in the link I referred above. For e.g. instead of incrementing by 1 while going from 2 to 1 million in the outer loop, we can treat 2 as a special case, start from 3 and do increments of 2. This is because we know no even number can be a prime and all of those would be marked while running the loop for 2.

From space perspective, instead of an integer which might be 32bit or even 64 bit on some machines, we could use a one byte variable i.e. int8_t for array, since we only need 3 states (unexplored, prime, and not prime) represented by 0, 1 and -1 in the current program. This is going to reduce the amount of memory required to between 1/4th to 1/8th.