We are interested here in factoring numbers that are the product of two very large primes. Such numbers are used by encryption algorithms such as RSA: the product is published as part of the public key, while the two prime factors are kept secret, because anyone who knows them can compute the private key. Here you will also learn how data science techniques, including visualization, are applied to big data to derive insights. This article is good reading for the data scientist in training, who might not have easy access to interesting data: here the dataset is the set of all real numbers (not just the integers), and it is readily available to anyone. Much of the analysis performed here is statistical in nature, and thus of particular interest to data scientists.

Factoring numbers that are a product of two large primes allows you to test the strength (or weakness) of these encryption keys. It is believed that if the prime numbers in question are a few hundred binary digits (bits) long, factoring is nearly impossible in practice: it would require years of computing power on distributed systems to factor just one of these numbers.
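
To make the link between factoring and key strength concrete, here is a minimal sketch in Python. It is an illustration only, not the method developed in this article: the helper `factor_semiprime` is a hypothetical name, it uses plain trial division, and the toy values (primes 61 and 53, public exponent 17, message 65) are assumptions chosen so the arithmetic is easy to follow.

```python
from math import isqrt


def factor_semiprime(n):
    """Return (p, q) with p <= q and p * q == n, using plain trial division.

    Hypothetical helper for illustration; only practical for small n.
    """
    if n % 2 == 0:
        return 2, n // 2
    # Try odd candidates up to the integer square root of n.
    for p in range(3, isqrt(n) + 1, 2):
        if n % p == 0:
            return p, n // p
    raise ValueError("n has no nontrivial factor")


# Toy RSA setup (assumed values, far too small for real use).
n = 61 * 53          # public modulus
e = 17               # public exponent

# An attacker who can factor n recovers the private exponent d.
p, q = factor_semiprime(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # modular inverse; requires Python 3.8+

message = 65
cipher = pow(message, e, n)    # encrypt with the public key
recovered = pow(cipher, d, n)  # decrypt with the recovered private key

print(p, q, d, recovered == message)
```

Trial division needs on the order of the square root of the number in steps. The toy modulus above falls instantly, but the same loop is hopeless when each prime is a few hundred bits long.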

While the vast majority of big numbers have some small factors and are thus easier to break, the integers that we are dealing …