How Can You Tackle the Dirty Data Problem?

“Big data” is an enormously popular concept, and at first glance it does sound like a silver bullet. Whatever your business deals with, whatever your problem is, for that matter, all you have to do is collect a sufficient amount of data, plug it all into a computer, and wait for insights. The long-awaited answers will pour out and tell you how to increase your sales, what demographics to target, and how to make your employees more loyal and efficient.

However, the reality is more complicated than that. Information is often incomplete and therefore misleading. Moreover, data has a bad habit of changing constantly. People move homes, change their jobs and phone numbers, start families, gain or lose weight, dye their hair, start or quit shaving, and develop new medical conditions and eating habits. Your data is decaying by the hour. It is important to review and update your databases regularly. Cleaning up “dirty” data is an important and often overlooked task, necessary to prevent costly mistakes.

What do you mean by “dirty”?

Let’s be clear: there are no “clean” data sets. Life is complicated, messy, and full of “white noise” and irrelevant facts. Even the most “tidy” and …

