Preparing Your Dataset for Machine Learning: 8 Basic Techniques That Make Your Data Better

There’s a good story about bad data told by Martin Goodson, a data science consultant. A healthcare project aimed to cut the cost of treating patients with pneumonia. It employed machine learning (ML) to sort through patient records automatically and decide who had the lowest risk of death and could take antibiotics at home, and who was at high risk of death from pneumonia and should be in the hospital. The team used historical data from clinics, and the algorithm was accurate.

But there was an important exception. One of the most dangerous conditions that may accompany pneumonia is asthma, and doctors always send asthmatics to intensive care, which resulted in minimal death rates for these patients. Because the data contained almost no asthmatic deaths, the algorithm assumed that asthma isn’t that dangerous during pneumonia. In all cases, the machine recommended sending asthmatics home, even though they had the highest risk of pneumonia complications.

ML depends heavily on data. Data is the most crucial aspect that makes algorithm training possible, and it explains why machine learning has become so popular in recent years. But no matter how many terabytes of information and how much data science expertise you have, if you can’t make sense of data records, a …

Read More on Datafloq
