Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice

While shaping the idea of your data science project, you probably dreamed of writing variants of algorithms, estimating model performance on test data, and discussing prediction results with colleagues... But before you live the dream, you not only have to get the right data, you also have to check whether it’s labeled according to your task. Even if you don’t need to collect specific data yourself, you can spend a good chunk of time looking for the dataset that will work best for the project.

Thousands of public datasets on different topics, from top fitness trends and beer recipes to pesticide poisoning rates, are available online. To spend less time searching for the right one, you need to know where to look.

This article aims to help you find the best publicly available dataset for your machine learning project. We’ve grouped the sections by dataset source, type, and topic:

Catalogs of data portals and aggregators
Government and official data
Scientific research data
Verified datasets from data science communities
Political and social datasets from media outlets
Finance and economic data
Healthcare data
Travel and transportation data
Other sources

So, let’s dive deep into this ocean of data.

Catalogs of data portals and aggregators

While you …

