The problem with the data lake

Are data lakes fake news? The quick answer is yes, and in this post I will show you why.

The biggest problem I have with data lakes is that vendors and analysts have overloaded the term with many different meanings. Sometimes it seems that anything that does not fit into the traditional data warehouse architecture falls under the catch-all phrase of the data lake. The result is an ill-defined and blurry concept. Blurred terminology leads to blurred thinking, which in turn leads to poor decisions.

I have come across various definitions of a data lake, and we will discuss all of them in this post. Sometimes people refer to only one of these ideas when talking about a data lake; other times they mix and match them. Some people mean all of the things below, while others are more selective.

The Data Lake as a Raw Data Reservoir

This is the original meaning of a data lake. In this definition, the data lake is not too dissimilar to a staging area in a data warehouse. In a staging area, we make a copy of the …

