Data Warehouse or Data Lake? When to Use Each


Most businesses today are consolidating data from multiple sources into a single customizable platform for big data analytics. Having a separate platform for data analytics lets you create dashboards to segment, aggregate, and analyse high-dimensional data and make low-latency queries to perform real-time analytics.

What platform should you use to power your data analytics machine? Data warehouses and data lakes are two common alternatives.

What’s a Data Warehouse?

A data warehouse is a central repository of data gathered from diverse sources such as cloud-based applications and in-house repositories. Unlike a typical row-oriented relational database, a data warehouse often uses column-oriented storage, also known as a columnar database. Because the data is stored by column rather than by row, an analytical query that aggregates a few columns across many rows reads only the columns it needs, which makes this layout well suited to data warehousing. Once a data warehouse is set up and loaded with both current and historical data, people in the organisation can use it to create forecasting dashboards and trend reports with tools such as Looker, Chartio, Periscope Data, and Mode.
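To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy illustration rather than how any particular warehouse is implemented: the sales records, the field names, and the `to_columns` helper are all hypothetical and chosen only for this example.

```python
# Hypothetical sales records, used only for illustration.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented layout: every whole record is visited to read one field.
total_from_rows = sum(record["amount"] for record in rows)

# Column-oriented layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same. The difference is in how much data has to be touched: the column layout reads one contiguous list, while the row layout walks through every field of every record. Real columnar engines typically add compression and on-disk encoding on top of this idea, which the sketch does not attempt to model.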

A data warehouse has the following characteristics:

Integrated – The way the data is cleansed and extracted is uniform regardless of the original source.
Non-volatile – Since the data in a data warehouse is uploaded periodically rather than in real time, any momentary change shouldn’t have much impact …

Read More on Datafloq