The concept of a blockchain has become quite a phenomenon in recent times. It has quickly risen from a relatively obscure idea, known mostly within a few small circles, to one that is being discussed as having the potential to change some of the fundamentals of the world’s economic systems.
I don’t claim to be a blockchain expert, but given that it is a new paradigm for generating and storing data, my mind has naturally drifted toward thinking about how the mechanics and performance of analyzing data within a blockchain environment would be different from how we analyze data within other platforms. My initial thoughts point to some significant challenges.
A System That Isn’t Built for Analytics Isn’t Optimized for Analytics
Let’s start with a historical perspective by examining the early days of data warehousing and 3rd normal form data. Storing data in 3rd normal form has a range of benefits, particularly when it comes to storing massive amounts of data at an enterprise scale. For example, it minimizes data duplication and, therefore, storage costs. However, to build models and execute deeper analytics, we need to denormalize such data, which means joining the separate tables back together into wide records. So, 3rd normal form adds overhead to our analytic processing. There are benefits to …