AlphaGo is one of our favorite topics to write about — not only are games of strategy interesting in and of themselves, but pitting human masters against a computer in the most complex board game we’ve yet invented is thrilling stuff. Part of what made AlphaGo so interesting is how the computer came to win; or more precisely, how it learned the game and then formulated winning strategies. The original AlphaGo was fed a massive trove of data from previous games, and it eventually extrapolated winning strategies from that data. But then came AlphaGo Zero, which harnessed an approach to machine learning called reinforcement learning, training entirely through self-play rather than from human game records; that allowed it to dominate not only human players but the original incarnation of AlphaGo as well.
Reinforcement learning differs from other types of machine learning in that it focuses on how a software agent ought to take actions within a given environment so as to maximize some notion of cumulative reward. In other words, it plays a game and tries to win by refining its choices over time, to the point where it can anticipate winning strategies based on both prior experience and predictions about future states.
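To make that agent-environment loop concrete, here is a minimal sketch of tabular Q-learning, one of the simplest reinforcement learning algorithms (far simpler than anything in AlphaGo Zero, which combines deep neural networks with tree search). The toy "corridor" environment, its states and reward, and all parameter values below are hypothetical, chosen purely for illustration:

```python
import random

# Hypothetical toy environment: a corridor of states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4
# yields a reward of 1.0 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Refine action-value estimates Q(s, a) from repeated play."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the best-known action,
            # occasionally explore a random one.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Nudge Q(s, a) toward the observed reward plus the
            # discounted value of the best action from the next state.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# After training, the greedy policy should move right toward the goal
# from every non-terminal state.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

The key idea mirrors the description above: no one tells the agent which moves are good; it discovers that by acting, observing rewards, and updating its value estimates over many episodes.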
For AlphaGo, it was learning …