MLPs (Multi-Layer Perceptrons) are great for many classification and regression tasks, but it is hard for MLPs to do classification and regression on sequences. In this code tutorial, a GRU is implemented in TensorFlow.
A sequence is an ordered set of items and sequences appear everywhere. In the stock market, the closing price is a sequence. Here, time is the ordering. In sentences, words follow a certain ordering. Therefore, sentences can be viewed as sequences. A gigantic MLP could learn parameters based on sequences, but this would be infeasible in terms of computation time. The family of Recurrent Neural Networks (RNNs) solve this by specifying hidden states which do not only depend on the input, but also on the previous hidden state. GRUs are one of the simplest RNNs. Vanilla RNNs are even simpler, but these models suffer from the Vanishing Gradient problem.
The key idea of GRUs is that the gradient chains do not vanish due to the length of sequences and this is done by allowing the model to pass values completely through the cells. The model is defined as the following :
I had a hard time understanding this model, but it turns out that it is not too hard …