Recurrent Learning Now Supported with cuDNN 7.4.1 on Char-RNN to learn Shakespeare!

In our latest release, version 0.10.0.122, we now support Recurrent Learning with both the LSTM [1] and LSTM_SIMPLE [2] layers to solve the Char-RNN as described by [3] and inspired by adepierre [4] and create a Shakespeare sonnet, and do so with the recently released CUDA 10.0.130/cuDNN 7.4.1.

To create the Shakespeare above, we used the Char-RNN model shown below with the LSTM Layer.

Char-RNN LSTM based Model

To solve this model, the SignalPop AI Designer uses the new MyCaffeTrainerRNN (which operates similarly to the MyCaffeTrainerRL) to train the recurrent network.

But what actually happens within each LSTM layer to make this happen?

Each LSTM layer contains an Unrolled Net which unrolls the recurrent operations.  As shown below, you can see that each of the 75 elements within the sequence are ‘unrolled’ into a set of layers: SCALE, INNER_PRODUCT, RESHAPE and LSTM_UNIT.  Each set process the data and feeds it into the next thus forming the recurrent nature of the network.

LSTM Unrolled Net

During the initialization (steps 1-3) the weights are loaded and data is fed into the input blob.  Next in steps 4-8, the recurrent nature of the network processes the each item within the sequence. Upon reaching the end the results are concatenated to form the output in steps 9 and 10.  This same process occurs in both LSTM layers – so the full Char-RNN actually has over 600 layers.  Amazingly all of this processing for both the forward and backward pass happens in around 64 milliseconds on a Titan Xp running in TCC mode!

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • RTX 2080ti now supported!
  • CUDA 10.0.130/cuDNN 7.4.1/nvapi 410/driver 416.94.
  • Native Caffe updated through 10/24/2018.
  • Added new ClipLayer.
  • Added new LSTMLayer.
  • Added new RNNLayer.
  • Added new InputLayer.
  • Added new ReshapeLayer.
  • Added new MyCaffeGymTrainerRNN for RNN Training.
  • Added new Output Window for MyCaffeGymTrainerRNN output.
  • Added support for Char-RNN networks used on text files.
  • Added support for Char-RNN style networks used on audio WAV files.
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bugs in ATARI negative results not showing up in Project window.
  • Fixed bugs in running Test on gyms.

 


[1] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko and T. Darrell, (2014), Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Arxiv.
[2] junhyukoh, (2016), junkyukoh/caffe-lstm Github, GitHub.com.
[3] Karpathy, A., (2015), The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy blog.
[4] adepierre, (2017), adepierre/caffe-char-rnn Github, GitHub.com.