Policy Gradient Reinforcement Learning Now Supported!

In our latest release, version 0.9.2.188, we now support Policy Gradient Reinforcement Learning as described by Andrej Karpathy [1][2][3], built against the recently released CUDA 9.2.148 (p1) and cuDNN 7.2.1.

For training, we have also added new Gym infrastructure to the SignalPop AI Designer, where the dataset in each project can be either a standard dataset or a dynamic gym dataset, such as the Cart-Pole gym inspired by OpenAI [4][5] and originally created by Barto, Sutton and Anderson [6][7].
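
For readers new to the environment: at each time step, Cart-Pole applies a fixed-magnitude push to a cart moving along a track, and the agent must choose the push direction that keeps the attached pole upright. For reference, below is a minimal Python sketch of the classic dynamics from [6], using the same constants and Euler integration as the OpenAI implementation [5]; it is illustrative only and is not the gym code shipped with the SignalPop AI Designer.

    import math

    # Classic Cart-Pole dynamics (Barto, Sutton and Anderson [6]); constants
    # follow the OpenAI implementation [5].  Illustrative sketch only -- not
    # the SignalPop gym code.
    GRAVITY = 9.8
    MASS_CART = 1.0
    MASS_POLE = 0.1
    TOTAL_MASS = MASS_CART + MASS_POLE
    LENGTH = 0.5                          # half the pole's length
    POLEMASS_LENGTH = MASS_POLE * LENGTH
    FORCE_MAG = 10.0
    TAU = 0.02                            # seconds between state updates

    def step(state, action):
        """Advance one time step; state = (x, x_dot, theta, theta_dot),
        action = 0 (push left) or 1 (push right)."""
        x, x_dot, theta, theta_dot = state
        force = FORCE_MAG if action == 1 else -FORCE_MAG
        cos_t, sin_t = math.cos(theta), math.sin(theta)

        temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
        x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

        # Euler integration.
        x, x_dot = x + TAU * x_dot, x_dot + TAU * x_acc
        theta, theta_dot = theta + TAU * theta_dot, theta_dot + TAU * theta_acc

        # The episode ends when the cart leaves the track or the pole tips
        # more than 12 degrees from vertical, as in [5].
        done = abs(x) > 2.4 or abs(theta) > 12.0 * math.pi / 180.0
        return (x, x_dot, theta, theta_dot), 1.0, done

The reward is simply 1.0 per time step, so the longer the pole stays balanced, the higher the total reward collected over the episode.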

Using the simple policy gradient reinforcement learning model shown below…

Simple Sigmoid-based Policy Gradient RL Model for Cart-Pole

… the SignalPop AI Designer uses the new MyCaffeTrainerRL to train the model to solve the Cart-Pole problem by learning to balance the pole.
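
Conceptually, a policy gradient trainer of this kind follows the recipe described in [1][2]: the model's sigmoid output is treated as the probability of pushing right, an action is sampled from that probability at each step, the rewards collected over an episode are discounted back through time, and the discounted returns weight the gradient of the log-probability of each sampled action. The Python/numpy sketch below illustrates that recipe for a single-layer sigmoid policy; it is a conceptual outline of the technique, not the MyCaffeTrainerRL implementation, and the env_reset/env_step hooks and hyperparameters are illustrative assumptions.

    import numpy as np

    # Conceptual REINFORCE sketch for a sigmoid policy (after Karpathy [1][2]).
    # Not the MyCaffeTrainerRL implementation; env_reset/env_step stand in for
    # any Cart-Pole environment (e.g. the dynamics sketched above).
    GAMMA = 0.99          # reward discount factor (assumed value)
    LEARNING_RATE = 1e-2  # assumed value

    rng = np.random.default_rng(0)
    W = rng.standard_normal(4) * 0.1  # 4 state inputs -> 1 sigmoid output

    def policy(state):
        """Probability of pushing right, given the 4-dim Cart-Pole state."""
        return 1.0 / (1.0 + np.exp(-np.dot(W, state)))

    def discount_rewards(rewards):
        """Discounted returns, computed backwards over one episode."""
        out, running = np.zeros_like(rewards), 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + GAMMA * running
            out[t] = running
        return out

    def train_episode(env_reset, env_step):
        global W
        states, grads, rewards = [], [], []
        state = env_reset()
        for _ in range(200):  # episode cap, matching CartPole-v0 [4]
            p = policy(state)
            action = 1 if rng.random() < p else 0
            grads.append(action - p)  # d log pi(action)/d logit for a sigmoid
            states.append(state)
            state, reward, done = env_step(state, action)
            rewards.append(reward)
            if done:
                break

        returns = discount_rewards(np.array(rewards, dtype=np.float64))
        returns -= returns.mean()     # normalize returns (simple baseline)
        if returns.std() > 0:
            returns /= returns.std()

        for s, g, r in zip(states, grads, returns):
            W += LEARNING_RATE * g * r * np.asarray(s)  # gradient ascent

Paired with the step() function sketched earlier (as env_step) and a slightly perturbed starting state (as env_reset), repeating train_episode over many episodes nudges the weights so the pole stays balanced progressively longer; normalizing the returns acts as a simple baseline that reduces gradient variance, as suggested in [1].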

To use the MyCaffeTrainerRL, just set the solver's custom_trainer property to RL.Trainer and you are ready to go.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For more cool example videos, visit our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 9.2.148 (p1)/cuDNN 7.2.1/driver 399.07 support added.
  • Added support for Policy Gradient Reinforcement Learning via the new MyCaffeTrainerRL.
  • Added Gym support via the new Gym dataset type, along with the new Cart-Pole gym.
  • Added a new MemoryLoss layer.
  • Added a new SoftmaxCrossEntropyLoss layer.
  • Added a new LSTMSimple layer.
  • Added layer freezing to allow for easy Transfer Learning.

Bug Fixes

The following bug fixes have been made in this release.

  • Fixed a bug that prevented LOAD_FROM_SERVICE from working.
  • Fixed bugs in Label Visualization.
  • Fixed bugs in Weight Visualization.
  • Fixed bugs related to Importing Weights.


[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., GitHub:karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.
[4] OpenAI, CartPole-v0.
[5] OpenAI, GitHub:gym/gym/envs/classic_control/cartpole.py, GitHub, April 27, 2016.
[6] Barto, A. G., Sutton, R. S., Anderson, C. W., Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834-846, September 1983.
[7] Sutton, R. S., et al., incompleteideas.net/sutton/book/code/pole.c, 1983.