Softmax-based Policy Gradient Reinforcement Learning Now Supported with CUDA 10!

In our latest release, version 0.10.0.24, we now support multi-threaded, Softmax-based Policy Gradient Reinforcement Learning as described by Andrej Karpathy [1][2][3], running on the recently released CUDA 10.0.130/cuDNN 7.3.

Using the simple Softmax-based policy gradient reinforcement learning model shown below…

Simple Softmax based Policy Gradient RL Model for Cart-Pole

… the SignalPop AI Designer uses the MyCaffeTrainerRL to train the model to solve the Cart-Pole problem and balance the pole.
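For readers who want a feel for what this style of training does, here is a minimal Softmax policy gradient (REINFORCE) sketch for Cart-Pole in the spirit of Karpathy [1][2]. It is an illustrative NumPy example only, not the MyCaffeTrainerRL implementation; the layer sizes, learning rate, and helper names are assumptions chosen for the sketch.

# Minimal Softmax policy gradient (REINFORCE) sketch for Cart-Pole.
# Illustrative only -- not the MyCaffeTrainerRL implementation; all sizes,
# hyperparameters, and helper names below are assumptions for this sketch.
import numpy as np

H = 10          # hidden units (assumption)
D = 4           # Cart-Pole observation: position, velocity, angle, angular velocity
A = 2           # actions: push left, push right
lr = 1e-2       # learning rate (assumption)
gamma = 0.99    # reward discount factor

rng = np.random.default_rng(0)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)   # input -> hidden weights
W2 = rng.standard_normal((A, H)) / np.sqrt(H)   # hidden -> action logits

def policy_forward(x):
    """Return softmax action probabilities and hidden activations."""
    h = np.maximum(0, W1 @ x)                   # ReLU hidden layer
    logits = W2 @ h
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum(), h

def discount_rewards(r, gamma):
    """Discounted returns G_t = sum_k gamma^k * r_{t+k}."""
    g = np.zeros_like(r)
    running = 0.0
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        g[t] = running
    return g

def act(x):
    """Sample an action from the softmax policy; record probs and hidden state."""
    p, h = policy_forward(x)
    a = int(rng.choice(A, p=p))
    return a, p, h

def policy_gradient_update(xs, hs, actions, probs, rewards):
    """One REINFORCE update from a single episode of recorded experience."""
    global W1, W2
    G = discount_rewards(np.asarray(rewards, dtype=float), gamma)
    G = (G - G.mean()) / (G.std() + 1e-8)       # normalize returns (variance reduction)
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, h, a, p, g in zip(xs, hs, actions, probs, G):
        dlogits = -p                            # grad of log softmax w.r.t. logits...
        dlogits[a] += 1.0                       # ...is one-hot(action) - probabilities
        dlogits *= g                            # weight by the discounted return
        dW2 += np.outer(dlogits, h)
        dh = W2.T @ dlogits
        dh[h <= 0] = 0                          # backprop through ReLU
        dW1 += np.outer(dh, x)
    W1 += lr * dW1                              # gradient *ascent* on expected return
    W2 += lr * dW2                              # (environment loop omitted; plug in your own Cart-Pole simulator)

The key idea is the softmax gradient: for the sampled action, the gradient of its log-probability with respect to the logits is the one-hot action vector minus the softmax probabilities, and each step's gradient is weighted by the discounted return that followed it.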

To use the MyCaffeTrainerRL, just set the custom_trainer Solver property to RL.Trainer and you are ready to go.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 10.0.130/cuDNN 7.3/driver 411.63 support added.
  • Added Softmax support to Policy Gradient Reinforcement Learning.
  • Added multi-threading support to Policy Gradient Reinforcement Learning (across GPUs).
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed a bug in Policy Gradient accumulation, speeding up learning by 3x!
  • Fixed a bug in snapshots related to Policy Gradient learning.


[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.