Softmax-based Policy Gradient Reinforcement Learning Now Supported with CUDA 10!

In our latest release, version 0.10.0.24, we now support multi-threaded, Softmax-based Policy Gradient Reinforcement Learning as described by Andrej Karpathy [1][2][3], running on the recently released CUDA 10.0.130/cuDNN 7.3.

Using the simple Softmax-based policy gradient reinforcement learning model shown below…

Simple Softmax based Policy Gradient RL Model for Cart-Pole

… the SignalPop AI Designer uses the MyCaffeTrainerRL to train the model to solve the Cart-Pole problem and balance the pole.
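For readers who want a feel for what this style of training does, here is a minimal Softmax policy gradient (REINFORCE) sketch for Cart-Pole in the spirit of Karpathy [1][2]. It is an illustrative NumPy example only, not the MyCaffeTrainerRL implementation; the layer sizes, learning rate, and helper names are assumptions chosen for the sketch.

# Minimal Softmax policy gradient (REINFORCE) sketch for Cart-Pole.
# Illustrative only -- not the MyCaffeTrainerRL implementation; all sizes,
# hyperparameters, and helper names below are assumptions for this sketch.
import numpy as np

H = 10          # hidden units (assumption)
D = 4           # Cart-Pole observation: position, velocity, angle, angular velocity
A = 2           # actions: push left, push right
lr = 1e-2       # learning rate (assumption)
gamma = 0.99    # reward discount factor

rng = np.random.default_rng(0)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)   # input -> hidden weights
W2 = rng.standard_normal((A, H)) / np.sqrt(H)   # hidden -> action logits

def policy_forward(x):
    """Return softmax action probabilities and hidden activations."""
    h = np.maximum(0, W1 @ x)                   # ReLU hidden layer
    logits = W2 @ h
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum(), h

def discount_rewards(r, gamma):
    """Discounted returns G_t = sum_k gamma^k * r_{t+k}."""
    g = np.zeros_like(r)
    running = 0.0
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        g[t] = running
    return g

def act(x):
    """Sample an action from the softmax policy; record probs and hidden state."""
    p, h = policy_forward(x)
    a = int(rng.choice(A, p=p))
    return a, p, h

def policy_gradient_update(xs, hs, actions, probs, rewards):
    """One REINFORCE update from a single episode of recorded experience."""
    global W1, W2
    G = discount_rewards(np.asarray(rewards, dtype=float), gamma)
    G = (G - G.mean()) / (G.std() + 1e-8)       # normalize returns (variance reduction)
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for x, h, a, p, g in zip(xs, hs, actions, probs, G):
        dlogits = -p                            # grad of log softmax w.r.t. logits...
        dlogits[a] += 1.0                       # ...is one-hot(action) - probabilities
        dlogits *= g                            # weight by the discounted return
        dW2 += np.outer(dlogits, h)
        dh = W2.T @ dlogits
        dh[h <= 0] = 0                          # backprop through ReLU
        dW1 += np.outer(dh, x)
    W1 += lr * dW1                              # gradient *ascent* on expected return
    W2 += lr * dW2                              # (environment loop omitted; plug in your own Cart-Pole simulator)

The key idea is the softmax gradient: for the sampled action, the gradient of its log-probability with respect to the logits is the one-hot action vector minus the softmax probabilities, and each step's gradient is weighted by the discounted return that followed it.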

To use the MyCaffeTrainerRL, just set the custom_trainer Solver property to RL.Trainer and you are ready to go.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 10.0.130/cuDNN 7.3/driver 411.63 support added.
  • Added Softmax support to Policy Gradient Reinforcement Learning.
  • Added multi-threading support to Policy Gradient Reinforcement Learning (across GPUs).
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed a bug in Policy Gradient accumulation, speeding up learning by 3x!
  • Fixed a bug in snapshots related to Policy Gradient learning.


[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.