Noisy-Net with Deep Q-Learning Now Supported with cuDNN 7.6.1!

In our latest release, version 0.10.1.169, we now support Noisy-Nets as described by Fortunato et al. [1], Prioritized Experience Replay as described by Schaul et al. [2], and Deep Q-Learning reinforcement learning as described by Castro et al. [3] and The Dopamine Team [4], and we use these to train MyCaffe to beat the ATARI game ‘Breakout’!

The action values are displayed in an overlay at the bottom of the ATARI Gym and updated during each step of the game.
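The displayed action values are the model’s Q-value estimates for each action. As a point of reference (assuming the standard DQN formulation from the literature rather than any MyCaffe-specific detail), the Deep Q-Learning trainer fits these estimates toward the one-step Bellman target:

$$
y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-), \qquad L(\theta) = \mathbb{E}\big[(y_t - Q(s_t, a_t; \theta))^2\big]
$$

where $\gamma$ is the discount factor and $\theta^-$ denotes the periodically updated target-network weights.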

The Noisy-Net model used with reinforcement learning is fairly simple, comprising three CONVOLUTION layers followed by two INNERPRODUCT layers, each of which has noise turned ON.

Deep Q-Learning Noisy-Net model for ATARI breakout.
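To give a feel for what ‘noise turned ON’ means for the INNERPRODUCT layers, below is a minimal NumPy sketch of a factorised-Gaussian noisy linear layer in the spirit of Fortunato et al. [1]. It is illustrative only; the layer sizes and names are assumptions, and this is not MyCaffe code.

```python
import numpy as np

def scaled_noise(n):
    # f(x) = sign(x) * sqrt(|x|), used to build factorised Gaussian noise
    x = np.random.randn(n)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Linear layer whose weights and biases receive learnable, factorised Gaussian noise."""
    def __init__(self, n_in, n_out, sigma0=0.5):
        bound = 1.0 / np.sqrt(n_in)
        self.w_mu = np.random.uniform(-bound, bound, (n_out, n_in))
        self.b_mu = np.random.uniform(-bound, bound, n_out)
        self.w_sigma = np.full((n_out, n_in), sigma0 / np.sqrt(n_in))
        self.b_sigma = np.full(n_out, sigma0 / np.sqrt(n_in))

    def forward(self, x, noisy=True):
        if not noisy:
            return self.w_mu @ x + self.b_mu
        eps_in = scaled_noise(self.w_mu.shape[1])
        eps_out = scaled_noise(self.w_mu.shape[0])
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)  # noisy weights
        b = self.b_mu + self.b_sigma * eps_out                    # noisy biases
        return w @ x + b

# Hypothetical sizes: 512 features in, 4 Breakout actions out.
layer = NoisyLinear(512, 4)
print(layer.forward(np.random.randn(512)))
```

Because the sigma parameters are learned along with the weights, the network can reduce its own exploration noise during training instead of relying on a hand-tuned epsilon-greedy schedule.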

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 10.1.168/cuDNN 7.6.1/nvapi 410/driver 430.86 support added.
  • Windows 1903 support added.
  • Added Test support to RL trainers.
  • Added TestMany support to RL trainers.
  • Added DQN trainer support for Deep Q-Learning.
  • Added ATARI breakout ROM.
  • Added Noise support to InnerProduct layers for NoisyNets.
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bugs in MemoryLoss layer.
  • Fixed bugs in the Convolution Editor; pad = 0 is now accepted.

Happy Deep Learning!



[1] Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg, Noisy Networks for Exploration, arXiv:1706.10295, June 30, 2017.
[2] Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, arXiv:1511.05952, November 18, 2015.
[3] Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare, Dopamine: A Research Framework for Deep Reinforcement Learning, arXiv:1812.06110, December 14, 2018.
[4] The Dopamine Team (Google), GitHub:Google/Dopamine, GitHub, Licensed under the Apache 2.0 License. Source code available on GitHub at google/dopamine.
[5] The Arcade Learning Environment: An Evaluation Platform for General Agents, by Marc G. Bellemare, Yavar Naddaf, Joel Veness and Michael Bowling, 2012-2013. Source code available on GitHub at mgbellemare/Arcade-Learning-Environment.
[6] Stella – A multi-platform Atari 2600 VCS emulator by Bradford W. Mott, Stephen Anthony and The Stella Team, 1995-2018. Source code available on GitHub at stella-emu/stella

Dual RNN/RL Trainer with CUDA 10.1.168 / cuDNN 7.6 Now Supported!

In our latest release, version 0.10.1.145, we have added support for a new Dual Recurrent/Reinforcement Learning Trainer and for the newly released NVIDIA CUDA 10.1.168 (Update 1) with cuDNN 7.6. With the dual RNN/RL trainer you can first train a model in a recurrent style of learning and then run a second training pass on the same model using a reinforcement learning style of learning. In addition, this release includes dramatic optimizations that greatly improve the responsiveness of the application when opening and closing projects.

New Features

The following new features have been added to this release.

  • Dramatically improved application start-up time.
  • Dramatically improved application responsiveness to project actions (open, close, etc.).
  • Dramatically improved application responsiveness when changing project properties.
  • Added option to save image in all Image Viewer dialogs.
  • Added dual RNN/RL training model support with stages to distinguish between each during training.
  • New setting allows changing the maximum debug data/diff displayed.
  • Editor now automatically shows RNN or RL stage (if used).
  • Added new Finance Gym with simulated price stream.
  • Added ability to debug data of specific items on data or data link.
  • Added half-precision (__half) memory support to the CONVOLUTION, POOLING, TANH, SIGMOID, RELU and INPUT layers.
  • Raised the minimum compute capability for half-precision memory support from 3.5 to 5.3 (Maxwell).

Bug Fixes

The following bugs have been fixed in this release.

  • LSTM layer output now shows values in debug data window instead of all zeros.
  • Dummy Data Layer shape now supported in the editor.
  • Google Dream Evaluator now shows octaves properly.
  • Image Evaluator now shows debug layers properly.
  • Snapshot Update, Snapshot Load Method and Image Load Method now saved.
  • Custom trainers now properly take snapshots.
  • Large images in image viewer are no longer skewed and support scrolling.

Check out our Tutorials for easy step-by-step instructions that will get you started quickly with several complex model types! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.

Maintenance Release with CUDA 10.1/cuDNN 7.5 Support.

In our latest maintenance release, version 0.10.1.48, we have fixed numerous bugs and support the newly released NVIDIA CUDA 10.1 with cuDNN 7.5. You can now use Neural Style Transfer, Policy Gradient based reinforcement learning, Char-RNN LSTM based learning and much more with CUDA 10.1.

New Features

The following new features have been added to this release.

  • Added original image and final image sizing to Neural Style Evaluator.
  • Added update status when deleting datasets.
  • Made minor updates to SimpleGraphing.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs related to debug window updating very slowly.
  • Fixed bugs related to layer inspection which now shows the best results correctly.
  • Fixed bugs related to extremely slow training after 10k iterations.
  • Fixed bugs related to iterations not matching Project Graph x values.
  • Fixed bugs related to Project Control user interface during training.
  • Fixed bug related to Resource Window not updating after dataset changes.
  • Fixed bugs related to slow LSTM_SIMPLE training.

Check out our Tutorials for easy step-by-step instructions that will get you started quickly with several complex model types! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.

CUDA 10.1 / cuDNN 7.5 Now Supported!

In our latest release, version 0.10.1.21, we have added support for the newly released NVIDIA CUDA 10.1 with cuDNN 7.5. You can now use Neural Style Transfer, Policy Gradient based reinforcement learning, Char-RNN LSTM based learning and much more with CUDA 10.1.

New Features

The following new features have been added to this release.

  • Added image blending to Neural Style Transfer.
  • Added new CSV Dataset Creator.
  • Made minor updates to SimpleGraphing.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs related to deleted dataset causing project load crash.
  • Fixed bugs related to GPF caused when debugging data or diff link.

Check out our Tutorials for easy step-by-step instructions that will get you started quickly with several complex model types! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.

Neural Style Transfer Now Supported with cuDNN 7.4.2!

In our latest release, version 0.10.0.190, we have added Neural Style Transfer as described by Gatys et al. [1] using the VGG model [2].

Neural Style Transfer

With Neural Style Transfer, the style of one photo (such as Vincent Van Gogh’s Starry Night) is learned by the AI model and applied to a content photo (such as the photo of train tracks) to produce a new art piece!

For example, selecting Edvard Munch’s The Scream as a style tells the application to learn the style of his painting and apply it to the content photo to produce a new and different art piece from the one created with Van Gogh’s style.

Edvard Munch’s The Scream

In another example, using Claude Monet’s Water Lilies 1916 as the style creates another new art piece.

Claude Monet’s Water Lilies 1916

How Does This Work?

As shown above, the neural style algorithm learns the style and applies it to the content, but how does this work?  When running a neural style transfer, the model used (e.g. VGG19) is augmented with a few new layers that learn the style and then back-propagate both the style and content features to create the final image.  As described by Li [3], the Gram matrix is the main work-horse used to learn the style: matching Gram matrices is, in theory, “equivalent to minimizing the Maximum Mean Discrepancy”, which shows that neural style transfer is “a process of distribution alignment of the neural activations between images.”  The Gram matrix is implemented by the GRAM layer.  Two additional layers play important roles in the process: the EVENT layer applies only the gradients that matter (zeroing out the others), and the PARAMETER layer holds the final learned values that produce the end result – a neural ‘stylized’ image.
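Concretely, if $F^l \in \mathbb{R}^{N_l \times M_l}$ holds the $N_l$ vectorised feature maps (each of length $M_l$) produced by layer $l$, the Gram matrix and the per-layer style loss follow Gatys et al. [1]:

$$
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk},
\qquad
E_l = \frac{1}{4\,N_l^2 M_l^2} \sum_{i,j} \big(G^{l}_{ij} - \hat{G}^{l}_{ij}\big)^2
$$

where $\hat{G}^{l}$ is the Gram matrix computed from the style image. Matching these Gram matrices is the ‘distribution alignment’ that Li [3] describes.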

To show how this works a little better, the model below displays a visualization of the actual augmented model (augmented from VGG19 and originally inspired by [4]) used for the neural style transfer.  In this example, the layers used to learn the style are conv1_1, conv2_1, conv3_1, conv4_1 and conv5_1, and the layer used to learn the content is conv4_2.  You can use whatever layers you want to create different output effects, but these are the ones used in this discussion.

Neural Style Transfer Model

During the neural style transfer process, the following steps take place (the overall objective these steps minimize is summarized just after the list).

  1. First, the style image is loaded into the data blob and a forward pass is performed on the augmented VGG19 model with GRAM layers already added.
  2. Next, the GRAM layer output blobs gram1_1, gram2_1, gram3_1, gram4_1 and gram5_1 are saved.
  3. The content image is then loaded into the data blob and another forward pass is performed on the augmented model.
  4. This time, the output blob of the conv4_2 CONVOLUTION output layer is saved.
  5. The model is further augmented by adding INPUT, EVENT and EUCLIDEAN_LOSS layers to match each GRAM layer.  A SCALAR layer is added to the content layer to help with data scaling, and weight learning is disabled on the CONVOLUTION layers.
  6. Next, the style output blobs saved in step #2 above are restored back into the solver model, augmented in step #5 above.
  7. And the content output blob saved in step #4 above is restored back into the augmented solver model as well.
  8. The solver is then directed to solve the model which begins the forward/backward set of iterations to solve the neural style transfer.
  9. On each forward pass, the Euclidean loss is calculated between the style INPUT and GRAM layers (the EVENT layer acts as a pass-through on the forward pass).
  10. During the backward pass, the gradient calculated in step #9 is passed back through the EVENT layer which then applies only the gradients that have actually changed and zeros out the rest.
  11. The same process occurs with the content layer, but in this case the Euclidean loss is calculated between the content INPUT and the CONVOLUTION layer conv4_2 output.
  12. Again the gradient calculated in step #11 is passed back through an EVENT layer which then applies only the gradients that have actually changed and zeros out the rest.

Back-propagation continues on up the network and deposits the final changes into the PARAMETER layer input1.  After running numerous iterations (sometimes just around 200), the style of the style image is painted onto the content provided by the content image to produce a new piece of art!

New Features

The following new features have been added to this release.

  • Added new Neural Style Transfer Evaluator.
  • Added Database Optimizer.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs in Import Project dialog including slow processing.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.


[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, A Neural Algorithm of Artistic Style, 2015, arXiv:1508.06576.

[2] Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, arXiv:1409.1556.

[3] Yanghao Li, Naiyan Wang, Jiaying Liu and Xiaodi Hou, Demystifying Neural Style Transfer, 2017, arXiv:1701.01036.

[4] ftokarev, GitHub:ftokarev/caffe-neural-style, 2017, GitHub.

The ‘Train Bridge‘ photo is provided by www.pexels.com under the CC0 License.

The paintings Starry Night by Vincent Van Gogh, The Scream by Edvard Munch and Water Lilies 1916 by Claude Monet are all provided as public domain by commons.wikimedia.org.

cuDNN LSTM Engine Now Supported to Learn Shakespeare 5x Faster!

In our latest release, version 0.10.0.140, we have added CUDNN engine support to the LSTM layer to solve the Char-RNN 5x faster than when using the CAFFE engine. As described in our last post, the CAFFE version (originally created by Donahue et al. [1]) uses an internal Unrolled Net to implement the recurrent nature of the learning. The cuDNN LSTM implementation [2] further accelerates the LSTM algorithm [3][4] by combining multiple LSTM/Dropout layered operations into a single layer, dramatically speeding up the model load time (67% faster) and the model training (5x faster!) while reducing memory use (33% less) compared to the CAFFE engine version.

When using the Char-RNN with the CUDNN engine based LSTM Layer, the model looks as follows.

Char-RNN Model with LSTM and CUDNN engine

Using the CUDNN version of the LSTM layer dramatically simplifies the model, but does not reduce its functionality or internal complexity. In fact, by moving the unrolled operations down to the cuDNN level, the model runs much more efficiently because cuDNN makes more effective use of the GPU.

Visually comparing the two implementations gives a better idea of the amount of work actually being performed by cuDNN when running the LSTM for training and testing. The ‘lstm1’ layer on the left (which uses the CUDNN engine) replaces all of the operations taking place in the ‘lstm1’ and ‘lstm2’ LSTM and ‘Drop1’ DROPOUT nodes on the right (which use the CAFFE engine) – and this includes replacing the two Unrolled Nets used by each of the LSTM (CAFFE engine) layers!

Comparing cuDNN LSTM to Caffe LSTM

The amount of work performed by the cuDNN version of LSTM is also apparent when comparing GPU load while training each model. Using the free SignalPop Universal Miner, we can visually see how hard each GPU is working while training both models simultaneously on two separate NVIDIA Titan Xp GPUs (both running in TCC mode). The cuDNN version of LSTM makes more efficient use of the GPU, using over twice as much of the GPU processing capacity as the Caffe version – and, not surprisingly, it also generates more heat.

cuDNN LSTM vs Caffe LSTM GPU Usage

Given that the cuDNN version of LSTM can generate a lot of heat, we have found the dynamic cooling control, provided by the SignalPop Universal Miner, to be helpful in maintaining cooler GPU temperatures while training.

cuDNN LSTM vs Caffe LSTM Training

During simultaneous training, both models appear to eventually train to the same point, although the CUDNN version of LSTM gets there five times faster!

When generating the AI based Shakespeare-like text, the end results are similar whether using either the CAFFE or CUDNN engine with the LSTM layer.

And I will be not to show them on
The day of the pracious for a thing and thee,
To come to me to the countery.

COSTARD:
Not prosencition, thou wert to the mean
That can we are the kinst of his lips,
And that I may hear you dear the trueching
To see the world of the commorrions
And strang to see the gentleman in the searth,
That fellow all the service of the moon
in eyes of the words.

CLAUDIO:
Not come as the emporious for the thing.

ANTALLO:
I am a will in the court of your man.

DING VINCRAN:
I have say the lead of the reason of Antony,
That he shall be dending and the seatent state
Which protention thou that not with beart.

SING HENRY VI:
I have sure a sour in thy parting indeed,
And the king with a deadness as to my too
And the never lady of the more that heart to the storn
And so to exferition:
I am not for a presposed to the meeter,
That you shall be the speephed with her ball
such of my earth of more with his dead.

New Features

The following new features have been added to this release.

  • Added EMBED layer tool-tips to the model editor.
  • Added RESHAPE layer tool-tips to the model editor.
  • Added LSTM layer tool-tips to the model editor.
  • Added LSTM_SIMPLE layer tool-tips to the model editor.
  • Added cuDNN engine support to the LSTM layer.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug in the model editor related to incorrectly reporting input/output sizes for various layers.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.


[1] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko and T. Darrell, (2014), Long-term Recurrent Convolutional Networks for Visual Recognition and Description, arXiv:1411.4389.
[2] NVIDIA Corporation, Long Short-Term Memory (LSTM), NVIDIA.
[3] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink and J. Schmidhuber, (2015), LSTM: A Search Space Odyssey, arXiv:1503.04069.
[4] A. Sherstinsky, (2018), Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network, arXiv:1808.03314.

Recurrent Learning Now Supported with cuDNN 7.4.1 on Char-RNN to learn Shakespeare!

In our latest release, version 0.10.0.122, we now support Recurrent Learning with both the LSTM [1] and LSTM_SIMPLE [2] layers, which we use to solve the Char-RNN as described by Karpathy [3] and inspired by adepierre [4] and create a Shakespeare-like sonnet – all with the recently released CUDA 10.0.130/cuDNN 7.4.1.

The thought of his but is the queen of the wind:
Thou hast done that with a wife of bate are to the
earth, and straker'd them secured of my own to with the
more.

CORIOLANUS:
My lord, so think'st thou to a play be with thee,
And mine with him to me, which think it be the gives
That see the componted heart of the more times,
And they in the farswer with the season
That thou art that thou hast a man as belied.

PRONEES:
That what so like the heart to the adficer to
the father doth and some part of our house of the server.

DOMIONA:
What wishes my servant words, and so dose strack,
here hores ip a lord.

PARELLO:
And you are grace; and a singer of your life,
And his heart mistress fare the dear be readors
To the buse of the father him to the sone.

HOMITIUS ENOBARY:
And they are not a his wonders for thy greater;
But but a plotering pastice and every sirs.

PAPOLLES:
I will not as my lord, and the prince and house,
But that is scort to my wanter with her than.

To create the Shakespeare above, we used the Char-RNN model shown below with the LSTM Layer.

Char-RNN LSTM based Model

To solve this model, the SignalPop AI Designer uses the new MyCaffeTrainerRNN (which operates similarly to the MyCaffeTrainerRL) to train the recurrent network.

But what actually happens within each LSTM layer to make this happen?

Each LSTM layer contains an Unrolled Net which unrolls the recurrent operations.  As shown below, each of the 75 elements within the sequence is ‘unrolled’ into a set of layers: SCALE, INNER_PRODUCT, RESHAPE and LSTM_UNIT.  Each set processes the data and feeds it into the next, thus forming the recurrent nature of the network.

LSTM Unrolled Net

During the initialization (steps 1-3) the weights are loaded and data is fed into the input blob.  Next, in steps 4-8, the recurrent portion of the network processes each item within the sequence. Upon reaching the end, the results are concatenated to form the output in steps 9 and 10.  This same process occurs in both LSTM layers – so the full Char-RNN actually has over 600 layers.  Amazingly, all of this processing for both the forward and backward pass happens in around 64 milliseconds on a Titan Xp running in TCC mode!
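To make the idea of ‘unrolling’ concrete, here is a toy NumPy sketch that applies a single LSTM cell step-by-step across a 75-element sequence – essentially what the repeated SCALE/INNER_PRODUCT/RESHAPE/LSTM_UNIT sets in the Unrolled Net do. The sizes are made up for illustration, and this is not the MyCaffe implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_unit(x, h, c, W, U, b):
    """One LSTM_UNIT step: gates computed from the current input x and previous hidden state h."""
    z = W @ x + U @ h + b                      # the INNER_PRODUCT part of each unrolled step
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                          # new cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c

T, n_in, n_hid = 75, 128, 256                  # 75-element sequence, as in the post
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * n_hid, n_in)) * 0.01
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.01
b = np.zeros(4 * n_hid)

h, c, outputs = np.zeros(n_hid), np.zeros(n_hid), []
for t in range(T):                             # the 'unrolled' recurrence
    x_t = rng.standard_normal(n_in)            # stand-in for the embedded character at step t
    h, c = lstm_unit(x_t, h, c, W, U, b)
    outputs.append(h)

out = np.stack(outputs)                        # concatenated outputs, as in steps 9 and 10
print(out.shape)                               # (75, 256)
```

Each iteration of the loop corresponds to one unrolled set of layers in the figure above, and the final stacking corresponds to the concatenation performed in steps 9 and 10.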

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • RTX 2080ti now supported!
  • CUDA 10.0.130/cuDNN 7.4.1/nvapi 410/driver 416.94 support added.
  • Native Caffe updated through 10/24/2018.
  • Added new ClipLayer.
  • Added new LSTMLayer.
  • Added new RNNLayer.
  • Added new InputLayer.
  • Added new ReshapeLayer.
  • Added new MyCaffeGymTrainerRNN for RNN Training.
  • Added new Output Window for MyCaffeGymTrainerRNN output.
  • Added support for Char-RNN networks used on text files.
  • Added support for Char-RNN style networks used on audio WAV files.
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bug where ATARI negative results were not showing up in the Project window.
  • Fixed bugs in running Test on gyms.

 


[1] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko and T. Darrell, (2014), Long-term Recurrent Convolutional Networks for Visual Recognition and Description, arXiv:1411.4389.
[2] junhyukoh, (2016), junhyukoh/caffe-lstm, GitHub.com.
[3] Karpathy, A., (2015), The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy blog.
[4] adepierre, (2017), adepierre/caffe-char-rnn, GitHub.com.

Policy Gradient Reinforcement Learning Now Supported with cuDNN 7.3.1 on an ATARI Gym!

In our latest release, version 0.10.0.76, we now support multi-threaded, Policy Gradient Reinforcement Learning on the Arcade-Learning-Environment [4] (based on the ATARI 2600 emulator [5]) as described by Andrej Karpathy[1][2][3], and do so with the recently released CUDA 10.0.130/cuDNN 7.3.1.

Using the simple Sigmoid based policy gradient reinforcement learning model shown below…

Simple Sigmoid based Policy Gradient RL Model for ATARI

… the SignalPop AI Designer uses the MyCaffeTrainerRL to train the model to play ATARI Pong better than ATARI!
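The mechanics behind the trainer follow Karpathy’s pg-pong recipe [1][2]: play out an episode, compute discounted returns, and weight each step’s policy gradient by its (normalized) return. Below is a stripped-down NumPy sketch of the return calculation and the gradient signal for a sigmoid policy; it is illustrative only and is not the MyCaffeTrainerRL code.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Discounted returns, reset at non-zero rewards as in Pong (end of a rally)."""
    returns = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0                      # rally boundary in Pong
        running = running * gamma + rewards[t]
        returns[t] = running
    return returns

# One fake episode: per-step rewards and the sigmoid policy's outputs/actions.
rewards = np.array([0, 0, 0, 1, 0, 0, -1], dtype=float)
probs   = np.array([0.4, 0.6, 0.7, 0.5, 0.3, 0.8, 0.5])   # P(action = UP) at each step
actions = (np.random.rand(len(probs)) < probs).astype(float)

returns = discount_rewards(rewards)
returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize the advantage

# Gradient of the log-probability for a sigmoid policy, weighted by the advantage;
# this is the per-step signal back-propagated through the policy network.
grad_logp = (actions - probs) * returns
print(returns)
print(grad_logp)
```

Steps whose (normalized) return is positive have their chosen actions made more likely; steps with negative returns have them made less likely.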

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 10.0.130/cuDNN 7.3.1/driver 411.70 and 416.16 support added.
  • Added new ATARI Gym support.
  • Added random exploration support.
  • Added ability to turn on/off policy gradient accelerated learning.
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bugs related to accumulating gradients and models with null diffs.
  • Fixed bugs in discounted return calculation.
  • Fixed bugs in Blob.NormalizeData.
  • Fixed bugs in ATARI results not showing in Project Window (0.10.0.76).
  • Fixed bugs in running Test on gyms (0.10.0.76).



[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., GitHub:karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.
[4] The Arcade Learning Environment: An Evaluation Platform for General Agents, by Marc G. Bellemare, Yavar Naddaf, Joel Veness and Michael Bowling, 2012-2013. Source code available on GitHub at mgbellemare/Arcade-Learning-Environment.
[5] Stella – A multi-platform Atari 2600 VCS emulator by Bradford W. Mott, Stephen Anthony and The Stella Team, 1995-2018. Source code available on GitHub at stella-emu/stella

Softmax based Policy Gradient Reinforcement Learning Now Supported with CUDA 10!

In our latest release, version 0.10.0.24, we now support multi-threaded, SoftMax based Policy Gradient Reinforcement Learning as described by Andrej Karpathy[1][2][3], and do so with the recently released CUDA 10.0.130/cuDNN 7.3.

Using the simple SoftMax based policy gradient reinforcement learning model shown below…

Simple Softmax based Policy Gradient RL Model for Cart-Pole

… the SignalPop AI Designer uses the MyCaffeTrainerRL to train the model to solve the Cart-Pole problem and balance the pole.

To use the MyCaffeTrainerRL, just set the custom_trainer Solver property to RL.Trainer and you are ready to go.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including a Cart-Pole balancing video, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 10.0.130/cuDNN 7.3/driver 411.63 support added.
  • Added SoftMax support to Policy Gradient Reinforcement Learning.
  • Added multi-threading support to Policy Gradient Reinforcement Learning (across GPUs).
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bug in Policy Gradient accumulation, speeding up learning by 3x!
  • Fixed bug in snapshots related to Policy Gradient learning.


[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., GitHub:karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.

Policy Gradient Reinforcement Learning Now Supported!

In our latest release, version 0.9.2.188, we now support Policy Gradient Reinforcement Learning as described by Andrej Karpathy[1][2][3], and do so with the recently released CUDA 9.2.148 (p1)/cuDNN 7.2.1.

For training, we have also added a new Gym infrastructure to the SignalPop AI Designer, where the dataset in each project can be either a standard dataset or a dynamic gym dataset, such as the Cart-Pole gym inspired by OpenAI [4][5] (originally created by Richard Sutton et al. [6][7]).
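For readers curious about what a dynamic gym actually simulates, below is a condensed NumPy sketch of the classic cart-pole dynamics from Barto, Sutton and Anderson [6], using the constants from OpenAI’s cartpole.py [5]. It is a sketch for illustration only; the MyCaffe Cart-Pole gym may differ in its details.

```python
import numpy as np

# Constants as used in OpenAI's classic_control cartpole.py [5].
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
HALF_POLE_LEN = 0.5
POLE_MASS_LEN = POLE_MASS * HALF_POLE_LEN
FORCE_MAG, TAU = 10.0, 0.02            # applied force and integration time step

def step(state, action):
    """Advance the cart-pole one time step; action 0 pushes left, 1 pushes right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = np.cos(theta), np.sin(theta)

    temp = (force + POLE_MASS_LEN * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS_LEN * theta_acc * cos_t / TOTAL_MASS

    # Euler integration of the state.
    state = (x + TAU * x_dot, x_dot + TAU * x_acc,
             theta + TAU * theta_dot, theta_dot + TAU * theta_acc)
    done = abs(state[0]) > 2.4 or abs(state[2]) > np.deg2rad(12.0)
    return state, (0.0 if done else 1.0), done

state, total, done = (0.0, 0.0, 0.05, 0.0), 0.0, False
while not done:                         # random policy, just to exercise the dynamics
    state, reward, done = step(state, np.random.randint(2))
    total += reward
print('episode reward:', total)
```

An episode ends when the cart leaves the track or the pole tips past roughly 12 degrees, so the accumulated reward is simply the number of steps the pole stayed balanced – which is what the policy gradient trainer learns to maximize.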

Using a simple policy gradient reinforcement learning model shown below…

Simple Sigmoid based Policy Gradient RL Model for Cart-Pole

… the SignalPop AI Designer uses the new MyCaffeTrainerRL to train the model to solve the Cart-Pole problem and balance the pole.

To use the MyCaffeTrainerRL, just set the custom_trainer Solver property to RL.Trainer and you are ready to go.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly!  For other cool example videos, check out our Examples page.

New Features

The following new features have been added to this release.

  • CUDA 9.2.148 (p1)/cuDNN 7.2.1/driver 399.07 support added.
  • Added support for Policy Gradient Reinforcement Learning via the new MyCaffeTrainerRL.
  • Added new Gym support via the new Gym dataset type along with the new Cart-Pole gym.
  • Added a new MemoryLoss layer.
  • Added a new SoftmaxCrossEntropyLoss layer.
  • Added a new LSTMSimple layer.
  • Added layer freezing to allow for easy Transfer Learning.
Bug Fixes

The following bug fixes have been made in this release.

  • Fixed bug in LOAD_FROM_SERVICE (it was not working).
  • Fixed bugs in Label Visualization.
  • Fixed bugs in Weight Visualization.
  • Fixed bugs related to Importing Weights.


[1] Karpathy, A., Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy blog, May 31, 2016.
[2] Karpathy, A., GitHub:karpathy/pg-pong.py, GitHub, 2016.
[3] Karpathy, A., CS231n Convolutional Neural Networks for Visual Recognition, Stanford University.
[4] OpenAI, CartPole-V0.
[5] OpenAI, GitHub:gym/gym/envs/classic_control/cartpole.py, GitHub, April 27, 2016.
[6] Barto, A. G., Sutton, R. S., Anderson, C. W., Neuronlike adaptive elements that can solve difficult learning control problems, IEEE, Vols. SMC-13, no. 5, pp. 834-846, September 1983.
[7] Sutton, R. S., et al., incompleteideas.net/sutton/book/code/pole.c, 1983.