Sequence-to-Sequence with Attention

In our latest release, version, we have added support for Sequence-to-Sequence[1] (Seq2Seq) models with Attention[2][3], and do so with the newly released CUDA 11.3/cuDNN 8.2 from NVIDIA.  Seq2Seq models solve many difficult problems such as language translation, chat-bots, search and time-series prediction.

Seq2Seq model with Attention

The Seq2Seq model is made up of an Encoder (left side) that is linked to the Decoder (right side) by the Attention layer, which essentially learns to map the encoder input to the decoder output.  During model processing, an embedding is learned for both the encoder and decoder inputs.  On the encoder side, an embedding is produced for the encoder input and for its reverse representation.  These two embeddings are fed into two LSTM layers that learn an encoding for each; the two encodings are then concatenated to produce the encoding inputs that are eventually fed to the Attention layer within the LSTMAttention layer.  The embedding learned for the decoder inputs is fed to the LSTMAttention layer as well.

Within the LSTMAttention layer, the encoder encodings and last state from the LSTM Attention layer are used to produce the context for the encoding inputs.  This context is then added to the LSTM cell state to produce the decoded LSTM outputs which are then run through an internal inner product and eventual softmax output.  The softmax output is then used to determine the most likely word index produced which is then converted back to the word using the index-to-word mapping of the overall vocabulary.  The resulting cell state is then fed back into the attention layer to produce the next context used when running the decoding LSTM on the next decoder input.
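As a rough illustration of the context computation described above, the following minimal Python sketch scores each encoder encoding against the previous decoder state, normalizes the scores with a softmax, and returns the weighted sum as the context.  Dot-product scoring and the function names are assumptions made for illustration only; they are not taken from the LSTMAttention layer itself.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(encodings, state):
    # Score each encoder encoding against the previous decoder state.
    # Dot-product scoring is an assumption made for this sketch.
    scores = [sum(e * s for e, s in zip(enc, state)) for enc in encodings]
    weights = softmax(scores)
    dim = len(encodings[0])
    # The context is the attention-weighted sum of the encoder encodings.
    context = [sum(w * enc[d] for w, enc in zip(weights, encodings))
               for d in range(dim)]
    return context, weights
```

The returned weights sum to one, so the context always stays within the range spanned by the encoder encodings.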

During training, the decoder input starts with the start of sequence token (e.g. ‘1’) and is followed by target0, then target1, and so on until all expected targets are processed in a teacher forcing manner.
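The teacher forcing arrangement above can be sketched in a few lines of Python.  The helper name is hypothetical and the start-of-sequence value of 1 simply follows the example in the text.

```python
SOS = 1  # start-of-sequence token, as in the example above

def teacher_forcing_pairs(targets, sos=SOS):
    # The decoder input at step t is the ground-truth token from step t-1,
    # with the start-of-sequence token used at step 0.
    decoder_inputs = [sos] + list(targets[:-1])
    # Pair each decoder input with the target expected at that step.
    return list(zip(decoder_inputs, targets))
```

For targets `[5, 6, 7]` this yields the (input, expected) pairs `(1, 5)`, `(5, 6)` and `(6, 7)`.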

Once training completes, the model is run by first feeding the input data through the model along with a decoder start-of-sequence token (e.g. ‘1’).  The decoder half of the model is then run by feeding each resulting output token back into the decoder input, continuing until an end-of-sequence token is produced.  The word tokens produced are each converted back to their corresponding words and output as the result.
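A minimal sketch of this decoding loop is shown below.  The `step_fn` callback is a hypothetical stand-in for one pass through the decoder half of the model, and the end-of-sequence value of 2 and the length cap are assumptions made for the example.

```python
def greedy_decode(step_fn, state, sos=1, eos=2, max_len=50):
    # step_fn(token, state) -> (next_token, new_state) is a hypothetical
    # stand-in for one pass through the decoder half of the model.
    token, output = sos, []
    for _ in range(max_len):
        token, state = step_fn(token, state)
        if token == eos:
            break  # stop once the end-of-sequence token is produced
        output.append(token)
    return output

def tokens_to_words(tokens, index_to_word):
    # Convert each word index back to its word using the vocabulary map.
    return [index_to_word[t] for t in tokens]
```

The `max_len` cap is only a safeguard so that a model that never emits the end-of-sequence token cannot loop forever.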

This powerful model essentially allows for learning how to map the probability distribution of one dataset to that of another.

With this release, we have also released three sample applications that use the LSTM models in different ways.

SinCurve – In the first sample, we use the LSTM model to learn how to produce a sine curve by training the model with teacher forcing on the sine curve data, where the previous data predicts the next data in the curve.
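To make the ‘previous data predicts the next data’ idea concrete, here is a small sketch that builds such training pairs from a sine curve.  The window length, step size and function name are illustrative assumptions and do not come from the SinCurve sample itself.

```python
import math

def sine_training_pairs(num_points=100, step=0.1, window=10):
    # Sample the sine curve at evenly spaced points.
    curve = [math.sin(i * step) for i in range(num_points)]
    pairs = []
    for i in range(num_points - window):
        inputs = curve[i:i + window]            # the previous data
        targets = curve[i + 1:i + window + 1]   # the same window shifted by one
        pairs.append((inputs, targets))
    return pairs
```

Each target window is the input window shifted forward by one sample, so every input value is paired with the value that follows it on the curve.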

ImagetoSign – In the next sample, we use the LSTM model to learn to match hand-drawn character images from the MNIST dataset, shown in sequence, to different portions of the sine curve.  The second-to-last inner product data from the MNIST model is input into the LSTM model, which then learns to produce a segment of the sine curve based on the handwritten character detected.

Seq2SeqChatBot – In this sample, a Seq2Seq encoder/decoder model with attention is used to learn the question/response patterns of a chat-bot, allowing the user to have a conversation with the chat-bot.  The model learns embeddings for the input data that are then encoded with two LSTM layers and fed into the LSTMAttention layer along with the encoded decoder input to produce the output sequence.

New Features

The following new features have been added to this release.

  • Added support for CUDA 11.3 and cuDNN 8.2 with NVAPI 650.
  • Tested on Windows 20H2, OS Build 19042.985, SDK 10.0.19041.0
  • Added ability to TestMany after a specified time.
  • Added signal vs. signal average comparison to Model Impact Map.
  • Added color separation to Model Impact Map.
  • Added support for visualizing custom stages from solver.prototxt.
  • Added new MISH layer support.
  • Added new HDF5_Data layer support.
  • Added new MAE Loss layer support.
  • Added new COPY layer support.
  • Added new LSTMAttention layer support.
  • Added new Seq2Seq model support with Attention.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug in RNN learning where weights are now loaded correctly.
  • Fixed bugs related to accuracy calculation in RNNs.
  • Fixed bug causing UI to lock up when multiple tasks were run.
  • Fixed bug in Copy Dataset creator where start and end dates are now used.
  • Fixed crash occurring after out of memory errors on application exit.
  • Fixed crash caused when importing weights with missing import directory.
  • Fixed bug where InnerProduct transpose parameter was not parsed correctly.
  • Fixed bug in concat_dim, now set to uint?, and ignored when null.
  • Improved physical database query times.

Known Issues

The following are known issues in this release.

  • Exporting projects to ONNX and re-importing has known issues. To work around this issue, the weights of an ONNX model can be imported directly to a similar model.
  • Loading and saving LSTMAttention models using the MyCaffe weights has known issues.  Instead, the learnable_blobs of the model can be loaded and saved directly.

Happy deep learning with attention!

[1] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, Sequence to Sequence Learning with Neural Networks, 2014, arXiv:1409.3215.

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention Is All You Need, 2017, arXiv:1706.03762.

[3] Jay Alammar, The Illustrated Transformer, 2017-2020, Jay Alammar Blog.

Debugging complex AI Solutions

In our latest release, version, we have added a new powerful debugging technique that visually shows the  areas within an image that impact the firing of each label, and do so with the newly released CUDA 11.2/cuDNN 8.1 from NVIDIA.

Model Impact Visualization

The example above shows the areas within the CIFAR-10 [1] dataset images that actually have the most impact on firing each detected label.

Unlike the label impact visualization, originally inspired by [2], the model impact visualization shows the impact of each area of the image space on all labels, whereas the label impact shows the impact on a single label.

Label Visualization of ‘Automobile’ label sample

To learn more about these debugging techniques, see the new ‘Debugging AI Solutions’ tutorial.

New Features

The following new features have been added to this release.

  • Added support for CUDA 11.2 / cuDNN 8.1.
  • Added new Model Impact Visualization.
  • Added new Model Reverse Visualization.
  • Added new Transpose Layer.
  • Added new Gather Layer.
  • Added new Constant Layer.
  • Added ONNX InceptionV2 to public models.
  • Added support for compute/sm 3.5 through 8.0.
  • Added ability to reset weights in any layer.
  • Added ability to freeze/unfreeze learning in any layer.
  • Optimized project status updates.
  • Optimized image loading.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug related to scheduling solvers with file data.
  • Fixed bug related to scheduling Char-RNN projects.
  • Fixed bug related to reinforcement learning rewards display.
  • Fixed bug in model editor where nodes exceeded 32k pixel limit.
  • Fixed bug related to renaming data input causing failures.

For other great examples, including Neural Style Transfer, beating ATARI Pong and creating new Shakespeare sonnets, check out our Examples page.

[1] Alex Krizhevsky, The CIFAR-10 dataset.

[2] Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks, 2013, arXiv:1311.2901.

Battle of the Bits – FPGA vs. GPU

Over the past ten years, Graphics Processing Units (GPUs) have dominated the hardware solution space for artificial intelligence.  Currently, these GPUs are a staple in just about every datacenter offering.  NVIDIA and AMD, two of the largest GPU manufacturers, have ridden this wave well and increased their profits and stock prices dramatically.

However, there appears to be a new challenger on the horizon – the Field Programmable Gate Array (FPGA), which according to some, may be an even better hardware platform for AI than the GPU.

“Artificial intelligence (AI) is evolving rapidly, with new neural network models, techniques, and use cases emerging regularly. While there is no single architecture that works best for all machine and deep learning applications, FPGAs can offer distinct advantages over GPUs and other types of hardware in certain use cases” Intel [1].

“Achronix’s high-performance FPGAs, combined with GDDR6 memory, are the industry’s highest-bandwidth memory solution for accelerating machine learning workloads in data center and automotive applications. This new joint solution addresses many of the inherent challenges in deep neural networks, including storing large data sets, weight parameters and activations in memory. The underlying hardware needs to store, process and rapidly move data between the processor and memory. In addition, it needs to be programmable to allow more efficient implementations for constantly changing machine learning algorithms. Achronix’s next-generation FPGAs have been optimized to process machine learning workloads and currently are the only FPGAs that offer support for GDDR6 memory” Micron Technology [2].

“Today silicon devices (ex: FPGA / SOC / ASIC) with latest process technology node, able to accommodate massive computing elements, memories, math function with increased memory and interface bandwidth, with smaller-footprint and low power. So having a different AI accelerator topology will certainly have advantage like Responsive, Better security, Flexibility, Performance/Watts and Adoptable. This helps in deploying different network models addressing various application and use case scenarios, by having scalable artificial intelligence accelerator for network inference which eventually enable fast prototyping and customization during the deployment” HCL Technologies [3].

In addition, large acquisitions taking place recently speak to this emerging trend from GPU to FPGA.

As shown above, when the two largest GPU manufacturers (NVIDIA and AMD) make large acquisitions of large, established FPGA companies, change is in the wind.

Why Use an FPGA vs. GPU?

FPGAs “work similarly to GPUs and their threads in CUDA” Ashwin Singh [4].  According to Singh, the benefits of using an FPGA over a GPU include lower power consumption, acceptance in safety-critical operations, and support for custom data types, all of which are ideal for embedded applications used in edge devices such as self-driving cars.

In addition, the FPGA appears to have one very large and growing advantage over GPU solutions – memory.  As of this writing, NVIDIA recently released its 3090 GPU with a whopping 24GB of memory, which is a great step for AI model designers given its sub-$1,500 price point.  However, the amount of memory available to the GPU (or FPGA) directly translates into faster training times for large models, which in turn pushes the demand for more memory on the edge devices doing the inferencing.  With larger amounts of memory, the training process can use larger input image sizes along with larger batches of images, thus increasing the overall trained image/second throughput.  Larger input images preserve more image resolution, which can lead to higher training accuracies.

Memory chip specialists like Micron Technology argue “that hardware accelerators linked to higher memory densities are the best way to accelerate ‘memory bound’ AI applications” [5].  By combining a Xilinx Virtex Ultrascale+ FPGA with up to a massive 512GB of DDR4 memory, Micron is clearly demonstrating a large advantage that FPGAs appear to have over GPUs.

FPGA AI Challenges

Currently, programming the FPGA for machine learning is complex and difficult, for “the requirement for laying out and creating hardware is a large barrier to the use of FPGAs in deep learning” [4].  However, specialized compilers such as the one provided by Halide may be changing this.  In 2016, Li et al. proposed “an end-to-end FPGA-based CNN accelerator with all the layers mapped on one chip so that different layers can work concurrently in a pipelined structure to increase the throughput” [6].  This idea was further extended by Yang et al. in 2018 in “Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators” [7].  According to Yang et al., their FPGA and ASIC back ends to Halide were able to “achieve similar GOPs and DSP utilization when compared against manually optimized designs.”

Compilers do help reduce the complexity but do not eliminate it, which is especially important when using an FPGA device for training an AI solution.  During the training process, an AI developer is often faced with the problem of diagnosing and debugging their AI model so that it trains properly on their given dataset.  Solving the ‘model blow-up’ problem can be difficult without being able to visually analyze the data flowing from one layer to the next.  In addition, network bottlenecks can be hard to locate when using a large 100+ layer network.

The visual editing and debugging capabilities of the SignalPop AI Designer coupled with the plug-n-play low level software of the MyCaffe AI Platform [8] can help dramatically reduce or eliminate these complexities.

MyCaffe AI Platform Plug-n-Play Architecture

The MyCaffe AI Platform uses a plug-n-play architecture that allows for easy separation of the AI platform (e.g. Solver, Network and Layers) from the low-level primitives that are specific to a given hardware platform (e.g. GPU or FPGA device).

MyCaffe Plug-n-Play Hardware Support

We currently use this architecture to support the various and rapidly changing versions of CUDA and cuDNN produced by NVIDIA.  However, this architecture also lends itself well to the future movement from GPU to FPGA devices discussed above, and switching between hardware back ends is quick and easy.  From within the SignalPop AI Designer, the user just selects the hardware DLL from a menu item.  When programming MyCaffe, the path to the desired hardware DLL is passed to the MyCaffeControl during initialization.
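The separation described above can be illustrated with a small, language-neutral sketch.  Every name below (`HardwareBackend`, `ReferenceBackend`, `BACKENDS`, `Control` and the `reference.dll` path) is hypothetical and chosen only for the example; none of it is the MyCaffe API, which is written in C#.

```python
from abc import ABC, abstractmethod

class HardwareBackend(ABC):
    """Hypothetical interface for the low-level primitives."""

    @abstractmethod
    def axpy(self, alpha, x, y):
        """Return alpha * x + y element-wise."""

class ReferenceBackend(HardwareBackend):
    # Pure-Python stand-in for a GPU or FPGA specific library.
    def axpy(self, alpha, x, y):
        return [alpha * a + b for a, b in zip(x, y)]

# Maps a hypothetical library path to the backend it would load.
BACKENDS = {"reference.dll": ReferenceBackend}

class Control:
    def __init__(self, backend_path):
        # The platform code only sees the interface, so swapping the
        # path swaps the hardware support without touching the layers.
        self.backend = BACKENDS[backend_path]()

    def sgd_step(self, weights, grads, lr):
        # A solver-level operation expressed through backend primitives.
        return self.backend.axpy(-lr, grads, weights)
```

Because the solver-level code calls only the interface, supporting a new device in this sketch means registering one more backend class rather than changing the platform code.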

MyCaffe and FPGA

With technologies like Halide to generate much of the low-level software that runs on the FPGA, the MyCaffe plug-n-play architecture is well suited to support a new low-level DLL designed specifically for FPGA devices.

Not only does adding FPGA support expand the reach of the MyCaffe AI Platform to the potential future of AI, it does so for over 6.2 million C# developers world-wide [9].

The SignalPop AI Designer

Combining the plug-n-play design of MyCaffe with the visual editing and debugging features of the SignalPop AI Designer can dramatically reduce the complexity of building AI solutions in general, and specifically can make it easier to develop such solutions for FPGA devices.

Visual Editing

Typically, AI models are developed either using a text-based script that describes the model, or by programmatically constructing the model by linking one layer to another.  The SignalPop AI Designer transforms the prototxt model descriptions (used by the original CAFFE [10] open-source platform) into a visual representation of the model that allows for easy parameter changing, one-click help, and live debugging.

Visual Model Editing

Developers can easily switch between the visual editor and text script editor and when the model is saved, the final model descriptor prototxt is produced.

Easy Transfer Learning

After constructing the AI model, developers can easily import weights from other pre-trained models for quick transfer learning.

Easy Transfer Learning

Weights can be imported from models in the ONNX or native CAFFE file formats.

Visual Model Debugging

The SignalPop AI Designer allows visualizing the trained weights, locating bottlenecks, inspecting individual layers and visualizing the data that flows between layers, all of which are key aspects of debugging complex AI models.

Visually Inspecting Weights

By allowing easy weight visualizations, the developer can quickly see if the expected weights are loaded during training.

Identifying Layer Bottlenecks

During training, developers can optionally view the data as it flows through the network in the Debug window and observe the timing of each layer’s forward and backward pass, thus easily showing the designer where the network’s bottlenecks are.
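The idea of locating bottlenecks from per-layer timings can be sketched as follows.  This is a generic illustration, not how the Debug window is implemented; the `(name, forward)` layer representation is an assumption, and only the forward pass is timed to keep the example short.

```python
import time

def time_layers(layers, data):
    # layers is a list of (name, forward_fn) pairs; each forward_fn
    # takes the previous layer's output and returns its own output.
    timings = []
    for name, forward in layers:
        start = time.perf_counter()
        data = forward(data)
        timings.append((name, time.perf_counter() - start))
    return data, timings

def slowest(timings, top=1):
    # Sort by elapsed time, longest first, to surface the bottlenecks.
    return sorted(timings, key=lambda t: t[1], reverse=True)[:top]
```

Sorting the collected timings puts the most expensive layers first, which is the same question the timing display answers visually.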

Visually Inspect Embeddings

Right clicking on a debug layer while training allows for easy embedding visualization using the TSNE algorithm.

Visualizing Data Flow

Right clicking on a model layer link while training allows the designer to see the actual data flowing between the layers on both the forward and backward passes.


The MyCaffe AI Platform gives the AI developer flexibility to target different hardware platforms while the SignalPop AI Designer provides an easy to use visual development environment.

Combining these two offers an AI platform uniquely suited for creating customized solutions and is perfectly positioned for future generations of FPGA AI challenges that may soon supersede the GPU solutions of today.

If you are an FPGA manufacturer searching for an AI software solution that makes AI programming easier, contact us; we would like to work with you!


[1] Intel, FPGA vs. GPU for Deep Learning, 2020.

[2] Micron Technology, Micron and Achronix Deliver Next-Generation FPGAs Powered by High Performance GDDR6 Memory for Machine Learning Applications, 2018.

[3] HCL Technologies, Edge Computing using AI Accelerators, 2020.

[4] Ashwin Singh, Hardware for Deep Learning: Know Your Options, Towards Data Science, 2020.

[5] George Leopold, Micron Deep Learning Accelerator Gets Memory Boost, EnterpriseAI, 2020.

[6] Huimin Li, Xitan Fan, Wei Cao, Xeugong Wei and Lingli Wang, A high performance FPGA-based accelerator for large-scale convolutional neural networks, IEEE, 2016.

[7] Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Ou Setter, Jing Pu, Ankita Nayak, Steven E. Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis and Mark Horowitz, Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators, arXiv:1809.04070, 2018, 2020.

[8] Dave Brown, MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning, arXiv:1810.02272, 2018.

[9] DAXX, How Many Software Developers Are in the US and the World, 2020.

[10] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama and Trevor Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arXiv:1408.5093v1, 2014.

Sometimes finding the pattern IS the challenge!

In our latest release, version, we use Single-Shot Multi-Box Detection (SSD) as described in [1] to find complex patterns in high frequency financial data streams and do so with CUDA 11.1.1 and cuDNN 8.0.5 recently released by NVIDIA.

Starting with less than 100 data items, we were able to build a training data set of around 600 items which were then used to train the model to detect both up and down trends in the SPY data.

Using proprietary plug-ins to the SignalPop AI Designer, the initial training data set was created by drawing ‘boxes’ on areas within the price stream that coincided with up or down trends.  Those boxes were then translated into locations within the actual data items then fed into the model for training.

After learning a small sub-set of items, the model was then able to identify new candidates that were visually inspected for accuracy and added to the training set if the patterns detected met general criteria.  Once the data set was large enough, the model was trained for a longer training period, thus allowing the model to improve its overall accuracy.

How Does This Work?

Detecting Complex Patterns with Single-Shot Multi-Box

Internally, the Single-Shot Multi-Box model learns to identify the patterns within the ground-truth boxed areas within each data item in the training set.  Each data item in turn contains over 36,000 data points pulled from surrounding tangential markets that lead to a greater understanding and depth of market momentum and directional change.  All data items are synced across time.

During training, the algorithm sorts through millions of potential boxed areas within these data items until the model learns the best-fit boxed areas that match the patterns sought.  Simultaneously, the model also learns the confidence level of the learned pattern matching the desired pattern.  At the top of the model, a stack of layers from the VGG16 model is used to help detect the actual patterns desired.

SSD Model

The glue that brings this model together and makes it work is the MultiBox Loss layer which learns to find the best matching patterns that fall within the ground-truth boxes originally annotated in the training set (e.g. drawn in the financial data price stream).
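In the SSD paper [1], candidate boxes are matched to ground-truth boxes by their Jaccard overlap (intersection over union).  The sketch below shows that measure for two boxes given as (xmin, ymin, xmax, ymax); the helper names are illustrative, and picking the single best candidate is a simplification of the full matching strategy.

```python
def iou(box_a, box_b):
    # Boxes are (xmin, ymin, xmax, ymax).
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def best_match(candidates, truth):
    # Pick the candidate box that overlaps the ground-truth box the most.
    return max(candidates, key=lambda b: iou(b, truth))
```

An overlap of 1.0 means the candidate coincides with the annotated box, while 0.0 means the two boxes do not intersect at all.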

All together, the 105 layers making up the Single-Shot Multi-Box algorithm create quite a complex model as shown below.

Full SSD Model

This model is already proving to be very helpful in just locating difficult to find, key patterns within very large data-sets.

New Features

The following new features have been added to this release.

  • CUDA 11.1.1/cuDNN 8.0.5 support added.
  • Upgraded all builds to .NET Framework 4.7.2
  • Upgraded C++ builds to SDK 10.0.19041.0
  • Optimized project loads by adding VerboseStatus=false as default.
  • Added DebugData and DebugCriteria support to SSD results.
  • Added resizing and cropping options to IMPORT.IMG dataset creator.
  • Added new button to easily activate only annotated images.
  • Added statistics and create support file operations to IMPORT.IMG dataset creator.
  • Improved folder selection on all dataset creators.
  • Added default ‘background’ label to IMPORT.IMG and IMPORT.VID dataset creators.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug where project would not update to the dataset dropped on it when using AnnotatedData layer.
  • Fixed bugs in IMPORT.IMG dataset creator.
  • Fixed bugs related to Dataset Creator not remembering current selection.
  • Fixed bugs related to running international versions of Windows 10.
  • Fixed bug related to adding annotated labels with duplicates.
  • Fixed bug in resources, removing duplicate GPU name.
  • Fixed bug caused when opening a project with no dataset.
  • Fixed bug occurring when using Label Balancing which could cause a crash.
  • Fixed bug caused when importing weights and locking up the dialog.
  • Fixed bug caused when training multiple projects at the same time with the same image database.
  • Fixed bugs related to setting labels on a dataset when annotating.
  • Fixed bugs related to opening a project with the CudaDnnDll.dll missing.
  • Fixed bug in TestMany when a label is detected outside the scope of labels.

For other great examples, including Neural Style Transfer, beating ATARI Pong and creating new Shakespeare sonnets, check out our Examples page.

[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector, arXiv:1512.02325, 2016.

Object Detection with Single-Shot Multi-Box now supported using CUDA 11.1 and cuDNN 8.0.4!

In our latest release, version, we have added support for object detection using Single-Shot Multi-Box Detection (SSD) as described in [1] and do so with the newly released CUDA 11.1 and cuDNN 8.0.4!

Single-Shot Multi-Box (SSD) object detection

With SSD, one can easily and quickly detect multiple objects within images as shown above, and/or video frames as shown below.

How Does This Work?

The MyCaffe AI Platform now implements the data annotations used to locate and label each object and the new layers needed to detect them, such as the new Permute, PriorBox, MultiBoxLoss, DetectionOutput and DetectionEvaluation layers to create the fairly complex SSD model – 105 layers in all.

SSD Model Simplified

Essentially, the SSD model is a merge between the VGG16 model (used to extract image features) and the single-shot multi-box model (used to detect objects and their locations).  The VGG16 layers feed into a set of layers used to detect the box locations and also into a separate set of layers used to detect the confidence levels for the object within each box.  Together, the box location layers and confidence layers are fed into the MultiBoxLoss layer, which then merges the box location loss calculation with the box confidence loss calculation.  When calculating the box location loss, either a Euclidean loss or smooth L1 loss is used, with the latter being the default.  When calculating the box confidence loss, either a sigmoid cross entropy loss or softmax loss is used, with the latter being the default.
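The two default loss terms can be sketched for a single matched box as shown below.  This is a simplified illustration under stated assumptions: the function names and the `loc_weight` parameter are ours, and a full multi-box loss sums these terms over many matched boxes and normalizes the result, which is omitted here.

```python
import math

def smooth_l1(pred, target):
    # Quadratic near zero, linear for large errors.
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def softmax_loss(logits, label):
    # Negative log-probability of the true class, computed stably.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def multibox_loss(loc_pred, loc_target, conf_logits, label, loc_weight=1.0):
    # Merge the confidence loss with the (weighted) location loss.
    return softmax_loss(conf_logits, label) + loc_weight * smooth_l1(loc_pred, loc_target)
```

The smooth L1 term grows only linearly for large errors, so a badly placed box contributes less extreme gradients than it would under a Euclidean loss.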

All together, this creates a pretty complex model as shown below.

Full SSD Model

After digesting this model for a bit, you will see that the following simplification shows how the various layers flow into the MultiBoxLoss layer.

SSD Up Close

Using the new SignalPop AI Designer’s annotation editor, you can now easily create new datasets to train and run the SSD model on!

To try out the SSD model for yourself, see the new SSD video tutorial which walks you through creating your own annotated dataset from a WMV video and training a new SSD model on it.  Also, see the new SSD image tutorial which walks you through creating your own annotated dataset from a directory of images and training a new SSD model on them.

New Features

The following new features have been added to this release.

  • CUDA 11.1.0/cuDNN 8.0.4 support added.
  • Upgraded all builds to Visual Studio 2019.
  • Added SSD TestAll support showing predicted boxes and classes.
  • Added SSD data annotation editor for building datasets.
  • Added SSD results annotation selector for expanding datasets.
  • Added new IMPORT.VID dataset creator used to import videos.
  • Added ability to set default CudaDnnDll location.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed database transient errors with new database connection strategy used with the SignalPop Universal Miner distributed AI support.
  • Fixed bugs related to running on International versions of Windows 10.
  • Fixed bug related to double clicking on target datasets.

For other great examples, including Neural Style Transfer, beating ATARI Pong and creating new Shakespeare sonnets, check out our Examples page.

[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector, arXiv:1512.02325, 2016.

Distributed AI now supported using CUDA 11.0.3 and cuDNN 8.0.3!

In our latest release, version, we now support distributed AI via the SignalPop Universal Miner, and do so with the recent CUDA 11.0.3 and cuDNN 8.0.3 release!

The SignalPop AI Designer now allows scheduling AI projects which are then loaded and trained by a separate instance of the SignalPop Universal Miner running on a remote machine.

Distributed AI

With this configuration you can now easily develop your AI projects on your development machine and then train those same projects on your remote testing machine thus freeing up your development machine for more development work.

How does this work?

When scheduling a project, the project is placed into the scheduling database where it is later picked up by the SignalPop Universal Miner for training.  During training, the SignalPop Universal Miner uses the same underlying SignalPop AI Server software to train the model using the MyCaffe AI Platform and MyCaffe In-Memory Database.  Upon completion, the trained weights are placed back in the scheduling database allowing the user on the development machine to copy the results back into their project.

Distributed AI Process

The following steps occur when running a distributed AI solution.

  1. First, the designer uses the SignalPop AI Designer on the Development Machine to create the dataset and work-package data (model and solver descriptors), which are stored on the local instance of Microsoft SQLEXPRESS running on the same machine as the SignalPop AI Designer application.
  2. Next, the designer uses the SignalPop AI Designer to schedule the project by adding a new work-package to the scheduling database.  The work-package contains encrypted data describing the location of the dataset and work-package data to be used by the remote Testing Machine during training.
  3. On the Testing Machine, the SignalPop Universal Miner is assigned the scheduled work-package.
  4. Upon being assigned to the project, the SignalPop Universal Miner on the Testing Machine uses the SignalPop AI Server to load the work-package data and uses it to open and start training the project.
  5. During loading of the project, the SignalPop AI Server creates an instance of MyCaffe and loads the project into it.
  6. In addition, the SignalPop AI Server creates an instance of the MyCaffe In-Memory Database and sets its connection credentials to those specified within the scheduled work-package, thus allowing the in-memory database to access the training data residing on the designer’s Development Machine.
  7. After the training of the model completes, the SignalPop Universal Miner running on the Testing Machine saves the weights and state back to the developer’s Development Machine and then marks the work-package as completed in the scheduling database.
  8. Back on the designer’s Development Machine, when the SignalPop AI Designer detects that the project is done, the project is displayed as completed with results.  At this point the designer may copy the scheduled results from the work-package data into the project’s local results residing on the local SQLEXPRESS database used by the SignalPop AI Designer.
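The overall schedule, assign, train and complete flow in the steps above can be sketched with an in-memory stand-in for the scheduling database.  All class, field and function names here are hypothetical, encryption and the SQL database are omitted, and the trained weights are simply attached to the work-package for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class WorkPackage:
    project: str
    data_location: str              # where the dataset and descriptors live
    status: str = "scheduled"
    weights: Optional[bytes] = None

class SchedulingDatabase:
    def __init__(self) -> None:
        self._packages: List[WorkPackage] = []

    def schedule(self, package: WorkPackage) -> None:
        self._packages.append(package)

    def assign_next(self) -> Optional[WorkPackage]:
        # Hand the oldest scheduled work-package to a miner.
        for package in self._packages:
            if package.status == "scheduled":
                package.status = "assigned"
                return package
        return None

    def complete(self, package: WorkPackage, weights: bytes) -> None:
        package.weights = weights
        package.status = "completed"

def run_miner(db: SchedulingDatabase,
              train_fn: Callable[[WorkPackage], bytes]) -> Optional[WorkPackage]:
    # One pass of a miner: pick up work, train, then report the results.
    package = db.assign_next()
    if package is None:
        return None
    db.complete(package, train_fn(package))
    return package
```

A designer-side loop would then watch for packages whose status is `completed` and copy the results back into the local project.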

Since the SignalPop AI Designer and SignalPop Universal Miner both use the same SignalPop AI Server for training AI projects, the results are the same as if the project were trained locally on the designer’s Development Machine.

To get started using distributed AI, see the ‘Scheduling Projects’ section of the SignalPop AI Designer Getting Started document.

New Features

The following new features have been added to this release.

  • CUDA 11.0.3/cuDNN 8.0.3 support added.
  • Added ability to schedule projects for distributed AI remote training.
  • Added load limit refresh rate.
  • Added load limit refresh percentage.
  • Added easy switching between convolution default, convolution optimized for speed and convolution optimized for memory.
  • Optimized convolution forward pass.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs related to visualizing net and model with LoadLimit > 0.
  • Fixed bugs related to last TSNE image disappearing.
  • Fixed bugs related to exporting a project while the project is open.
  • Fixed bug caused when exiting while training Pong.

For other great examples, including beating ATARI Pong, check out our Examples page.

TripletNet now supported using CUDA 11.0.2 and cuDNN 8.0.2!

In our latest release, version, we have added support for the TripletNet used for one-shot and k-n shot learning as described in [1][2], and do so with the newly released CUDA 11.0.2 and cuDNN 8.0.2!

The TripletNet employs three parallel networks that each learn: an anchor image, a positive image that matches the label of the anchor, and a negative image that does not match the label of the anchor.  A new Data Sequence Layer feeds the correct sequence of each of these three images into the three parallel networks (anchor, positive and negative).

At the bottom of the network, the Triplet Loss Layer calculates the loss to move the positive images toward the anchor image and the negative images away from it.  For details on the loss gradient calculation, see [3].  During the learning process, similar images tend to group together into clusters.  To see this learned separation in action, first add a Debug Layer to the ip2 layer, which learns the 4-item embedding of the anchor images.
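The idea behind the loss can be sketched in a few lines of Python. This is a minimal illustration that assumes the common squared-Euclidean formulation with a margin; the function names, the margin value and the sample 4-item embeddings are hypothetical and are not taken from the MyCaffe Triplet Loss Layer, whose exact calculation may differ.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Loss is zero once the negative is at least `margin` farther
    from the anchor than the positive (in squared distance)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 4-item embeddings (same size as the ip2 output described above).
anchor = [0.0, 0.0, 0.0, 0.0]
positive = [0.1, 0.0, 0.0, 0.0]
negative = [1.0, 1.0, 0.0, 0.0]

easy = triplet_loss(anchor, positive, negative)  # negative is already far away
hard = triplet_loss(anchor, negative, positive)  # roles swapped: large loss
print(easy, hard)
```

A loss of zero means the triplet is already well separated; a positive loss produces a gradient that pulls the positive toward the anchor and pushes the negative away.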

Adding a Debug Layer

The Debug Layer caches up to 1000 of the most recently learned embeddings that are passed to it during each forward pass through the network.
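A bounded cache of this kind can be sketched as follows. This is only an illustration of the "keep the most recent 1000" behavior using Python's standard `collections.deque`; the `EmbeddingCache` class and its methods are hypothetical names and do not reflect how the Debug Layer is actually implemented.

```python
from collections import deque


class EmbeddingCache:
    """Keeps only the most recently added embeddings, up to `capacity`."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently drops the oldest item when full.
        self._items = deque(maxlen=capacity)

    def add(self, embedding, label):
        self._items.append((tuple(embedding), label))

    def items(self):
        return list(self._items)

    def __len__(self):
        return len(self._items)


cache = EmbeddingCache(capacity=1000)
for i in range(1500):  # 1500 illustrative forward passes, one embedding each
    cache.add([float(i), 0.0, 0.0, 0.0], label=i % 10)

print(len(cache))           # the cache never grows past its capacity
print(cache.items()[0][0])  # oldest surviving embedding
```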

Next, train the TripletNet for around 2,500 iterations where it should reach around 97% accuracy.  At this point the Debug Layer will have a full cache of 1000 embeddings.

Once trained, right click on the Debug Layer and select ‘Inspect Layer’ to run the TSNE algorithm on a subset of the stored embeddings.  As shown below, the TSNE algorithm demonstrates a clear separation between the learned embeddings of each anchor image label.

TripletNets are very effective at learning a larger dataset, even when you only have a limited number of labeled data items.  For example, the 60,000 training images of MNIST can be learned up to 80% accuracy with only 30 images of each of the 10 classes of handwritten digits 0-9.

To demonstrate this, we first create a small subset of the MNIST dataset consisting of 30 images per label for both testing and training, for a total of 600 images (1% of the 60,000 MNIST training images).  Of the 600 images, only 300 are used for training (0.5% of the original set), and the remaining 300 are used for testing.
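The subset arithmetic can be checked with a short sketch. The code below assumes a simple random draw of 30 items per label and uses synthetic label lists in place of the real MNIST labels; `sample_per_label` is a hypothetical helper, not the dataset creator used in the SignalPop AI Designer.

```python
import random
from collections import defaultdict


def sample_per_label(labels, per_label, seed=0):
    """Return indices of `per_label` randomly chosen items for each label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_label):
        chosen.extend(rng.sample(by_label[label], per_label))
    return chosen


# Synthetic stand-ins for the 60,000 training and 10,000 testing labels.
train_labels = [i % 10 for i in range(60000)]
test_labels = [i % 10 for i in range(10000)]

train_idx = sample_per_label(train_labels, per_label=30, seed=1)
test_idx = sample_per_label(test_labels, per_label=30, seed=2)

total = len(train_idx) + len(test_idx)
print(len(train_idx), len(test_idx), total)
print(total / len(train_labels), len(train_idx) / len(train_labels))
```

With 10 labels, 30 images per label gives 300 training and 300 testing images, or 600 in total, which matches the 1% and 0.5% figures above.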

After training the model up to around 80% accuracy, we saved the weights and then replaced the 600 image dataset with the original, full 60,000/10,000 image MNIST dataset.

Next, we ran the ‘Test Many’ function on the original MNIST dataset, using the weights learned from the 600 image MNIST subset, and attained an accuracy of 80%, showing that the majority of the full MNIST dataset can be learned with a much smaller training dataset using the TripletNet model!

To try out the TripletNet for yourself, see the TripletNet tutorial which walks through the steps to train MNIST using only 1% of the original MNIST dataset.

New Features

The following new features have been added to this release.

  • CUDA 11.0.2/cuDNN 8.0.2 support added.
  • Added ONNX InceptionV1 model support to the Public Models dialog.
  • Added ability to remove orphaned project files from the database.
  • Added ability to change labels for each item within a dataset.
  • Added new Data Sequence Layer support.
  • Added new Triplet Loss Layer support.
  • Added new Image Import dataset creator.
  • Added new Auto Label to the COPY dataset creator.

Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs in Public Models dialog allowing hyperlink click during download.
  • Fixed bugs caused when creating datasets.
  • Fixed bugs in project import.

For other great examples, including beating ATARI Pong, check out our Examples page.


[1] E. Hoffer and N. Ailon, Deep metric learning using Triplet network, arXiv:1412.6622, 2018.

[2] A. Hermans, L. Beyer and B. Leibe, In Defense of the Triplet Loss for Person Re-Identification, arXiv:1703.07737v2, 2017.

[3] Shai, What’s the triplet loss back propagation gradient formula?, StackOverflow, 2015.

And Now for Something Completely Different

In the finance world, ‘open interest‘ as defined by Investopedia is, “the total number of outstanding derivative contracts, such as options or futures.”  And for those of you not familiar with the term, an option is a contract that gives the owner the right to either buy (call option) or sell (put option) a stock at a given price (strike price) on a specific date (the expiration date).

In the AI field, there is not only a thirst for learning better and better AI models but also a need to understand the data on which the AI models run.  Good AI models don’t just give us the answer that we seek, but help us understand the data as a whole thus empowering us (as humans) to do even better at the task at hand.  As an AI researcher you are continually looking for data anomalies that may give you a slightly better signal when trained under the right AI model.

While working with option open interest data we ran across one such anomaly in the near-term open interest data of the GLD (gold), SLV (silver) and UUP (US Dollar) ETFs.

Before discussing this odd relationship, let’s discuss what led us to it.  To get a better view of the market’s price direction ‘bias’ for a given instrument, we create a graph showing the difference between call and put option open interest per expiration, which is then tabulated and displayed per strike along the x-axis in a histogram format.  Positive histogram bars represent strikes with more call option open interest than put option open interest, and negative histogram bars represent strikes with the opposite relationship.  Given that the differences are taken per expiration date (of which there are many per strike), each strike may show both positive and negative bars, where a balanced (or no) bias would show positive and negative bars of the same length on the same strike.

In a simple sense, this graph provides a visual representation of the put/call ratio for the instrument. This graphical analysis can quickly tell you whether the market is more biased toward call options (i.e. more call open interest than put) or put options (i.e. more put open interest than call).  And in our limited view (and this is by no means intended as investment advice), we would interpret a high bias toward calls to mean the market expects the price to rise and a high bias toward puts to mean the market expects the price to fall.
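The tabulation described above can be sketched as follows. This is one possible reading of the description, assuming that the call-minus-put difference is taken per expiration and that positive and negative differences are accumulated separately for each strike; the function name, row layout and open interest numbers are made up for illustration and are not actual market data.

```python
from collections import defaultdict


def differential_open_interest(rows):
    """rows: iterable of (expiration, strike, call_oi, put_oi).

    Returns {strike: (positive_bar, negative_bar)}, where the positive bar
    sums call-heavy expirations and the negative bar sums put-heavy ones.
    """
    bars = defaultdict(lambda: [0, 0])
    for _expiration, strike, call_oi, put_oi in rows:
        diff = call_oi - put_oi
        if diff >= 0:
            bars[strike][0] += diff  # more calls than puts
        else:
            bars[strike][1] += diff  # more puts than calls
    return {strike: tuple(pair) for strike, pair in sorted(bars.items())}


# Made-up open interest values, for illustration only.
rows = [
    ("2020-07-17", 26.0, 5000, 200),
    ("2020-09-18", 26.0, 100, 900),
    ("2020-07-17", 27.0, 3000, 100),
]

bars = differential_open_interest(rows)
net_bias = sum(pos + neg for pos, neg in bars.values())
print(bars)
print("call bias" if net_bias > 0 else "put bias")
```

Keeping the positive and negative totals separate is what lets a single strike show both a positive and a negative bar, as described above.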

While visualizing the open interest on the UUP (US Dollar) ETF, we found a strong skew toward call options with very little put option open interest.  To us this seemed somewhat intuitive, for there is currently a very large global dollar shortage [1][2][3] which seems to be persisting even after the massive stimulus moves by the FED [4][5].

UUP 6/22/2020 Differential Open Interest (updated)

As shown above, there is a strong bias toward the call options for UUP, with most of the call option open interest showing up in the 7/17/2020 expiration.  We interpret this to mean the market is anticipating that UUP (and the US Dollar) will rise in value.

According to macroaxis, both GLD and SLV have a negative correlation to UUP.

GLD, SLV and UUP Correlation

GLD has a correlation of -0.54 to UUP and

SLV has a correlation of -0.78 to UUP.
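For readers unfamiliar with the measure, a correlation coefficient such as the ones quoted above can be computed as sketched below. This assumes a plain Pearson correlation over paired return series; how macroaxis actually computes its figures is not stated here, and the sample returns are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up daily percentage returns, for illustration only.
uup_returns = [0.2, -0.1, 0.3, -0.4, 0.1]
gld_returns = [-0.3, 0.2, -0.1, 0.5, -0.2]

r = pearson(uup_returns, gld_returns)
print(round(r, 2))
```

A value near -1 means the two series tend to move in opposite directions, a value near +1 means they tend to move together, and a value near 0 means there is little linear relationship.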

Intuitively, with these negative correlations, we would expect to also see a bias towards the puts in both GLD and SLV.  However, surprisingly, that is not so.

Both the GLD ETF…

GLD 6/22/2020 Differential Open Interest (updated)

…and the SLV ETF…

SLV 6/22/2020 Differential Open Interest (updated)

have VERY strong biases toward the call options with far more call open interest than put open interest observed on both GLD and SLV!

This is not only counterintuitive, but goes against the negative correlations observed.  These are indeed strange and extreme times.  Is the market wrong?

Or, given the extreme environment, are countries that do not have access to the US currency swap lines using the smaller metals markets as a hedge, which ends up driving up both GLD/SLV and UUP at the same time?

Well, we really don’t know, but if you would like to use AI modeling and analytics to get a better understanding of what is actually happening, let us know, for we would like to work with you!

For serious inquiries just send us a note on our Contact Us page.

Full disclosure: from time to time we may hold open positions in GLD, SLV and/or hold USD.  As always, use at your own risk and do your own due diligence.

[1] B. W. Setser (Mar 17, 2020). Addressing the Global Dollar Shortage: More Swap Lines? A New Fed Repo Facility for Central Banks? More IMF Lending?. Council on Foreign Relations
[2] C. Anstey and E. Curran (Mar 22, 2020). Dire Dollar Shortage Shows Failure to Fix Key Crisis Flaw. Bloomberg
[3] M. Every (Apr 12, 2020). ‘Down The Rabbit Hole’ – The Eurodollar Market Is The Matrix Behind It All. ZeroHedge
[4] D. Lacalle (Mar 31, 2020). Why the World Has a Dollar Shortage, Despite Massive Fed Action. Mises Institute
[5] D. Lacalle (May 3, 2020). Global US Dollar Shortage Rises as Emerging Markets Lose Reserves. DanielLacalle Site

ONNX AI Model Format now supported by the SignalPop AI Designer!

In our latest release, version, we have added support for the ONNX AI model format.  The Open Neural Network Exchange (ONNX) AI model format is a generic AI model format supported by many AI vendors that allows sharing AI models between different AI platforms and tools.  Using the SignalPop AI Designer and MyCaffe AI Platform you can now easily export MyCaffe models to *.onnx files and import from *.onnx files into MyCaffe models.  Alternatively, you can import just the weights within a *.onnx file into your existing SignalPop AI Designer project.

Importing ONNX Files Into MyCaffe

Selecting the ‘File | Import’ menu and then the ‘Get Public Models’ button displays the newly designed ‘Public Models’ dialog.

Public Models Dialog

Simply select the ONNX-based model, download it and import it into your new project.  When importing, weight blobs whose sizes match those in your model are imported directly.

In some cases, your blob sizes may not match, or you may only want to import a few weight blobs, which is typically done when performing transfer learning.

To import a subset of weight blobs or verify that the sizes match, open the new project, right click on the ‘Accuracy’ icon and select the ‘Import’ menu.  Select the *.onnx file whose weights are to be imported and then press the ‘Load Details’ button to see the weight blob sizing.

Import Weights Dialog

Check the blobs to import, check the ‘Save’ checkbox to save them in your model and press ‘OK’ to import the new weights.

Once imported, double click on the ‘Accuracy’ icon while the project is open so that you can visualize the weights.  Selecting the ‘Run weight visualization’ button visually displays all weight blobs, allowing you to verify that they are correct.
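The size-matching rule used when importing weights can be sketched as follows. This is a simplified illustration, not the MyCaffe or SignalPop AI Designer implementation: the function name, the blob names and the idea of matching blobs by name are all assumptions made for the example.

```python
def plan_weight_import(source_blobs, target_blobs, selected=None):
    """Decide which target blobs can take weights from the source model.

    source_blobs / target_blobs: {blob_name: shape tuple}
    selected: optional set of blob names the user checked for import.
    """
    plan = {}
    for name, target_shape in target_blobs.items():
        if selected is not None and name not in selected:
            plan[name] = "skipped (not selected)"
        elif name not in source_blobs:
            plan[name] = "skipped (missing in source)"
        elif tuple(source_blobs[name]) != tuple(target_shape):
            plan[name] = "skipped (size mismatch)"
        else:
            plan[name] = "import"
    return plan


# Hypothetical blob names and shapes: the source model has 1000 outputs,
# while the target project was created with only 10 outputs.
source = {"conv1_w": (64, 3, 7, 7), "fc_w": (1000, 2048), "fc_b": (1000,)}
target = {"conv1_w": (64, 3, 7, 7), "fc_w": (10, 2048), "fc_b": (10,)}

plan = plan_weight_import(source, target)
for name, action in plan.items():
    print(name, "->", action)
```

In this example the convolution weights transfer directly, while the final classifier blobs are skipped because the output counts differ, which is the typical situation in transfer learning.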

For example, the following are the first set of weights from the ‘ResNet50’ model imported from ONNX.

ResNet50 Weights

Exporting MyCaffe Projects to ONNX Files

To export a MyCaffe project to an *.onnx file, right click on the project name and select the ‘Export‘ menu item which displays the ‘Export Project‘ dialog.

Export Project Dialog

Select the ‘ONNX‘ format radio button and press the ‘Export‘ button to export your project into a *.onnx file.

Model Notes

The following should be noted on each of the ONNX models currently supported.

AlexNet – All weights except fc6_1 weights (relies on external sizing) import, however fc8 weights and fc8 bias only import when using the same 1000 outputs as the ONNX model.

GoogleNet – All weights import, however loss3/classifier_1 weights and loss3/classifier_1 bias only import when using the same 1000 outputs as the ONNX model.

VGG16 and VGG19 – All weights except vgg0_dense0_fwd_weights (relies on external sizing) import, however vgg0_dense2_fwd_weights and vgg0_dense2_fwd_bias only import when using the same 1000 outputs as the ONNX model.

ResNet50 – Only external weights are imported and for this reason, weights should be re-imported with ‘Include internal blobs‘ unchecked.  For example, the ONNX model does not have the global_mean, global_variance and var_correction blobs used by the BatchNorm layer.  When unchecking ‘Include internal blobs‘ all weights are imported, however the resnetv17_dense0_fwd_weights and resnetv17_dense0_fwd_bias are only imported when using the same 1000 outputs as the ONNX model.

InceptionV1 – All weights import, however loss3/classifier_1 weights and loss3/classifier_1 bias only import when using the same 1000 outputs as the ONNX model.


Under the hood, the SignalPop AI Designer uses the MyCaffe AI Platform’s new MyCaffeConversionControl to both import from and export to *.onnx files.

Importing an *.onnx file is performed with just a few lines of C# code.

Importing an ONNX file

And, exporting is just as easy.

Exporting to ONNX file

To see the code and try it out yourself, see the OnnxExamples project on GitHub.

For other examples that show how to use the MyCaffeConversionControl, see the TestPersistOnnx automatic test functions.

New Features

The following new features have been added to this release.

  • Added ONNX AI Model support for importing *.onnx files to MyCaffe.
  • Added ONNX AI Model support for exporting MyCaffe models to *.onnx files.
  • Added model layer counts to the Model Editor.
  • Improved Weight Import dialog usability.
  • Improved Public Model dialog.
  • Added support for very large models such as ResNet152 and Siamese ResNet152.
  • Added MultiBox support to TestAll.
  • Added ability to run Label Impact on any image file.
  • Upgraded to EntityFramework 6.4.4
  • Upgraded to Google.ProtoBuf 3.12.1
  • Added DISABLED snapshot update method type to disable snapshots on a project.

Bug Fixes

  • Fixed bug that limited very large model sizes.
  • Fixed bug related to saving best training solver state and weights.
  • Fixed bugs related to the ResNet56 model.

To try out training various model types just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.

Large Art Prints Created with The SignalPop AI Designer

The SignalPop AI Designer and the MyCaffe AI Platform’s implementation of Neural Style Transfer[1] and the VGG model[2] are now used to create large artistic prints, some as large as a ping-pong table! Creating these prints can run up to 2 trillion calculations and requires nearly all of the 48GB of video memory offered by the NVIDIA Quadro RTX 8000 GPU.

Trevor Kennison’s epic launch off Corbet’s Couloir, Jackson Hole, WY.

The print of Trevor Kennison was a collaborative effort between former US Ski Team photographer Jonathan Selkowitz, artist Noemí Ibarz of Barcelona, Spain, and SignalPop, which provided the artificial intelligence software, including both the SignalPop AI Designer and MyCaffe AI Platform.

Photo to Art Collaboration

Artificial intelligence helps bring together the creative talents of both the photographer and artist to create a new, collaborative piece of work!

A closer view of the print shows how the AI actually learns Noemí’s artistic style and paints Jonathan’s picture of Trevor with it to create a fantastic, new piece of art!

Trevor Print Detail #1
Trevor Print Detail #2
Trevor Print Detail #3

The neural style transfer learns not only the colors of Noemí’s art, but also the brush strokes and even the texture of the medium on which her art was painted.

Art by Noemí Ibarz

Visit Instagram @noemi_ibarz to see more of Noemí’s fantastic, colorful art!

Visit Selko Photo to see more of Jonathan’s beautiful photography that does an amazing job of capturing motion in a still image.

In an effort to help the High Fives Foundation (who helped Trevor get his jump back) we are auctioning off this print.

[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, A Neural Algorithm of Artistic Style, 2015, arXiv:1508.06576.

[2] Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, arXiv:1409.1556.