Object Detection with Single-Shot Multi-Box now supported using CUDA 11.1 and cuDNN 8.0.4!

In our latest release, version 0.11.1.56, we have added support for object detection using Single-Shot Multi-Box Detection (SSD) as described in [1] and do so with the newly released CUDA 11.1 and cuDNN 8.0.4!

Single-Shot Multi-Box (SSD) object detection

With SSD, one can easily and quickly detect multiple objects within images as shown above, and/or video frames as shown below.

How Does This Work?

The MyCaffe AI Platform now implements the data annotations used to locate and label each object, along with the new layers needed to detect them, such as the new Permute, PriorBox, MultiBoxLoss, DetectionOutput and DetectionEvaluate layers, to create the fairly complex SSD model – 105 layers in all.

SSD Model Simplified

Essentially, the SSD model is a merge between the VGG16 model (used to extract image features) and the single-shot multi-box model (used to detect objects and their locations).  The VGG16 layers feed into a set of layers used to detect the box locations and into a separate set of layers used to detect the confidence levels for the object within each box.  Together, the box location layers and confidence layers feed into the MultiBoxLoss layer, which merges the box location loss calculation with the box confidence loss calculation.  When calculating the box location loss, either a Euclidean loss or smooth L1 loss is used, with the latter being the default.  And, when calculating the box confidence loss, either a sigmoid cross entropy loss or softmax loss is used, with the latter being the default.
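
To make the loss merge concrete, the following is a rough, framework-agnostic sketch of how a box location loss and a box confidence loss might be combined for a single matched box.  The names, the equal weighting and the single-box scope are illustrative assumptions only; the actual MultiBoxLoss layer also performs prior-box matching, hard-negative mining and loss normalization internally.

// Rough sketch of merging the box location loss (smooth L1) and the box
// confidence loss (softmax cross entropy) for a single matched prior box.
// All names and the single-box scope are illustrative assumptions.
using System;
using System.Linq;

static class MultiBoxLossSketch
{
    static double SmoothL1(double x)
    {
        double abs = Math.Abs(x);
        return (abs < 1.0) ? 0.5 * x * x : abs - 0.5;
    }

    public static double Loss(double[] locPred, double[] locTarget,
                              double[] confLogits, int trueClass, double locWeight = 1.0)
    {
        // Box location loss: smooth L1 over the predicted box offsets (the default choice).
        double locLoss = locPred.Zip(locTarget, (p, t) => SmoothL1(p - t)).Sum();

        // Box confidence loss: softmax cross entropy over the class logits (the default choice).
        double max = confLogits.Max();
        double logSumExp = Math.Log(confLogits.Sum(v => Math.Exp(v - max))) + max;
        double confLoss = logSumExp - confLogits[trueClass];

        // The MultiBoxLoss layer merges the two losses, weighting the location term.
        return confLoss + locWeight * locLoss;
    }
}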

All together, this creates a pretty complex model as shown below.

Full SSD Model

After digesting this model for a bit, you will see from the following simplification how the various layers flow into the MultiBoxLoss layer.

SSD Up Close

Using the new SignalPop AI Designer’s annotation editor, you can now easily create new datasets to train and run the SSD model on!

To try out the SSD model for yourself, see the new SSD tutorial which walks you through creating your own annotated dataset from an MVW video and training a new SSD model on it.

New Features

The following new features have been added to this release.

  • CUDA 11.1.0/cuDNN 8.0.4 support added.
  • Upgraded all builds to Visual Studio 2019.
  • Added SSD TestAll support showing predicted boxes and classes.
  • Added SSD data annotation editor for building datasets.
  • Added SSD results annotation selector for expanding datasets.
  • Added new IMPORT.VID dataset creator used to import videos.
  • Added ability to set default CudaDnnDll location.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed database transient errors with new database connection strategy used with the SignalPop Universal Miner distributed AI support.
  • Fixed bugs related to running on International versions of Windows 10.
  • Fixed bug related to double clicking on target datasets.

For other great examples, including Neural Style Transfer, beating ATARI Pong and creating new Shakespeare sonnets, check out our Examples page.


[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector, arXiv:1512.02325, 2016.

Distributed AI now supported using CUDA 11.0.3 and cuDNN 8.0.3!

In our latest release, version 0.11.0.188, we now support distributed AI via the SignalPop Universal Miner, and do so with the recent CUDA 11.0.3 and cuDNN 8.0.3 release!

The SignalPop AI Designer now allows scheduling AI projects which are then loaded and trained by a separate instance of the SignalPop Universal Miner running on a remote machine.

Distributed AI

With this configuration you can now easily develop your AI projects on your development machine and then train those same projects on your remote testing machine, freeing up your development machine for more development work.

How does this work?

When scheduling a project, the project is placed into the scheduling database where it is later picked up by the SignalPop Universal Miner for training.  During training, the SignalPop Universal Miner uses the same underlying SignalPop AI Server software to train the model using the MyCaffe AI Platform and MyCaffe In-Memory Database.  Upon completion, the trained weights are placed back in the scheduling database allowing the user on the development machine to copy the results back into their project.

Distributed AI Process

The following steps occur when running a distributed AI solution.

  1. First the designer uses the SignalPop AI Designer on the Development Machine to create the dataset and work-package data (model and solver descriptors), which are stored on the local instance of Microsoft SQLEXPRESS running on the same machine as the SignalPop AI Designer application.
  2. Next, the designer uses the SignalPop AI Designer to schedule the project by adding a new work-package to the scheduling database.  The work-package contains encrypted data describing the location of the dataset and work-package data to be used by the remote Testing Machine during training.
  3. On the Testing Machine, the SignalPop Universal Miner is assigned the scheduled work-package.
  4. Upon being assigned to the project, the SignalPop Universal Miner on the Testing Machine uses the SignalPop AI Server to load the work-package data and uses it to open and start training the project.
  5. During loading of the project, the SignalPop AI Server creates an instance of MyCaffe and loads the project into it.
  6. In addition, the SignalPop AI Server creates an instance of the MyCaffe In-Memory Database and sets its connection credentials to those specified within the scheduled work-package, thus allowing the in-memory database to access the training data residing on the designer’s Development Machine.
  7. After the training of the model completes, the SignalPop Universal Miner running on the Testing Machine saves the weights and state back to the designer’s Development Machine and then marks the work-package as completed in the scheduling database.
  8. Back on the designer’s Development Machine, when the SignalPop AI Designer detects that the project is done, the project is displayed as completed with results.  At this point the designer may copy the scheduled results from the work-package data into the project’s local results residing on the local SQLEXPRESS database used by the SignalPop AI Designer.

Since the SignalPop AI Designer and SignalPop Universal Miner both use the same SignalPop AI Server for training AI projects, the results are the same as if the project were trained locally on the designer’s Development Machine.

To get started using distributed AI, see the ‘Scheduling Projects‘ section of the SignalPop AI Designer Getting Started document.

New Features

The following new features have been added to this release.

  • CUDA 11.0.3/cuDNN 8.0.3 support added.
  • Added ability to schedule projects for distributed AI remote training.
  • Added load limit refresh rate.
  • Added load limit refresh percentage.
  • Added easy switching between convolution default, convolution optimized for speed and convolution optimized for memory.
  • Optimized convolution forward pass.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs related to visualizing net and model with LoadLimit > 0.
  • Fixed bugs related to last TSNE image disappearing.
  • Fixed bugs related to exporting a project while the project is open.
  • Fixed bug caused when exiting while training Pong.

For other great examples, including beating ATARI Pong, check out our Examples page.

TripletNet now supported using CUDA 11.0.2 and cuDNN 8.0.2!

In our latest release, version 0.11.0.65, we have added support for the TripletNet used for one-shot and k-n shot learning as described in [1][2], and do so with the newly released CUDA 11.0.2 and cuDNN 8.0.2!

The TripletNet employs three parallel networks that each process one of three inputs: an anchor image, a positive image that matches the label of the anchor, and a negative image that does not match the label of the anchor.  A new Data Sequence Layer feeds the correct sequence of these three images into the three parallel networks (anchor, positive and negative).
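
The loss that drives these three networks (described next) pulls the positive embedding toward the anchor and pushes the negative embedding away.  The following is a minimal sketch of one common margin-based formulation of that idea; the margin value, the squared Euclidean distances and the helper names are illustrative assumptions, not the exact calculation used by the Triplet Loss Layer (see [3] for the gradient).

// Rough sketch of a common margin-based triplet loss over the three
// embeddings (anchor, positive, negative). Names and margin are assumptions.
using System;
using System.Linq;

static class TripletLossSketch
{
    static double SquaredDistance(double[] a, double[] b)
    {
        return a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();
    }

    // Pull the positive toward the anchor and push the negative away, until the
    // negative is at least 'margin' farther from the anchor than the positive.
    public static double Loss(double[] anchor, double[] positive, double[] negative, double margin = 1.0)
    {
        double dPos = SquaredDistance(anchor, positive);
        double dNeg = SquaredDistance(anchor, negative);
        return Math.Max(0.0, dPos - dNeg + margin);
    }
}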

At the bottom of the network, the Triplet Loss Layer calculates the loss that moves the positive images toward the anchor image and the negative images away from it.  For details on the loss gradient calculation, see [3].  During the learning process, similar images tend to group together into clusters.  To see this learned separation in action, first add a Debug Layer to the ip2 layer, which learns the 4-item embedding of the anchor images.

Adding a Debug Layer

The Debug Layer caches up to 1000 of the most recently learned embeddings that are passed to it during each forward pass through the network.

Next, train the TripletNet for around 2,500 iterations where it should reach around 97% accuracy.  At this point the Debug Layer will have a full cache of 1000 embeddings.

Once trained, right click on the Debug Layer and select ‘Inspect Layer‘ to run the TSNE algorithm on a subset of the stored embeddings.  As shown below, the TSNE algorithm demonstrates a clear separation between each of the learned embeddings for each anchor image label.

TripletNets are very effective at learning a larger dataset, even when you only have a limited number of labeled data items.  For example, the 60,000 training images of MNIST can be learned to around 80% accuracy with only 30 images for each of the 10 classes of handwritten characters 0-9.

To demonstrate this, we first create a small subset of the MNIST dataset consisting of 30 images per label for both testing and training – a total of 600 images (1% of the MNIST 60,000 training images).  Of the 600 images, only 300 are used for training (0.5% of the original set) while the remaining 300 are used for testing.
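
As a rough illustration of how such a balanced subset can be drawn (the record type and field names below are illustrative assumptions; the actual subset was created with the SignalPop AI Designer’s dataset tools):

// Rough sketch of drawing a balanced, 30-images-per-label subset for training
// and another 30 per label for testing. The LabeledImage type is an assumption.
using System;
using System.Collections.Generic;
using System.Linq;

class LabeledImage
{
    public int Label;       // 0-9 for MNIST
    public byte[] Pixels;
}

static class SubsetBuilder
{
    // Take the first 'perLabel' images of each label for training and the
    // next 'perLabel' for testing (30 each in the experiment above).
    public static (List<LabeledImage> Train, List<LabeledImage> Test) Build(IEnumerable<LabeledImage> full, int perLabel = 30)
    {
        var train = new List<LabeledImage>();
        var test = new List<LabeledImage>();

        foreach (var group in full.GroupBy(i => i.Label))
        {
            train.AddRange(group.Take(perLabel));
            test.AddRange(group.Skip(perLabel).Take(perLabel));
        }
        return (train, test);
    }
}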

After training the model up to around 80% accuracy, we saved the weights and then replaced the 600 image dataset with the original, full 60,000/10,000 image MNIST dataset.

Next, we ran the ‘Test Many’ function on the original MNIST dataset, using the weights learned from the 600 image MNIST subset, and attained an accuracy of 80%, showing that the majority of the full MNIST dataset can be learned with a much smaller training dataset using the TripletNet model!

To try out the TripletNet for yourself, see the TripletNet tutorial which walks through the steps to train MNIST using only 1% of the original MNIST dataset.

New Features

The following new features have been added to this release.

  • CUDA 11.0.2/cuDNN 8.0.2 support added.
  • Added ONNX InceptionV1 model support to the Public Models dialog.
  • Added ability to remove orphaned project files from the database.
  • Added ability to change labels for each item within a dataset.
  • Added new Data Sequence Layer support.
  • Added new Triplet Loss Layer support.
  • Added new Image Import dataset creator.
  • Added new Auto Label to the COPY dataset creator.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bugs in Public Models dialog allowing hyperlink click during download.
  • Fixed bugs caused when creating datasets.
  • Fixed bugs in project import.

For other great examples, including beating ATARI Pong, check out our Examples page.

 


[1] E. Hoffer and N. Ailon, Deep metric learning using Triplet network, arXiv:1412.6622, 2018.

[2] A. Hermans, L. Beyer and B. Leibe, In Defense of the Triplet Loss for Person Re-Identification, arXiv:1703.07737v2, 2017.

[3] Shai, What’s the triplet loss back propagation gradient formula?, StackOverflow, 2015.

And Now for Something Completely Different

In the finance world, ‘open interest‘, as defined by Investopedia, is “the total number of outstanding derivative contracts, such as options or futures.”  And for those of you not familiar with the term, an option is a contract that gives the owner the right to either buy (call option) or sell (put option) a stock at a given price (strike price) on a specific date (the expiration date).

In the AI field, there is not only a thirst for learning better and better AI models but also a need to understand the data on which the AI models run.  Good AI models don’t just give us the answer that we seek, but help us understand the data as a whole thus empowering us (as humans) to do even better at the task at hand.  As an AI researcher you are continually looking for data anomalies that may give you a slightly better signal when trained under the right AI model.

While working with option open interest data we ran across one such anomaly in the near-term open interest data of the GLD (gold), SLV (silver) and UUP (US Dollar) ETFs.

Before discussing this odd relationship, let’s discuss what led us to it.  To get a better view on the market’s price direction ‘bias’ for a given instrument, we create a graph showing the difference between call and put option open interest per expiration, tabulated and displayed per strike along the x-axis in a histogram format.  Positive histogram bars represent strikes with more call option open interest than put option open interest, and negative histogram bars represent strikes with the opposite relationship.  Given that the differences are taken per expiration date (of which there are many per strike), each strike may show both positive and negative bars, where a balanced (or no) bias would show the same length positive and negative bar on the same strike.
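
For readers who prefer code to prose, the following sketch shows one way such a differential open interest tabulation could be computed.  The OptionRecord fields and class names are illustrative assumptions about the data layout, not the actual software used to produce the charts below.

// Rough sketch of tabulating call minus put open interest per strike and
// expiration. The OptionRecord fields are assumptions about the data layout.
using System;
using System.Collections.Generic;
using System.Linq;

class OptionRecord
{
    public double Strike;
    public DateTime Expiration;
    public bool IsCall;        // true = call option, false = put option
    public int OpenInterest;
}

static class DifferentialOpenInterest
{
    // Positive results become positive histogram bars (more call than put open
    // interest); negative results become negative bars.
    public static Dictionary<(double Strike, DateTime Expiration), int> Tabulate(IEnumerable<OptionRecord> chain)
    {
        return chain
            .GroupBy(o => (o.Strike, o.Expiration))
            .ToDictionary(
                g => g.Key,
                g => g.Sum(o => o.IsCall ? o.OpenInterest : -o.OpenInterest));
    }
}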

In a simple sense, this graph provides a visual representation of the put/call ratio for the instrument.  This graphical analysis can quickly tell you whether the market is more biased toward call options (e.g. more call open interest than put) or put options (e.g. more put open interest than call).  And in our limited view (and this is by no means intended as investment advice), we would interpret a high bias toward calls to mean the market expects the price to rise and a high bias towards puts to mean the market expects the price to fall.

While visualizing the open interest on the UUP (US Dollar) ETF, we found a strong skew toward call options with very little put option open interest.  To us this seemed somewhat intuitive, for there is currently a very large global dollar shortage [1][2][3] which seems to be persisting even after the massive stimulus moves by the FED [4][5].

UUP 6/22/2020 Differential Open Interest (updated)

As shown above, there is a strong bias towards the call options for UUP with most of the call option open interest showing up in the 7/17/2020 expiration.  We interpret this to mean the market is anticipating the value of UUP (and the US Dollar) to rise in value.

According to macroaxis, both GLD and SLV each have a negative correlation to UUP.

GLD, SLV and UUP Correlation

GLD has a -0.54 negative correlation to UUP, and SLV has a -0.78 negative correlation to UUP.

Intuitively, with these negative correlations, we would expect to also see a bias towards the puts in both GLD and SLV.  However, surprisingly, that is not so.

Both the GLD ETF…

GLD 6/22/2020 Differential Open Interest (updated)

…and the SLV ETF…

SLV 6/22/2020 Differential Open Interest (updated)

have VERY strong biases toward the call options with far more call open interest than put open interest observed on both GLD and SLV!

This is not only counterintuitive, but also goes against the negative correlations observed.  These are indeed strange and extreme times.  Is the market wrong?

Or, given the extreme environment, are countries who do not have access to the US currency swap lines using the smaller metals markets as a hedge, which ends up driving up both GLD/SLV and UUP at the same time?

Well, we really don’t know, but if you would like to use AI modeling and analytics to get a better understanding of what actually is happening, let us know, for we would like to work with you!

For serious inquiries just send us a note on our Contact Us page.

Full Disclosure: from time to time we may hold open positions in GLD, SLV and/or hold USD.  As always, use at your own risk and do your own due diligence.


[1] B. W. Setser (Mar 17, 2020). Addressing the Global Dollar Shortage: More Swap Lines? A New Fed Repo Facility for Central Banks? More IMF Lending?. Council on Foreign Relations
[2] C. Anstey and E. Curran (Mar 22, 2020). Dire Dollar Shortage Shows Failure to Fix Key Crisis Flaw. Bloomberg
[3] M. Every (Apr 12, 2020). ‘Down The Rabbit Hole’ – The Eurodollar Market Is The Matrix Behind It All. ZeroHedge
[4] D. Lacalle (Mar 31, 2020). Why the World Has a Dollar Shortage, Despite Massive Fed Action. Mises Institute
[5] D. Lacalle (May 3, 2020). Global US Dollar Shortage Rises as Emerging Markets Lose Reserves. DanielLacalle Site

ONNX AI Model Format now supported by the SignalPop AI Designer!

In our latest release, version 0.10.2.309, we have added support for the ONNX AI model format.  The Open Neural Network Exchange (ONNX) format is a generic AI model format supported by many AI vendors that allows sharing AI models between different AI platforms and tools.  Using the SignalPop AI Designer and MyCaffe AI Platform you can now easily export MyCaffe models to *.onnx files and import from *.onnx files into MyCaffe models.  Alternatively, you can import just the weights within a *.onnx file into your existing SignalPop AI Designer project.

Importing ONNX Files Into MyCaffe

Selecting the ‘File | Import‘ menu and then the ‘Get Public Models‘ button displays the newly designed ‘Public Models‘ dialog.

Public Models Dialog

Simply select the ONNX based model, then download and import it into your new project.  When importing, weight blobs whose sizes match your model are imported directly.

In some cases, your blob sizes may not match, or you may only want to import a few weight blobs, which is typically done when performing transfer learning.

To import a subset of weight blobs or verify that the sizes match, open the new project, right click on the ‘Accuracy‘ icon and select the ‘Import‘ menu.  Select the *.onnx file whose weights are to be imported and then press the ‘Load Details‘ button to see the weight blob sizing.

Import Weights Dialog

Check the blobs to import, check the ‘Save‘ checkbox to save them in your model, and press ‘OK‘ to import the new weights.

Once imported, double click on the ‘Accuracy‘ icon while the project is open so that you can visualize the weights.  Selecting the ‘Run weight visualization‘ button visually displays all weight blobs, allowing you to verify that they are correct.

For example, the following are the first set of weights from the ‘ResNet50’ model imported from ONNX.

ResNet50 Weights
Exporting MyCaffe Projects to ONNX Files

To export a MyCaffe project to an *.onnx file, right click on the project name and select the ‘Export‘ menu item which displays the ‘Export Project‘ dialog.

Export Project Dialog

Select the ‘ONNX‘ format radio button and press the ‘Export‘ button to export your project into a *.onnx file.

Model Notes

The following should be noted on each of the ONNX models currently supported.

AlexNet – All weights except fc6_1 weights (relies on external sizing) import, however fc8 weights and fc8 bias only import when using the same 1000 outputs as the ONNX model.

GoogleNet – All weights import, however loss3/classifier_1 weights and loss3/classifier_1 bias only import when using the same 1000 outputs as the ONNX model.

VGG16 and VGG19 – All weights except vgg0_dense0_fwd_weights (relies on external sizing) import, however vgg0_dense2_fwd_weights and vgg0_dense2_fwd_bias only import when using the same 1000 outputs as the ONNX model.

ResNet50 – Only external weights are imported and for this reason, weights should be re-imported with ‘Include internal blobs‘ unchecked.  For example, the ONNX model does not have the global_mean, global_variance and var_correction blobs used by the BatchNorm layer.  When unchecking ‘Include internal blobs‘ all weights are imported, however the resnetv17_dense0_fwd_weights and resnetv17_dense0_fwd_bias are only imported when using the same 1000 outputs as the ONNX model.

InceptionV1 – All weights import, however loss3/classifier_1 weights and loss3/classifier_1 bias only import when using the same 1000 outputs as the ONNX model.

Programming

Under the hood, the SignalPop AI Designer uses the MyCaffe AI Platform’s new MyCaffeConversionControl to both import from and export to *.onnx files.

Importing an *.onnx file is performed with just a few lines of C# code.

Importing an ONNX file

And, exporting is just as easy.

Exporting to ONNX file

To see the code and try it out yourself, see the OnnxExamples project on GitHub.

For other examples that show how to use the MyCaffeConversionControl, see the TestPersistOnnx automatic test functions.

New Features

The following new features have been added to this release.

  • Added ONNX AI Model support for importing *.onnx files to MyCaffe.
  • Added ONNX AI Model support for exporting MyCaffe models to *.onnx files.
  • Added model layer counts to the Model Editor.
  • Improved Weight Import dialog usability.
  • Improved Public Model dialog.
  • Added support for very large models such as ResNet152 and Siamese ResNet152.
  • Added MultiBox support to TestAll.
  • Added ability to run Label Impact on any image file.
  • Upgraded to EntityFramework 6.4.4
  • Upgraded to Google.ProtoBuf 3.12.1
  • Added DISABLED snapshot update method type to disable snapshots on a project.
Bug Fixes
  • Fixed bug that limited very large model sizes.
  • Fixed bug related to saving best training solver state and weights.
  • Fixed bugs related to the ResNet56 model.

To try out training various model types just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.


Large Art Prints Created with The SignalPop AI Designer

The SignalPop AI Designer and the MyCaffe AI Platform‘s implementation of Neural Style Transfer[1] and the VGG model[2] are now used to create large sized artistic prints – some as large as a ping-pong table! Creating these prints can run up to 2 trillion calculations and requires nearly all of the 48GB of video memory offered by the NVIDIA Quadro RTX 8000 GPU.

Trevor Kennison’s epic launch off Corbet’s Couloir, Jackson Hole, WY.

The print of Trevor Kennison was a collaborative effort between former US Ski Team photographer Jonathan Selkowitz, artist Noemí Ibarz of Barcelona, Spain, and SignalPop, who provided the artificial intelligence software, including both the SignalPop AI Designer and MyCaffe AI Platform.

Photo to Art Collaboration

Artificial intelligence helps bring together the creative talents of both the photographer and artist to create a new, collaborative piece of work!

A closer view of the print shows how the AI actually learns Noemí’s artistic style and paints Jonathan’s picture of Trevor with it to create a fantastic, new piece of art!

Trevor Print Detail #1
Trevor Print Detail #2
Trevor Print Detail #3

The neural style transfer learns not only the colors of Noemí’s art, but also the brush strokes and even the texture of the medium on which her art was painted.

Art by Noemí Ibarz

Visit Instagram@noemi_ibarz to see more of Noemí’s fantastic colorful art!

Visit Selko Photo to see more of Jonathan’s beautiful photography that does an amazing job of capturing motion in a still image.

In an effort to help the High Fives Foundation (who helped Trevor get his jump back) we are auctioning off this print.


[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, A Neural Algorithm of Artistic Style, 2015, arXiv:1508.06576.

[2] Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, arXiv:1409.1556.

Siamese Net with KNN, Contrastive Loss and easy Python Programming

In our latest release, version 0.10.2.124, we have added support for K-Nearest Neighbor (KNN) to the Siamese Net as described by [1] and [2].

KNN provides the Siamese Net with another way to perform ‘One-Shot’ learning on difficult datasets that include images the network has never seen before.

How Does This Work?

Most of the steps are very similar to what was discussed in the last blog entry on the Siamese Net.  However, when using KNN, the Decode Layer builds a cache in GPU memory of encodings received for each label.

Siamese Net with KNN

During training, the encoding cache is filled, and then during testing and running, the cache is used to find the label whose encodings are the ‘nearest neighbors’ (i.e. have the shortest distances) to the current encoding of the data we seek to label.

The following steps take place within the KNN based Siamese Net:

1.) During training, the Data Layer is fed a pair of images where each image is stacked one after the other along the data channels.

2.) Internally, the model splits the data into two separate data streams using the Slice Layer.  Each data stream is then fed into each of the two parallel nets making up the Siamese Net.

3.) Within the first network, the Decode Layer caches the encodings that it receives and stores the encodings by label.  The encoding cache is stored within GPU memory and persisted along with the model weights so that the cache can be used later when the net is run.

4.) As with the CENTROID based Siamese Net, the encodings are sent to the Contrastive Loss layer which calculates the loss from the distance between the two encodings received from each of the two parallel nets.

5.) When running the net, the Decode layer compares its input encodings (that we seek to find a label for) with each of the encodings within the label cache.  The distance to each cached encoding is determined, and the distances for each label are sorted smallest first.  The k smallest distances for each label are then averaged, and the label with the smallest average distance is determined to be the detected label for the data (see the sketch after this list).

6.) When running, the label distance averages are returned and the label with the smallest average distance is determined to be the detected label.
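
The following sketch summarizes steps 5 and 6 in code form.  The names, the use of Euclidean distance and the default value of k are illustrative assumptions, not the Decode layer’s GPU implementation.

// Rough sketch of the KNN label selection described in steps 5 and 6 above.
// The data layout and default k are assumptions for illustration only.
using System;
using System.Collections.Generic;
using System.Linq;

static class KnnDecodeSketch
{
    static double Distance(double[] a, double[] b)
    {
        return Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum());
    }

    // 'cache' maps each label to the encodings collected for it during training.
    public static int DetectLabel(double[] input, Dictionary<int, List<double[]>> cache, int k = 5)
    {
        double bestAvg = double.MaxValue;
        int bestLabel = -1;

        foreach (var kv in cache)
        {
            // Sort this label's cached encodings by distance to the input
            // encoding and average the k smallest distances.
            double avg = kv.Value
                .Select(e => Distance(input, e))
                .OrderBy(d => d)
                .Take(k)
                .Average();

            if (avg < bestAvg)   // the smallest average distance wins
            {
                bestAvg = avg;
                bestLabel = kv.Key;
            }
        }
        return bestLabel;
    }
}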

Programming MyCaffe in Python

Using pythonnet makes programming MyCaffe in Python extremely easy and gives you access to virtually all aspects of MyCaffe.  To get started, just install pythonnet by running the following command from within your 64-bit Python environment.

pip install pythonnet

Once installed, each MyCaffe namespace is accessed by adding a reference to it via the clr module.  For example, the following references are used by the One-Shot Python Sample on GitHub.

Python References for One-Shot Learning Sample

These Python references are equivalent to the following references used in the C# sample.

C# References for One-Shot Learning Sample

After referencing the MyCaffe DLLs, each object within the referenced DLLs is just about as easy to use in Python as it is in C#!  For more discussion on using Python to program MyCaffe, see the section on Training and Testing with Python in the MyCaffe Programming Guide.

For an example on programming the MyCaffeControl with C# to learn the MNIST dataset using a Siamese Net with KNN, see the C# Siamese Net Sample on GitHub.

For an example on programming the MyCaffeControl with Python to learn the MNIST dataset using a Siamese Net with KNN, see the Python Siamese Net Sample on GitHub.

New Features

The following new features have been added to this release.

  • Added MyCaffe Image Database version 2 support with faster background loading and new query state support.
  • Added VerifyCompute to verify that the current device matches the compute requirement of the currently used CudaDnnDll.
  • Added multi-label support to Debug Layer inspection.
  • Added boost query hit percentage support to project properties.
  • Added new KNN support to the Decode layer for use with Siamese Nets.
  • Added new AccuracyDecode layer for use with Siamese Nets.
  • Added boost persistence support for saving and loading image boost settings.
Bug Fixes

The following bugs have been fixed in this release.

  • Project results are now properly displayed after importing into a new project.
  • Project results are now deletable after closing a project.
  • Data loading optimized for faster image loading.
  • Fixed bugs in MyCaffe Image Database related to superboost probability and label balancing.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.


[1] Berkeley Artificial Intelligence (BAIR), Siamese Network Training with Caffe.

[2] G. Koch, R. Zemel and R. Salakhutdinov, Siamese Neural Networks for One-shot Image Recognition, ICML 2015 Deep Learning Workshop, 2015.

Siamese Net with Contrastive Loss now supported with CUDA 10.2 and cuDNN 7.6.5!

In our latest release, version 0.10.2.38, we have added support for Siamese Nets as described by [1] and [2], and do so with the newly released CUDA 10.2 and cuDNN 7.6.5!

The Siamese Net provides the ability to perform ‘One-Shot’ learning where an image that the network has never seen before is quickly matched with already learned classifications – if such a match exists. Several examples using Siamese Nets for one-shot learning include: Image retrieval described by [3]; Content based retrieval described by [4]; and Railway asset detection described by [5].

How Does This Work?

Siamese Nets use two parallel networks that share the same weights, which are learned while digesting pairs of images.  The following sequence of steps occurs while training a Siamese Net.

How the Siamese Net works

1.) Using the Data Layer, data is fed into the net as image pairs where each image is stacked one after the other along the data input channels. In order to provide a balanced training, pairs of images alternate between two images of the same class followed by two images of different classes.

2.) During training, the pairs of images loaded by the Data Layer are split by the Slice Layer which then feeds each image into one of two parallel networks that both share a set of learnable weights.

3.) Each network produces an encoding for each image fed to it.

4.) These encodings are then sent to the Contrastive Loss layer, which calculates the loss from the distance between the two encodings.  When the two images are from the same class, the loss is the squared distance between their encodings; when they are from different classes, it is the squared difference between the margin and the distance.  This moves the image encodings toward one another when they are from the same class and further apart when they are not (a rough sketch of this calculation follows the summary below).

5.) During training, a Decode Layer calculates and stores the centroid encoding for each class.

6.) When running the network, the Decode Layer’s stored centroids are used to determine the shortest distance between the input image’s encoding and each class’s encoding centroid. A minimum distance between the input image and a given class indicates that the input image matches the class.

In summary, training a Siamese Net directs encodings of similar classes to group together and encodings of different classes to move away from one another.  When running the Siamese Net, the input image is converted into its encoding, which is then matched with the class to which it is closest.  In the event the encoding is not ‘close’ to any of the class encoding centroids, the image is determined to be part of an unknown, or new, class.
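
As a rough illustration of the contrastive loss described in step 4, the following sketch shows the standard formulation.  The margin value, the names and the clamp at zero are illustrative assumptions rather than the Contrastive Loss layer’s exact code.

// Rough sketch of the contrastive loss over a pair of encodings.
// The margin value and the clamp at zero are assumptions for illustration.
using System;
using System.Linq;

static class ContrastiveLossSketch
{
    public static double Loss(double[] encA, double[] encB, bool sameClass, double margin = 1.0)
    {
        // Euclidean distance between the two encodings.
        double d = Math.Sqrt(encA.Zip(encB, (x, y) => (x - y) * (x - y)).Sum());

        if (sameClass)
            return d * d;                                 // pull matching pairs together
        else
            return Math.Pow(Math.Max(margin - d, 0), 2);  // push non-matching pairs apart, up to the margin
    }
}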

Siamese Net to learn MNIST

The following Siamese Net is used to learn the MNIST dataset.

Siamese Net Model

As shown above, the model uses two parallel networks joined at the bottom by a Contrastive Loss layer.  Pairs of images are fed into the top of each network which then each produce the ‘feat‘ and ‘feat_p‘ encodings for the images.  These encoding pairs are then sent to the Contrastive Loss layer where the learning begins.

New Features

The following new features have been added to this release.

  • Added CUDA 10.2.89/cuDNN 7.6.5.32 support.
  • Added label query hit percent and label query epoch information to projects.
  • Added new AccuracyEncoding layer for Siamese Nets.
  • Added new Decode layer for Siamese Nets.
  • Added new multiple image support to Data Layer for Siamese Nets.
  • Added new layer weight map visualization.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug related to VOC0712 Dataset Creator.
  • Fixed bug related to data overlaps occurring within Data layer.
  • Optimized cuDNN handle usage.
  • Optimized pinned memory usage.

To try out this model and train it yourself, just check out our Tutorials for easy step-by-step instructions that will get you started quickly! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.


[1] Berkeley Artificial Intelligence (BAIR), Siamese Network Training with Caffe.

[2] G. Koch, R. Zemel and R. Salakhutdinov, Siamese Neural Networks for One-shot Image Recognition, ICML 2015 Deep Learning Workshop, 2015.

[3] K. L. Wiggers, A. S. Britto, L. Heutte, A. L. Koerich and L. S. Oliveira, Image Retrieval and Pattern Spotting using Siamese Neural Network, arXiv, vol. 1906.09513, 2019.

[4] Y.-A. Chung and W.-H. Weng, Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval, arXiv, vol. 1711.08490, 2017.

[5] D. J. Rao, S. Mittal and S. Ritika, Siamese Neural Networks for One-shot detection of Railway Track Switches, arXiv, vol. 1712.08036, 2017.

Maintenance Release #2 with CUDA 10.1.243/cuDNN 7.6.4 Support.

In our latest maintenance release, version 0.10.1.283, we have fixed numerous bugs and support the newly released NVIDIA CUDA 10.1.243 with cuDNN 7.6.4. You can now use Neural Style Transfer, Policy Gradient based reinforcement learning, Char-RNN LSTM based learning and much more with CUDA 10.1.243.

New Features

The following new features have been added to this release.

  • Added new Permute Layer for SSD.
  • Added new PriorBox Layer for SSD.
  • Added new AnnotatedData Layer for SSD.
  • Added new Normalization2 Layer for SSD.
  • Added new MultiBoxLoss Layer for SSD.
  • Added new DetectionEvaluate Layer for SSD.
  • Added new DetectionOutput Layer for SSD.
  • Added new VOC0712 Dataset Creator.
  • Added label query hit percent and label query epoch information to projects.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug related to flatten layer reporting incorrect sizing in model editor.
  • Fixed bug related to Split layer causing crash in model editor (e.g. DANN model).
  • Fixed bug in dataset export.
  • Fixed bug in image viewer – the viewer now resizes images to the window.

Check out our Tutorials for easy step-by-step instructions that will get you started quickly with several complex model types! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.

Maintenance Release with CUDA 10.1.243/cuDNN 7.6.3 Support.

In our latest maintenance release, version 0.10.1.221, we have fixed numerous bugs and support the newly released NVIDIA CUDA 10.1.243 with cuDNN 7.6.3. You can now use Neural Style Transfer, Policy Gradient based reinforcement learning, Char-RNN LSTM based learning and much more with CUDA 10.1.243.

New Features

The following new features have been added to this release.

  • Added legacy compute 3.5 support to CUDA 10.1 version of CudaDnn.DLL.
  • Added optional use of Image Database Service when performing TSNE analysis.
  • Added option to resize input images to TSNE analysis.
  • Added custom input image masking to TSNE analysis.
  • Added Find Layer to Model Editor.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug related to process warning, now properly handled when closing application.
  • Fixed bug related to project results not showing, now results are shown based on the snapshot load method.
  • Fixed bug related to TSNE and PCA analysis creating duplicate outputs.

Check out our Tutorials for easy step-by-step instructions that will get you started quickly with several complex model types! For cool example videos, including an ATARI Pong video and Cart-Pole balancing video, check out our Examples page.