New Release with New Samples

In our latest release, version 1.11.7.7, we showcase several new loss samples that demonstrate binary classification, multi-class classification, multi-label classification and regression with the new MSE and MAE layers – all using the latest NVIDIA CUDA 11.7.1 / cuDNN 8.4.1 release.

Binary Classification

The binary classification sample solves a simple 2-class classification problem, where the model learns to determine which of two circles of dots a given point falls within.

Binary Classification

For more on this sample, see the Binary Classification Loss sample on GitHub.
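
As a rough illustration of the kind of problem this sample solves, the sketch below (our own PyTorch illustration, not the MyCaffe sample code) trains a small network to separate two concentric rings of points using a sigmoid cross-entropy loss:

    # A hedged sketch of a toy "two circles" binary classification problem.
    import torch
    import torch.nn as nn

    def ring(n, radius):
        # Scatter n points around a circle of the given radius.
        angles = torch.rand(n) * 2 * torch.pi
        r = radius + 0.1 * torch.randn(n)
        return torch.stack([r * torch.cos(angles), r * torch.sin(angles)], dim=1)

    x = torch.cat([ring(500, 1.0), ring(500, 2.0)])
    y = torch.cat([torch.zeros(500, 1), torch.ones(500, 1)])  # one of two classes

    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
    loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy
    opt = torch.optim.Adam(model.parameters(), lr=0.01)

    for step in range(200):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()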

Multi-Class Classification

The multi-class classification sample solves a simple 3-class classification problem, where the model learns to determine which of three blobs of dots a given point falls within.

Multi-class Classification

For more on this sample, see the Multi-class Classification Loss sample on GitHub.
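
Conceptually, the loss for this case is a softmax cross-entropy over three mutually exclusive classes; a minimal sketch (our PyTorch illustration, not the sample code):

    import torch
    import torch.nn as nn

    logits = torch.randn(16, 3)           # one score per class for each point
    targets = torch.randint(0, 3, (16,))  # each point belongs to exactly one blob
    loss = nn.CrossEntropyLoss()(logits, targets)  # softmax + negative log-likelihood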

Multi-Label Classification

The multi-label classification sample solves problems where the model learns one or more labels per input.  In this sample, the model learns one or more of 7 characteristics describing each handwritten character in the MNIST dataset.

Multi-label Classification

For more on this sample, see the Multi-label Classification Loss sample on GitHub.
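
Unlike the single-label cases above, multi-label classification gives each output its own sigmoid and uses multi-hot targets; a minimal sketch (our illustration with made-up shapes, not the MyCaffe sample):

    import torch
    import torch.nn as nn

    logits = torch.randn(16, 7)                     # 7 characteristics per input
    targets = torch.randint(0, 2, (16, 7)).float()  # each label on or off independently
    # Each label gets its own sigmoid + binary cross-entropy; the losses are averaged.
    loss = nn.BCEWithLogitsLoss()(logits, targets)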

Regression

Regression problems train models to predict continuous values.  MyCaffe solves regression problems using Mean Squared Error (MSE) or Mean Absolute Error (MAE) losses.

Mean Error Loss Regression

The Mean Error Loss layer is used to solve regression problems with the type set to MSE for Mean Squared Error, or MAE for Mean Absolute Error.

For more on this sample, see the Mean Error Loss Sample on GitHub.
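
In terms of what the two loss types compute, MSE penalizes the squared error (punishing large errors more heavily) while MAE penalizes the absolute error (more robust to outliers); a minimal sketch of both (not the MyCaffe layer code):

    import torch

    pred = torch.tensor([2.5, 0.0, 2.0])
    target = torch.tensor([3.0, -0.5, 2.0])

    mse = ((pred - target) ** 2).mean()  # Mean Squared Error
    mae = (pred - target).abs().mean()   # Mean Absolute Error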

New Features

The following new features have been added to this release.

  • CUDA 11.7.1.516/cuDNN 8.4.1.50/nvapi 510/driver 516.40/516.59
  • Windows 11 21H2
  • Windows 10 21H2, OS Build 19044.1865, SDK 10.0.19041.0
  • Added HINGE_LOSS layer support.
  • Added MEAN_ERROR_LOSS layer support.
  • Added support for dataset label recommendation.
  • Improved object detection dataset building.
  • Improved overall processing throughput when using multi-threaded operations.
  • Upgraded to GoogleProtobuf 3.21.4.
Bug Fixes

The following bug fixes have been added to this release.

  • Fixed bug in Impact Map.
  • Fixed bug in MAELossLayer where normalization was incorrect.
  • Fixed bug in Solver where ‘display’ was not being used.

For other great examples – including using Single Shot Multi-Box to detect gas leaks, using Neural Style Transfer to create innovative and unique art, creating Shakespeare sonnets with a CharNet, or beating PONG with Reinforcement Learning – check out the Examples page.

Happy Deep Learning with MyCaffe!

Using MyCaffe AI Platform in real-time inferencing

The SignalPop Trading Studio is a Windows Store App that provides short-term option traders with real-time analytics geared to help the trader better understand what the market is doing during each intra-day trading session.  Among the analytics provided by the SignalPop Trading Studio are real-time, AI-driven directional price predictions which, when taken together, give a better idea of the current market momentum.

Each trading day, the application often processes over 50 million quotes, depending on market activity, and runs over 40 thousand data points through the AI models using the MyCaffe AI Platform.  When running on a high-end GPU, such as the NVIDIA RTX A6000, these 40 thousand data points can be processed in under 100 milliseconds, whereas on a low-cost GPU, such as the NVIDIA GTX 1050 Ti, the processing completes in around 250 milliseconds.  Each trading day, the MyCaffe AI Platform processes over 1 billion data points.

This high-level design document shows the data flow that pulses throughout the application on a 1-2 second scan cycle which allows the many analytic aspects of the product to produce real-time results that are then displayed on the price charts used when trading.

System Overview

The data firehose enters the application from the data vendor’s stream and flows through the application like a river.  Vendor API calls are used to collect more pinpointed data and to execute trades when desired.

SignalPop Trading Studio Architecture

Other than the main application and the charting that it provides, there are three main subsystems used by the application: Data Input, Extensions and Trading Execution.

The following sections describe the responsibilities and steps taken by each of these sub-systems as the application operates.

Data Input

All data fed into the system is first sent to the Data Condenser, which is responsible for organizing all data into the time-synchronized periods used by the application.

For example, a vendor’s streaming data is (1) continually received asynchronously by the Data Provider module for that vendor and (2) sent to the Data Condenser, which organizes the data by symbol and time period.  The synchronized data is then sent asynchronously to (3) the charts for display and (4) the extensions for further processing.
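
As a rough sketch of what such a condenser does (hypothetical types and field names in Python, not the actual SignalPop code), incoming quotes can be bucketed by symbol and time period before being fanned out to the charts and extensions:

    from collections import defaultdict
    from datetime import datetime, timezone

    PERIOD_SECONDS = 10  # hypothetical synchronization period

    def period_start(ts: datetime) -> int:
        # Snap a timestamp to the start of its time period.
        return int(ts.timestamp()) // PERIOD_SECONDS * PERIOD_SECONDS

    buckets = defaultdict(list)  # (symbol, period start) -> quotes in that period

    def on_quote(symbol: str, price: float, ts: datetime) -> None:
        # Called asynchronously by a Data Provider; organizes data by symbol and period.
        buckets[(symbol, period_start(ts))].append(price)

    on_quote("SPY", 431.02, datetime.now(timezone.utc))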

Data Provider modules are plug-n-play, so new data providers are easily supported if they offer streaming and API-based capabilities in standard market data formats.  A special Data Provider called the SQL Data Provider is used internally by our team for back-testing and model development.

Extensions

Extensions are plug-n-play modules that provide special functionality to the application.  Each plug-in is given access to the user interface and the active chart, where it is free to draw and display related information.  Time-synchronized data packets flow to each plug-in on each data cycle, and calculations made by one plug-in are passed to the next.
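
A rough sketch of that chaining pattern (a hypothetical plug-in interface in Python, not the actual extension API):

    class Plugin:
        def process(self, packet: dict) -> dict:
            return packet  # default: pass the data packet through unchanged

    class MomentumPlugin(Plugin):
        def process(self, packet: dict) -> dict:
            # Example calculation added to the packet for downstream plug-ins.
            packet["momentum"] = sum(packet["prices"][-5:]) / 5
            return packet

    def run_cycle(plugins, packet):
        # Time-synchronized data flows to each plug-in; each one's results
        # are passed along to the next in the chain.
        for p in plugins:
            packet = p.process(packet)
        return packet

    result = run_cycle([MomentumPlugin(), Plugin()], {"prices": [1, 2, 3, 4, 5, 6]})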

The AI-based plug-ins, the AI Momentum Plug-in (5) and the AI Trend Plug-in (7), both use the MyCaffe AI Platform (6)(8) to provide real-time insights based on trained AI models.  For example, the AI Momentum Plug-in (5) uses the MyCaffe AI Platform to detect the current market direction, predict the likely short-term future market direction for five different short-term time periods, and display this information in real-time in the AI Momentum display.

The AI Trend Plug-in (7) is currently experimental and uses AI to detect the current trend direction and strength.  We are currently working on expanding the functionality of this plug-in.

The Drawing Plug-in (9) allows users to draw trend, support and resistance lines on the current chart so that traders can map out the structure of the market.

The Indicators Plug-in (10) gives traders a more in-depth view of historical changes in buy/sell pressure accumulation, correlation, and strength.

Trade Execution

The Trading Plug-in (11) gives traders real-time quote data on the options of the underlying and the underlying itself and allows traders to execute trades which are then displayed on the current price chart.

All trade executions (14) are performed by the trading vendor used.  The Trade Engine (12) manages this process, directing the Trade Provider module for that vendor to execute each trade as directed by the user.

Auto trading (15) is currently experimental and only used internally by SignalPop to test the viability of one model or another before it is published in the application as part of the signals produced by the AI Trend Extension Plug-in.  The Auto trading module (16) uses the MyCaffe AI Platform to help make better and more profitable trading decisions.

Summary

Using an AI platform in real-world, real-time inference-based applications places unique requirements on the platform: it must run fast yet produce reliable results.  The following suggestions may help meet these objectives.

  • Use a GPU; modern GPUs, such as those produced by NVIDIA, are fast, reliable, and surprisingly low cost on the low end if your model fits on a GPU with 4 GB of video memory or less. This matters because cryptocurrency mining typically requires 6 GB or more, which can dramatically increase the cost of the GPU itself.  The SignalPop Trading Studio processes 40 thousand data points in around 250 milliseconds on an NVIDIA GTX 1050 Ti, which has only 4 GB of video memory and can be purchased for under $300.
  • Use the same data forming during training and inferencing; it is critical to use the same data-forming software both during training and during real-time inferencing. This may seem obvious, but subtle differences in time synchronization and even in data calculations can cause very different results when running in real-time inferencing.
  • Always test on different data from different time frames; when testing, it is critical to use only data that was completely unseen during the training process. For example, if a given data input used during training contains a 10-period window of data, the testing data should not contain any of the data points that fall within that same window (see the sketch after this list).  Again, this may seem obvious but gives misleading results if not followed.
  • Focus on detection, then slowly move into prediction; detection is a far easier problem to solve than prediction – so start there first. Once the data patterns you seek are easily detected, then and only then try to see how predictive they are.  If you can’t detect the pattern, you most certainly will not be able to predict what happens after it.
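
To make the third point concrete, here is a hedged sketch of splitting a chronological series so that no 10-period training window shares data points with any testing window (our illustration, with an arbitrary cutoff):

    WINDOW = 10  # each model input is a 10-period window of data

    series = list(range(1000))  # stand-in for a chronological data series
    cutoff = 800                # train on the earlier portion, test on the later

    train_windows = [series[i:i + WINDOW] for i in range(cutoff - WINDOW + 1)]
    # Start test windows at the cutoff so none overlaps a training window.
    test_windows = [series[i:i + WINDOW] for i in range(cutoff, len(series) - WINDOW + 1)]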

In this post, we showed how an application can use AI to empower traders with better insights produced by analyzing vast amounts of data in real-time.  Gone are the days when a trader needed 9 to 12 screens to analyze market direction – AI has the power to process all of that data and more in real-time and produce more consistent results than even the most seasoned trader.

Happy Deep Learning!

minGPT – How It Works

minGPT, created by Andrej Karpathy, is a simplified implementation of the original OpenAI GPT-2 open-source project.

GPT has proven very useful in solving many Natural Language Processing (NLP) problems and, as shown by Karpathy and others, can also be used to solve tasks outside of the NLP domain, such as generative image processing and classification.

One of the samples created by Karpathy uses minGPT to learn how to model and create Shakespeare sonnets, similar to a CharRnn.  Chunks of text are randomly extracted from the input, converted into numeric character indices, and fed into the model.

minGPT input

Batches of randomly selected input ‘chunks’ are fed into the minGPT model by first converting each input into an embedding, then feeding the embeddings through a set of 8 Transformer Blocks whose results are then decoded by the decoder head.

minGPT Model

Each Transformer Block normalizes its inputs before sending them to the CausalSelfAttention layer and then on to the MLP layers.

During training, the logits produced by the Decoder head are sent along with the targets to a CrossEntropyLoss layer to produce the overall loss.

minGPT training

Each training cycle uses a custom DataLoader to load random chunks from the input text file.  These input chunks are fed into the GPT model, which produces the loss.  Next, the model gradients are zeroed and the loss is back-propagated through the model.  Gradients are clipped, and the optimizer step applies the gradients to the weights.  Finally, the learning rate is decayed based on the training progress.
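
A hedged, PyTorch-style sketch of that training cycle (the stand-in model and data here are illustrative; minGPT's real model and DataLoader are more involved):

    import torch
    import torch.nn as nn

    vocab_size, block_size = 128, 32  # illustrative sizes
    model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Flatten(),
                          nn.Linear(64 * block_size, vocab_size))
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        # Stand-in for a DataLoader chunk: token ids plus the next-token target.
        x = torch.randint(0, vocab_size, (16, block_size))
        y = torch.randint(0, vocab_size, (16,))

        loss = loss_fn(model(x), y)  # logits + targets -> cross-entropy loss
        model.zero_grad()            # zero the gradients
        loss.backward()              # back-propagate the loss through the model
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip gradients
        optimizer.step()             # apply gradients to the weights
        scheduler.step()             # decay the learning rate with progress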

Other examples provided by Karpathy show how to use the same GPT model architecture to generate images from the CIFAR-10 dataset.  This is accomplished by converting a subset of pixels from each image into a stream of numbers that are fed into the GPT model in much the same way as the character-based solution described above.

To see the full presentation that describes the data flow and CausalSelfAttention layer in detail, see the minGPT – How It Works presentation.

Happy Deep Learning!

Three Big Version 1.0 Releases!

The MyCaffe AI Platform, SignalPop AI Designer and new SignalPop Trading Studio have all been released as 1.x versions!

All of our products use the MyCaffe AI Platform to provide fast AI inferencing solutions on low-cost NVIDIA GPUs, some of which can be purchased for under $250 yet still run AI inferencing loads very quickly!

For training, the SignalPop AI Designer allows developers to develop, train and debug complicated AI models to solve a wide range of problems using various AI strategies, including Classification, Classification with small datasets, Reinforcement Learning, Recurrent Learning, Neural Style Transfer, and Sequence-2-Sequence.  For more information, please see the SignalPop AI Designer product information.

Our new SignalPop Trading Studio uses models developed with the SignalPop AI Designer and runs them with the MyCaffe AI Platform to predict short term price movements in the equities market with over 80% accuracy.

SignalPop Trading Studio AI Momentum

AI price predictions are created for 1, 2 and 3 periods into the future for the 10-second, 20-second, 30-second, 1-minute and 5-minute intervals, giving traders a clear view of where the market is currently moving.  Combined, these predictions run in under 100 milliseconds on some GPUs.  These predictions are significant because, unlike statistical measurements, they have no time lag, which can dramatically improve a trader’s view of what the market is doing in real-time.

Each trading day, the SignalPop Trading Studio uses the MyCaffe AI Platform to process over one billion data points – and can do so on a laptop computer running Microsoft Windows 10 or 11.

The SignalPop Trading Studio is available both on the Microsoft Store and on a fast CDN SignalPop download.  For more information, please see the SignalPop Trading Studio product information.

New Features

The following new features have been added to this release.

  • CUDA 11.6.2.511/cuDNN 8.4.0.27/nvapi 510/driver 511.79/512.95
  • Windows 11 21H2
  • Windows 10 21H2, OS Build 19044.1706, SDK 10.0.19041.0
  • Upgraded Google.Protobuf to 3.21.1
  • Upgraded System.Memory to 4.5.5
  • Updated ONNX public model links.
  • Added support for importing weights in CaffeModel.H5 file format.
  • Added threshold support to TestAll.
  • Added new web-browser support.
Bug Fixes

The following bug fixes have been added to this release.

  • Fixed bug where visualizing weights would error in some instances.
  • Fixed bug in multi-gpu training caused when checking for monitor.
  • Fixed bug in chatbot model where Dialog notes no testing images.
  • Fixed bug in chatbot where input errors did not re-enable the run button.
  • Fixed bug caused on close during MachineRegistry shutdown.

For other great examples – including using Single Shot Multi-Box to detect gas leaks, using Neural Style Transfer to create innovative and unique art, creating new Shakespeare sonnets, and beating ATARI Pong with Reinforcement Learning – check out our Examples page.

Happy Deep Learning with the SignalPop AI Designer and MyCaffe AI Platform!

Lots of Upgrades! Visual Studio 2022, .NET 4.8, and CUDA 11.6 with cuDNN 8.3.2!

In our latest release, version 0.11.6.86, we have made a lot of upgrades including now supporting both Windows 10 and Windows 11 with Visual Studio 2022 and the latest CUDA 11.6 and cuDNN 8.3.2 from NVIDIA.

New Features

The following new features have been added to this release.

  • CUDA 11.6.0.511/cuDNN 8.3.2.44/nvapi 510/driver 511.65
  • Windows 11 21H2, OS Build 22000.493
  • Windows 10 21H1, OS Build 19043.1320, SDK 10.0.19041.0
  • Upgraded to Visual Studio 2022 Builds
  • Upgraded to .NET 4.8
  • Improved error messages when clicking on error in output window.
  • Added Label Image comparison.
  • Added save image-to-image viewers.
  • Improved global error handling.
  • Added SERF activation layer.
  • Training dataset used for testing, when no testing images exist.
Bug Fixes
  • Fixed bug when canceling from Create Test Results not enabling controls.
  • Fixed radio button bug on LabelBoost dialog.
  • Fixed bug where use_mean_image was not exported.
  • Fixed bugs related to NVAPI use for temperature and utilization when in TCC mode.
  • Fixed lockup caused at times when clearing status.
  • Fixed bug where RESIZE didn’t restart after being cancelled.
  • Fixed bug where RESIZE allows creating duplicate data sources.
  • Fixed bugs related to GPU resource status.
  • Fixed crash caused when running Extension functions.
  • Fixed bugs in low level memory freeing, added synchronization.

Happy deep learning with Visual Studio 2022!

Using MyCaffe to mine the EDGAR Database

The US Securities and Exchange Commission’s EDGAR database contains the public filings of public US companies, including quarterly (10Q) and annual (10K) filings as well as 13F filings that list the positions held by investment-based companies at the time of each filing.

Using the MyCaffe AI Platform to analyze each of these filings, we were able to predict when a fund increased its positions with 70% accuracy.

Reversing AI Model to discover correlations

Reversing these models revealed which data items within each filing had higher correlations with the decisions made by the fund.

For example, the circled hot spot shown in the ALL 1 (buy) image (right side above) represents Lease information that we presume comes from Oil and Gas company 10Q/10K filings.  As shown above, the corresponding item selected on the Excel spreadsheet relates to LesseeOperatingLeaseLiabilityPaymentDue, which the EDGAR database defines as:

LesseeOperatingLeaseLiabilityPaymentDue – Amount of lessee’s undiscounted obligation for lease payment for operating lease.

Each Excel spreadsheet contains thousands of similar items that show how strongly each data value contributes to the firing of a given label.

The above image hot-spot mappings were created showing how buy vs sell/hold decisions were made for all equities.  However, such mappings can be created to focus on a company, sector, or industry, which may provide meaningful insights on how the objective data relates to investment decisions made.

The following hot-spot map shows the data points that triggered the sell/hold label (learned using nearly 40,000 position changes over 10 years).

Sell/Hold Hot Spots

Several notable hot-spot items impacting the decision to sell/hold a position are:

PaymentsToAcquireLoansReceivable
LineOfCreditFacilityInterestRateDuringPeriod
IncreaseDecreaseInDerivativeLiabilities
CapitalizedCostsOfUnprovedPropertiesExcludedFromAmortization
DebtSecuritiesAvailableForSaleRealizedGainLoss

The following hot-spot map shows the data points that triggered the buy label (learned using nearly 40,000 position changes over 10 years).

Buy Hot Spots

Several notable hot-spot items impacting the decision to increase a position are:

IncreaseDecreaseInRiskManagementAssetsAndLiabilities
WeightedAverageNumberOfSharesContingentlyIssuable
NumberOfRestaurants
SharebasedCompensationArrangementBySharebasedPaymentAwardOptionsNonvestedNumberOfShares
LesseeOperatingLeaseLiabilityPaymentsDueYearFour

If you would like to learn more about analyzing the EDGAR database with AI, see the AI Analysis of EDGAR report.

Improved Sequence-to-Sequence with Attention added to SignalPop AI Designer

In our latest release, version 0.11.4.60, we have improved and expanded our support for Sequence-to-Sequence[1] (Seq2Seq) models with Attention[2][3], and do so with the newly released CUDA 11.4.2/cuDNN 8.2.4 from NVIDIA.

Seq2Seq Chat-bot Model

Using its drag-n-drop visual editor, the SignalPop AI Designer now directly supports building Seq2Seq models like the one shown above.  A new TEXT_DATA input layer provides easy text input management and feeds the data sequences to the model during training and testing, creating a powerful visual environment for Seq2Seq!

With this release, we have also released a new Seq2Seq sample that builds on and improves the original Chat-bot sample posted during our last release.

Seq2SeqChatBot2 – in this sample, a Seq2Seq encoder/decoder model with attention is used to learn the question/response patterns of a chat-bot that allow the user to have a conversation with the Chat-bot.  The model learns embeddings for the input data that are then encoded with two LSTM layers and then fed into the LSTMAttention layer along with the encoded decoder input to produce the output sequence.

To try out the Seq2Seq model yourself, check out the new Seq2Seq Chat-bot Tutorial.

New Features

The following new features have been added to this release.

  • CUDA 11.4.2.471/cuDNN 8.2.4.15/nvapi 470/driver 471.96
  • Windows 21H1, OS Build 19043.1202, SDK 10.0.19041.0
  • Added new TextData Layer support.
  • Added new Seq2Seq model support.
  • Added Seq2Seq model templates.
  • Added support for MODEL dataset type.
  • Enhanced weight visualization to show all weights.
  • Added layer inspection for EMBED layers.
  • Added layer inspection for LSTM layers.
  • Added layer inspection for LSTM_ATTENTION layers.
  • Added layer inspection for INNERPRODUCT layers.
  • Added dataset coverage analysis visualization.
  • Added optional load method for project exports.
Bug Fixes
  • Fixed bug in debug layer where data items were less than seed.
  • Fixed bug in error handling on Results window.
  • Fixed bug in network visualization blob title alignments.
  • Fixed bug in IMPORT.VID where image creation would stall.
  • Fixed bug caused when loading Getting Started document.
  • Fixed bug in TestMany where one class was always triggered.
  • Fixed bug in CreateResults where classes were not correct.
  • Fixed bug in TestMany when used with MULTIBOX types.
Known Issues

The following are known issues in this release.

  • Exporting projects to ONNX and re-importing has a known issue. To work around this issue, the weights of an ONNX model can be imported directly to a similar model.
  • Loading and saving LSTMAttention models using the MyCaffe weights has known issues.  Instead, the learnable_blobs of the model can be loaded and saved directly.

Happy deep learning with attention!


[1] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, Sequence to Sequence Learning with Neural Networks, 2014, arXiv:1409.3215.

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention Is All You Need, 2017, arXiv:1706.03762.

[3] Jay Alammar, The Illustrated Transformer, 2017-2020, Jay Alammar Blog.

Sequence-to-Sequence with Attention

In our latest release, version 0.11.3.25, we have added support for Sequence-to-Sequence[1] (Seq2Seq) models with Attention[2][3], and do so with the newly released CUDA 11.3/cuDNN 8.2 from NVIDIA.  Seq2Seq models solve many difficult problems such as language translation, chat-bots, search and time-series prediction.

Seq2Seq model with Attention

The Seq2Seq model is made up of an Encoder (left side) that is linked to the Decoder (right side) by the Attention layer, which essentially learns to map the encoder input to the decoder output.  During model processing, an embedding is learned for the encoder and decoder inputs.  An encoder embedding is produced for both the encoder input and its reverse representation.  These two embeddings are then fed into two LSTM layers that learn the encodings for each; the encodings are then concatenated together to produce the encoding inputs that are eventually fed to the Attention layer within the LSTMAttention layer.  An embedding is also learned for the decoder inputs, which are then fed to the LSTMAttention layer as well.

Within the LSTMAttention layer, the encoder encodings and the last LSTMAttention state are used to produce the context for the encoding inputs.  This context is then added to the LSTM cell state to produce the decoded LSTM outputs, which are run through an internal inner product and an eventual softmax output.  The softmax output is used to determine the most likely word index produced, which is then converted back to the word using the index-to-word mapping of the overall vocabulary.  The resulting cell state is then fed back into the attention layer to produce the next context used when running the decoding LSTM on the next decoder input.
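
A minimal sketch of the attention step described above (generic dot-product attention in PyTorch; the actual LSTMAttention layer uses a learned scoring of the state against each encoding, so this is an approximation):

    import torch
    import torch.nn.functional as F

    T, H = 20, 128             # encoder time steps, hidden size
    enc = torch.randn(T, H)    # encoder encodings, one per input step
    state = torch.randn(1, H)  # last decoder LSTM state

    # Score each encoder step against the decoder state, softmax into weights,
    # then take the weighted sum of encodings as the context for this step.
    scores = state @ enc.t()            # (1, T)
    weights = F.softmax(scores, dim=-1)
    context = weights @ enc             # (1, H)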

During training, the decoder input starts with the start of sequence token (e.g. ‘1’) and is followed by target0, then target1, and so on until all expected targets are processed in a teacher forcing manner.

Once training completes, the model is run by first feeding the input data through the model along with a decoder start-of-sequence (e.g. ‘1’) token, and then the decoder half of the model is run by feeding each resulting output token back into the decoder input, continuing until an end-of-sequence token is produced.  The word tokens produced are each converted back to their corresponding words and output as the result.
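
A rough sketch contrasting the two modes (the decode_step below is a hypothetical stand-in for one pass through the decoder half of the model):

    BOS, EOS = 1, 2  # start- and end-of-sequence token ids

    # Training (teacher forcing): the decoder always sees the true previous
    # target, so for targets [t0, t1, t2, EOS] it is fed [BOS, t0, t1, t2].
    targets = [4, 5, 6, EOS]
    decoder_inputs = [BOS] + targets[:-1]

    canned = iter([4, 5, 6, EOS])  # canned predictions for this illustration
    def decode_step(token: int) -> int:
        # Stand-in: embed the token, run the LSTM/attention step and softmax,
        # and return the most likely next word index.
        return next(canned)

    # Inference: feed each produced token back in until EOS is produced.
    token, output = BOS, []
    while (token := decode_step(token)) != EOS:
        output.append(token)  # word indices, mapped back to words afterwards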

This powerful model essentially allows for learning how to map the probability distribution of one dataset to that of another.

With this release, we have also released three sample applications that use the LSTM models in three different applications.

SinCurve – in the first sample, we use the LSTM model to learn to produce a Sin curve by training the model with teacher forcing on the Sin curve data, where each previous data point predicts the next point on the curve.

ImagetoSign – in the next sample, we use the LSTM model to match handwritten character images from the MNIST dataset, shown in sequence, to different portions of the Sin curve.  The second-to-last inner product data from the MNIST model is input into the LSTM model, which then learns to produce a segment of the Sin curve based on the handwritten character detected.

Seq2SeqChatBot – in this sample, a Seq2Seq encoder/decoder model with attention is used to learn the question/response patterns of a chat-bot that allow the user to have a conversation with the Chat-bot.  The model learns embeddings for the input data that are then encoded with two LSTM layers and then fed into the LSTMAttention layer along with the encoded decoder input to produce the output sequence.

New Features

The following new features have been added to this release.

  • Added support for CUDA 11.3 and cuDNN 8.2 with NVAPI 650.
  • Tested on Windows 20H2, OS Build 19042.985, SDK 10.0.19041.0
  • Added ability to TestMany after a specified time.
  • Added signal vs. signal average comparison to Model Impact Map.
  • Added color separation to Model Impact Map.
  • Added support for visualizing custom stages from solver.prototxt.
  • Added new MISH layer support.
  • Added new HDF5_Data layer support.
  • Added new MAE Loss layer support.
  • Added new COPY layer support.
  • Added new LSTMAttention layer support.
  • Added new Seq2Seq model support with Attention.
Bug Fixes
  • Fixed bug in RNN learning; weights are now loaded correctly.
  • Fixed bugs related to accuracy calculation in RNN’s.
  • Fixed bug causing UI to lockup when multiple tasks were run.
  • Fixed bug in Copy Dataset creator where start and end dates are now used.
  • Fixed crash occurring after out of memory errors on application exit.
  • Fixed crash caused when importing weights with missing import directory.
  • Fixed bug where InnerProduct transpose parameter was not parsed correctly.
  • Fixed bug in concat_dim, now set to uint?, and ignored when null.
  • Improved physical database query times.
Known Issues

The following are known issues in this release.

  • Exporting projects to ONNX and re-importing has a known issue. To work around this issue, the weights of an ONNX model can be imported directly to a similar model.
  • Loading and saving LSTMAttention models using the MyCaffe weights has known issues.  Instead, the learnable_blobs of the model can be loaded and saved directly.

Happy deep learning with attention!


[1] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, Sequence to Sequence Learning with Neural Networks, 2014, arXiv:1409.3215.

[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention Is All You Need, 2017, arXiv:1706.03762.

[3] Jay Alammar, The Illustrated Transformer, 2017-2020, Jay Alammar Blog.

Debugging complex AI Solutions

In our latest release, version 0.11.2.9, we have added a powerful new debugging technique that visually shows the areas within an image that impact the firing of each label, and do so with the newly released CUDA 11.2/cuDNN 8.1 from NVIDIA.

Model Impact Visualization

The example above shows the areas within the CIFAR-10 [1] dataset images that actually have the most impact on firing each detected label.

Unlike the label impact visualization, originally inspired by [2], the model impact visualization shows the impact of each area of the image space on all labels, whereas the label impact shows the impact on a single label.

Label Visualization of ‘Automobile’ label sample

To learn more about these debugging techniques, see the new ‘Debugging AI Solutions‘ tutorial.

New Features

The following new features have been added to this release.

  • Added support for CUDA 11.2 / cuDNN 8.1.
  • Added new Model Impact Visualization.
  • Added new Model Reverse Visualization.
  • Added new Transpose Layer.
  • Added new Gather Layer.
  • Added new Constant Layer.
  • Added ONNX InceptionV2 to public models.
  • Added support for compute/sm 3.5 through 8.0.
  • Added ability to reset weights in any layer.
  • Added ability to freeze/unfreeze learning in any layer.
  • Optimized project status updates.
  • Optimized image loading.
Bug Fixes

The following bugs have been fixed in this release.

  • Fixed bug related to scheduling solvers with file data.
  • Fixed bug related to scheduling Char-RNN projects.
  • Fixed bug related to reinforcement learning rewards display.
  • Fixed bug in model editor where nodes exceeded 32k pixel limit.
  • Fixed bug related to renaming data input causing failures.

For other great examples, including, Neural Style Transfer, beating ATARI Pong and creating new Shakespeare sonnets, check out our Examples page.


[1] Alex Krizhevsky, The CIFAR-10 dataset.

[2] Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks, 2013, arXiv:1311.2901.

Battle of the Bits – FPGA vs. GPU

Over the past ten years, Graphics Processing Units (GPUs) have dominated the hardware solution space for artificial intelligence.  Currently, these GPUs are a steady staple in just about every datacenter offering.  NVIDIA and AMD, two of the largest GPU manufacturers, have ridden this wave well and increased their profits and stock prices dramatically.

However, there appears to be a new challenger on the horizon – the Field Programmable Gate Array (FPGA), which according to some, may be an even better hardware platform for AI than the GPU.

“Artificial intelligence (AI) is evolving rapidly, with new neural network models, techniques, and use cases emerging regularly. While there is no single architecture that works best for all machine and deep learning applications, FPGAs can offer distinct advantages over GPUs and other types of hardware in certain use cases” Intel [1].

“Achronix’s high-performance FPGAs, combined with GDDR6 memory, are the industry’s highest-bandwidth memory solution for accelerating machine learning workloads in data center and automotive applications. This new joint solution addresses many of the inherent challenges in deep neural networks, including storing large data sets, weight parameters and activations in memory. The underlying hardware needs to store, process and rapidly move data between the processor and memory. In addition, it needs to be programmable to allow more efficient implementations for constantly changing machine learning algorithms. Achronix’s next-generation FPGAs have been optimized to process machine learning workloads and currently are the only FPGAs that offer support for GDDR6 memory” Micron Technology [2].

“Today silicon devices (ex: FPGA / SOC / ASIC) with latest process technology node, able to accommodate massive computing elements, memories, math function with increased memory and interface bandwidth, with smaller-footprint and low power. So having a different AI accelerator topology will certainly have advantage like Responsive, Better security, Flexibility, Performance/Watts and Adoptable. This helps in deploying different network models addressing various application and use case scenarios, by having scalable artificial intelligence accelerator for network inference which eventually enable fast prototyping and customization during the deployment” HCL Technologies [3].

In addition, large acquisitions taking place recently speak to this emerging trend from GPU to FPGA.

As shown above, when the two largest GPU manufacturers (NVIDIA and AMD) make major acquisitions of established FPGA companies, change is in the wind.

Why Use an FPGA vs. GPU?

FPGAs “work similarly to GPUs and their threads in CUDA” Ashwin Singh [4].  According to Singh, several benefits of using an FPGA over a GPU include lower power consumption, acceptance in safety-critical operations, and support for custom data types, all of which are ideal for embedded applications used in edge devices such as self-driving cars.

In addition, the FPGA appears to have one very large and growing advantage over GPU solutions – memory.  As of this writing, NVIDIA has just released its 3090 GPU with a whopping 24GB of memory, a great step for AI model designers given its sub-$1,500 price point.  However, the amount of memory available to the GPU (or FPGA) directly translates into faster training times for large models, which in turn pushes the demand for more memory on the edge devices doing the inferencing.  With larger amounts of memory, the training process can use larger input image sizes along with larger batches of images, increasing the overall trained images/second throughput.  Larger images provide higher resolution, which leads to higher training accuracies.

Memory chip specialists like Micron Technology argue “that hardware accelerators linked to higher memory densities are the best way to accelerate ‘memory bound’ AI applications” [5].  By combining a Xilinx Virtex Ultrascale+ FPGA with up to a massive 512GB of DDR4 memory, Micron is clearly demonstrating a large advantage FPGAs appear to have over GPUs.

FPGA AI Challenges

Currently, programming an FPGA for machine learning is complex and difficult because “the requirement for laying out and creating hardware is a large barrier to the use of FPGAs in deep learning” [4].  However, specialized compilers, such as those built on Halide, may be changing this.  In 2016, Li et al. proposed “an end-to-end FPGA-based CNN accelerator with all the layers mapped on one chip so that different layers can work concurrently in a pipelined structure to increase the throughput” [6] – an idea further extended by Yang et al. in 2018 in “Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators” [7].  According to Yang et al., their FPGA and ASIC back ends to Halide were able to “achieve similar GOPs and DSP utilization when compared against manually optimized designs.”

Compilers do help reduce the complexity but do not eliminate it, which matters especially when using an FPGA device to train an AI solution.  During the training process, an AI developer is often faced with diagnosing and debugging the AI model so that it trains properly on the given dataset.  Solving the ‘model blow-up’ problem can be difficult without being able to visually analyze the data flowing from one layer to the next.  In addition, network bottlenecks can be hard to locate in a large 100+ layer network.

The visual editing and debugging capabilities of the SignalPop AI Designer coupled with the plug-n-play low level software of the MyCaffe AI Platform [8] can help dramatically reduce or eliminate these complexities.

MyCaffe AI Platform Plug-n-Play Architecture

The MyCaffe AI Platform uses a plug-n-play architecture that allows for easy separation of the AI platform (e.g. Solver, Network and Layers) from the low-level primitives that are specific to a given hardware platform (e.g. GPU or FPGA device).

MyCaffe Plug-n-Play Hardware Support

We currently use this architecture to support the various and rapidly changing versions of CUDA and cuDNN produced by NVIDIA.  However, the architecture lends itself well to the future movement from GPU to FPGA devices discussed above.  And changing hardware support is quick and easy: from within the SignalPop AI Designer, the user just selects the hardware DLL from a menu item, and when programming MyCaffe, the path to the desired hardware DLL is passed to the MyCaffeControl during initialization.

MyCaffe and FPGA

With technologies like Halide to generate much of the low-level software that runs on the FPGA, the MyCaffe plug-n-play architecture is well suited to support a new low-level DLL designed specifically for FPGA devices.

Not only does adding FPGA support expand the reach of the MyCaffe AI Platform to the potential future of AI, it does so for over 6.2 million C# developers world-wide [9].

The SignalPop AI Designer

Combining the plug-n-play design of MyCaffe with the visual editing and debugging features of the SignalPop AI Designer can dramatically reduce the complexity of building AI solutions in general, and can specifically make it easier to develop such solutions for FPGA devices.

Visual Editing

Typically, AI models are developed either with a text-based script that describes the model or by programmatically constructing the model, linking one layer to another.  The SignalPop AI Designer transforms the prototxt model descriptions (used by the original CAFFE [10] open-source platform) into a visual representation of the model that allows for easy parameter changes, one-click help, and live debugging.

Visual Model Editing

Developers can easily switch between the visual editor and the text script editor, and when the model is saved, the final model descriptor prototxt is produced.
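
For reference, a minimal example of the kind of prototxt layer definition the editor round-trips (standard CAFFE syntax; the layer shown is generic, not taken from a SignalPop model):

    layer {
      name: "ip1"
      type: "InnerProduct"
      bottom: "data"        # input blob
      top: "ip1"            # output blob
      inner_product_param {
        num_output: 10      # number of outputs (e.g. one per class)
      }
    }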

Easy Transfer Learning

After constructing the AI model, developers can easily import weights from other pre-trained models for quick transfer learning.

Easy Transfer Learning

Weights can be imported from models in the ONNX or native CAFFE file formats.

Visual Model Debugging

Visualizing the trained weights, locating bottlenecks, inspecting individual layers, and visualizing the data that flows between layers are all key aspects of debugging complex AI models – and the SignalPop AI Designer supports each of them.

Visually Inspecting Weights

By allowing easy weight visualizations, the developer can quickly see if the expected weights are loaded during training.

Identifying Layer Bottlenecks

During training, developers can optionally view the data as it flows through the network in the Debug window and observe the timing of each layer’s forward and backward pass, easily showing the designer where the network’s bottlenecks are.

Visually Inspect Embeddings

Right clicking on a debug layer while training allows for easy embedding visualization using the TSNE algorithm.

Visualizing Data Flow

Right clicking on a model layer link while training allows the designer to see the actual data flowing between the layers on both the forward and backward passes.

Summary

The MyCaffe AI Platform gives the AI developer the flexibility to target different hardware platforms, while the SignalPop AI Designer provides an easy-to-use visual development environment.

Combining the two offers an AI platform uniquely suited to creating customized solutions – one well positioned for the future generations of FPGA AI devices that may soon supersede the GPU solutions of today.

If you are an FPGA manufacturer searching for an AI software solution that makes AI programming easier, contact us – we would like to work with you!



[1] Intel, FPGA vs. GPU for Deep Learning, 2020.

[2] Micron Technology, Micron and Achronix Deliver Next-Generation FPGAs Powered by High Performance GDDR6 Memory for Machine Learning Applications, 2018.

[3] HCL Technologies, Edge Computing using AI Accelerators, 2020.

[4] Ashwin Singh, Hardware for Deep Learning: Know Your Options, Towards Data Science, 2020.

[5] George Leopold, Micron Deep Learning Accelerator Gets Memory Boost, Interprise AI, 2020.

[6] Huimin Li, Xitan Fan, Wei Cao, Xeugong Wei and Lingli Wang, A high performance FPGA-based accelerator for large-scale convolutional neural networks, IEEE, 2016.

[7] Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Ou Setter, Jing Pu, Ankita Nayak, Steven E. Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis and Mark Horowitz, Interstellar: Using Halide’s Scheduling Language to Analyze DNN Accelerators, arXiv:1809.04070, 2018, 2020.

[8] Dave Brown, MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning, arXiv:1810.02272, 2018.

[9] DAXX, How Many Software Developers Are in the US and the World, 2020.

[10] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama and Trevor Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arXiv:1408.5093v1, 2014.