MyCaffe  1.11.7.7
Deep learning software for Windows C# programmers.
MyCaffe.common.CudaDnn< T > Class Template Reference

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL. More...

Inheritance diagram for MyCaffe.common.CudaDnn< T >:

Public Member Functions

 CudaDnn (int nDeviceID, DEVINIT flags=(DEVINIT.CUBLAS|DEVINIT.CURAND), long? lSeed=null, string strPath="", bool bResetFirst=false, bool bEnableMemoryTrace=false)
 The CudaDnn constructor. More...
 
 CudaDnn (CudaDnn< T > cuda, bool bEnableGhostMemory)
 Alternate CudaDnn constructor. More...
 
void Dispose ()
 Disposes this instance freeing up all of its host and GPU memory. More...
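
A minimal lifecycle sketch (assuming a MyCaffe reference and a CUDA-capable device; the 'float' base type is an arbitrary choice) pairs the constructor with Dispose:

```csharp
using System;
using MyCaffe.common;

// Create the low-level CUDA interface on device 0 with the default
// CUBLAS + CURAND initialization flags, then release all resources.
CudaDnn<float> cuda = new CudaDnn<float>(0);
try
{
    Console.WriteLine("Using device " + cuda.GetDeviceID());
}
finally
{
    cuda.Dispose();   // frees all host and GPU memory held by this instance
}
```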
 
void DisableGhostMemory ()
 Disables the ghost memory, if enabled. More...
 
void ResetGhostMemory ()
 Resets the ghost memory by enabling it if this instance was configured to use ghost memory. More...
 
void KernelCopy (int nCount, long hSrc, int nSrcOffset, long hDstKernel, long hDst, int nDstOffset, long hHostBuffer, long hHostKernel=-1, long hStream=-1, long hSrcKernel=-1)
 Copy memory from the look-up tables in one kernel to another. More...
 
void KernelAdd (int nCount, long hA, long hDstKernel, long hB, long hC)
 Add memory from one kernel to memory residing on another kernel. More...
 
long KernelCopyNccl (long hSrcKernel, long hSrcNccl)
 Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance. More...
 
void SetDeviceID (int nDeviceID=-1, DEVINIT flags=DEVINIT.NONE, long? lSeed=null)
 Set the device ID used by the current instance of CudaDnn. More...
 
void SetRandomSeed (long lSeed)
 Set the random number generator seed. More...
 
int GetDeviceID ()
 Returns the current device ID set within Cuda. More...

 
string GetDeviceName (int nDeviceID)
 Query the name of a device. More...
 
string GetDeviceP2PInfo (int nDeviceID)
 Query the peer-to-peer information of a device. More...
 
string GetDeviceInfo (int nDeviceID, bool bVerbose=false)
 Query the device information of a device. More...
 
void ResetDevice ()
 Reset the current device. More...
 
void SynchronizeDevice ()
 Synchronize the operations on the current device. More...
 
int GetMultiGpuBoardGroupID (int nDeviceID)
 Query the multi-GPU board group ID for a device. More...
 
int GetDeviceCount ()
 Query the number of devices (GPUs) installed. More...
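
As a sketch, the device-query members above can be combined to enumerate the installed GPUs ('cuda' is assumed to be an already-constructed CudaDnn&lt;float&gt; instance):

```csharp
int nCount = cuda.GetDeviceCount();

for (int i = 0; i < nCount; i++)
{
    // Name, peer-to-peer capabilities and general info per device.
    Console.WriteLine(i + ": " + cuda.GetDeviceName(i));
    Console.WriteLine(cuda.GetDeviceP2PInfo(i));
    Console.WriteLine(cuda.GetDeviceInfo(i, true));
}
```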
 
bool CheckMemoryAttributes (long hSrc, int nSrcDeviceID, long hDst, int nDstDeviceID)
 Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers. More...
 
double GetDeviceMemory (out double dfFree, out double dfUsed, out bool bCudaCallUsed, int nDeviceID=-1)
 Queries the amount of total, free and used memory on a given GPU. More...
 
string GetRequiredCompute (out int nMinMajor, out int nMinMinor)
 The GetRequiredCompute function returns the Major and Minor compute values required by the current CudaDNN DLL used. More...
 
bool DeviceCanAccessPeer (int nSrcDeviceID, int nPeerDeviceID)
 Query whether or not two devices can access each other via peer-to-peer memory copies. More...
 
void DeviceEnablePeerAccess (int nPeerDeviceID)
 Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
 
void DeviceDisablePeerAccess (int nPeerDeviceID)
 Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
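
For example (a sketch; 'nPeer = 1' is a hypothetical second GPU), peer access should only be enabled after checking that the hardware topology supports it:

```csharp
int nCurrent = cuda.GetDeviceID();
int nPeer = 1;  // hypothetical second GPU

// Only enable peer access when the two devices can actually reach
// each other over peer-to-peer memory copies.
if (cuda.DeviceCanAccessPeer(nCurrent, nPeer))
{
    cuda.DeviceEnablePeerAccess(nPeer);
    // ... perform peer-to-peer memory copies here ...
    cuda.DeviceDisablePeerAccess(nPeer);
}
```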
 
long AllocMemory (List< double > rg)
 Allocate a block of GPU memory and copy a list of doubles to it. More...
 
long AllocMemory (List< float > rg)
 Allocate a block of GPU memory and copy a list of floats to it. More...
 
long AllocMemory (double[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy. More...
 
long AllocMemory (float[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of floats to it, optionally using a stream for the copy. More...
 
long AllocMemory (T[] rgSrc, long hStream=0, bool bHalfSize=false)
 Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy. More...
 
long AllocMemory (long lCapacity, bool bHalfSize=false)
 Allocate a block of GPU memory with a specified capacity. More...
 
void FreeMemory (long hMem)
 Free previously allocated GPU memory. More...
 
void CopyDeviceToHost (long lCount, long hGpuSrc, long hHostDst)
 Copy from GPU memory to Host memory. More...
 
void CopyHostToDevice (long lCount, long hHostSrc, long hGpuDst)
 Copy from Host memory to GPU memory. More...
 
long AllocHostBuffer (long lCapacity)
 Allocate a block of host memory with a specified capacity. More...
 
void FreeHostBuffer (long hMem)
 Free previously allocated host memory. More...
 
long GetHostBufferCapacity (long hMem)
 Returns the host memory capacity. More...
 
double[] GetHostMemoryDouble (long hMem)
 Retrieves the host memory as an array of doubles. More...
 
float[] GetHostMemoryFloat (long hMem)
 Retrieves the host memory as an array of floats. More...
 
T[] GetHostMemory (long hMem)
 Retrieves the host memory as an array of type 'T'. More...
 
double[] GetMemoryDouble (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of doubles. More...
 
float[] GetMemoryFloat (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of floats. More...
 
T[] GetMemory (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of type 'T'. More...
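
A typical allocate/read-back/free roundtrip using the members above (a sketch; GPU handles are opaque longs and must be freed explicitly):

```csharp
float[] rgSrc = new float[] { 1, 2, 3, 4 };

// Allocate GPU memory and copy the source array into it.
long hMem = cuda.AllocMemory(rgSrc);
try
{
    // Copy the data back to the host for inspection.
    float[] rgBack = cuda.GetMemoryFloat(hMem);
}
finally
{
    // GPU handles are not garbage collected - free them explicitly.
    cuda.FreeMemory(hMem);
}
```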
 
void SetMemory (long hMem, List< double > rg)
 Copies a list of doubles into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, List< float > rg)
 Copies a list of floats into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, double[] rgSrc, long hStream=0)
 Copies an array of doubles into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, float[] rgSrc, long hStream=0)
 Copies an array of floats into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, T[] rgSrc, long hStream=0, int nCount=-1)
 Copies an array of type 'T' into a block of already allocated GPU memory. More...
 
void SetMemoryAt (long hMem, double[] rgSrc, int nOffset)
 Copies an array of doubles into a block of already allocated GPU memory starting at a specific offset. More...
 
void SetMemoryAt (long hMem, float[] rgSrc, int nOffset)
 Copies an array of floats into a block of already allocated GPU memory starting at a specific offset. More...
 
void SetMemoryAt (long hMem, T[] rgSrc, int nOffset)
 Copies an array of type 'T' into a block of already allocated GPU memory starting at a specific offset. More...
 
T[] SetPixel (long hMem, int nCount, bool bReturnOriginal, int nOffset, params Tuple< int, T >[] rgPixel)
 Set pixel values, where each pixel is defined by an (index, value) tuple. More...
 
void SetHostMemory (long hMem, T[] rgSrc)
 Copies an array of type 'T' into a block of already allocated host memory. More...
 
long CreateMemoryPointer (long hData, long lOffset, long lCount)
 Creates a memory pointer into an already existing block of GPU memory. More...
 
void FreeMemoryPointer (long hData)
 Frees a memory pointer. More...
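
A memory pointer is a lightweight view into an existing allocation; a sketch (offsets and counts are in items of the base type, which is an assumption here):

```csharp
long hData = cuda.AllocMemory(new float[] { 0, 1, 2, 3, 4, 5 });

// Create a view of 3 items starting at offset 2; no new GPU
// allocation is made for the view itself.
long hView = cuda.CreateMemoryPointer(hData, 2, 3);
float[] rgMid = cuda.GetMemoryFloat(hView);   // items at offsets 2..4

cuda.FreeMemoryPointer(hView);  // free the view before the underlying block
cuda.FreeMemory(hData);
```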
 
long CreateMemoryTest (out ulong ulTotalNumBlocks, out double dfMemAllocatedInGB, out ulong ulMemStartAddr, out ulong ulBlockSize, double dfPctToAllocate=1.0)
 Creates a new memory test on the current GPU. More...
 
void FreeMemoryTest (long h)
 Free a memory test, freeing up all GPU memory used. More...
 
T[] RunMemoryTest (long h, MEMTEST_TYPE type, ulong ulBlockStartOffset, ulong ulBlockCount, bool bVerbose, bool bWrite, bool bReadWrite, bool bRead)
 The RunMemoryTest method runs the memory test from the block start offset through the block count on the memory previously allocated using CreateMemoryTest. More...
 
long CreateImageOp (int nNum, double dfBrightnessProb, double dfBrightnessDelta, double dfContrastProb, double dfContrastLower, double dfContrastUpper, double dfSaturationProb, double dfSaturationLower, double dfSaturationUpper, long lRandomSeed=0)
 Create a new ImageOp used to perform image operations on the GPU. More...
 
void FreeImageOp (long h)
 Free an image op, freeing up all GPU memory used. More...
 
void DistortImage (long h, int nCount, int nNum, int nDim, long hX, long hY)
 Distort an image using the ImageOp handle provided. More...
 
long CreateStream (bool bNonBlocking=false, int nIndex=-1)
 Create a new stream on the current GPU. More...
 
void FreeStream (long h)
 Free a stream. More...
 
void SynchronizeStream (long h=0)
 Synchronize a stream on the current GPU, waiting for its operations to complete. More...
 
void SynchronizeThread ()
 Synchronize all kernel threads on the current GPU. More...
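
A stream sketch tying the allocation and synchronization members together (assumes the asynchronous copy is queued on the stream passed to AllocMemory):

```csharp
// A non-blocking stream lets the host queue work without waiting.
long hStream = cuda.CreateStream(true);

float[] rgData = new float[1024];
long hMem = cuda.AllocMemory(rgData, hStream);  // copy queued on the stream

cuda.SynchronizeStream(hStream);  // block until the queued copy completes
cuda.FreeMemory(hMem);
cuda.FreeStream(hStream);
```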
 
long CreateCuDNN (long hStream=0)
 Create a new instance of NVIDIA's cuDnn. More...
 
void FreeCuDNN (long h)
 Free an instance of cuDnn. More...
 
long CreateNCCL (int nDeviceId, int nCount, int nRank, Guid guid)
 Create an instance of NVIDIA's NCCL ('Nickel'). More...
 
void FreeNCCL (long hNccl)
 Free an instance of NCCL. More...
 
void NcclInitializeSingleProcess (params long[] rghNccl)
 Initializes a set of NCCL instances for use in a single process. More...
 
void NcclInitializeMultiProcess (long hNccl)
 Initializes a set of NCCL instances for use in different processes. More...
 
void NcclBroadcast (long hNccl, long hStream, long hX, int nCount)
 Broadcasts a block of GPU data to all NCCL instances. More...
 
void NcclAllReduce (long hNccl, long hStream, long hX, int nCount, NCCL_REDUCTION_OP op, double dfScale=1.0)
 Performs a reduction on all NCCL instances as specified by the reduction operation. More...
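
A two-GPU, single-process sketch of the NCCL members above. Everything here is a placeholder: 'cuda0'/'cuda1' are assumed CudaDnn instances bound to devices 0 and 1, hWeights/hGrad and hStream are handles allocated elsewhere, and NCCL_REDUCTION_OP.SUM is an assumed enum value:

```csharp
Guid guid = Guid.NewGuid();  // shared id ties the NCCL instances together

// One NCCL handle per GPU: (deviceId, totalRanks, rank, guid).
long hNccl0 = cuda0.CreateNCCL(0, 2, 0, guid);
long hNccl1 = cuda1.CreateNCCL(1, 2, 1, guid);

cuda0.NcclInitializeSingleProcess(hNccl0, hNccl1);

// Broadcast rank 0's weights to all ranks, then sum-reduce the
// gradients, scaling by 1/2 to average them across the two GPUs.
cuda0.NcclBroadcast(hNccl0, hStream, hWeights, nCount);
cuda0.NcclAllReduce(hNccl0, hStream, hGrad, nCount, NCCL_REDUCTION_OP.SUM, 0.5);
```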
 
long CreateExtension (string strExtensionDllPath)
 Create an instance of an Extension DLL. More...
 
void FreeExtension (long hExtension)
 Free an instance of an Extension. More...
 
T[] RunExtension (long hExtension, long lfnIdx, T[] rgParam)
 Run a function on the extension specified. More...
 
long CreateTensorDesc ()
 Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn. More...
 
void FreeTensorDesc (long h)
 Free a tensor descriptor instance. More...
 
void SetTensorNdDesc (long hHandle, int[] rgDim, int[] rgStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void SetTensorDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void SetTensorDesc (long hHandle, int n, int c, int h, int w, int nStride, int cStride, int hStride, int wStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void AddTensor (long hCuDnn, long hSrcDesc, long hSrc, int nSrcOffset, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
 
void AddTensor (long hCuDnn, T fAlpha, long hSrcDesc, long hSrc, int nSrcOffset, T fBeta, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
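
A descriptor sketch (hSrcData/hDstData are GPU handles allocated elsewhere; the same descriptor is reused for both tensors since they share a shape):

```csharp
long hCuDnn = cuda.CreateCuDNN();
long hDesc = cuda.CreateTensorDesc();

// Describe an N=1, C=3, H=224, W=224 tensor (packed NCHW layout).
cuda.SetTensorDesc(hDesc, 1, 3, 224, 224);

// Accumulate hSrcData into hDstData: dst = dst + src.
cuda.AddTensor(hCuDnn, hDesc, hSrcData, 0, hDesc, hDstData, 0);

cuda.FreeTensorDesc(hDesc);
cuda.FreeCuDNN(hCuDnn);
```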
 
long CreateFilterDesc ()
 Create a new instance of a filter descriptor for use with NVIDIA's cuDnn. More...
 
void FreeFilterDesc (long h)
 Free a filter descriptor instance. More...
 
void SetFilterNdDesc (long hHandle, int[] rgDim, bool bHalf=false)
 Sets the values of a filter descriptor. More...
 
void SetFilterDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a filter descriptor. More...
 
long CreateConvolutionDesc ()
 Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn. More...
 
void FreeConvolutionDesc (long h)
 Free a convolution descriptor instance. More...
 
void SetConvolutionDesc (long hHandle, int hPad, int wPad, int hStride, int wStride, int hDilation, int wDilation, bool bUseTensorCores, bool bHalf=false)
 Set the values of a convolution descriptor. More...
 
void GetConvolutionInfo (long hCuDnn, long hBottomDesc, long hFilterDesc, long hConvDesc, long hTopDesc, ulong lWorkspaceSizeLimitInBytes, bool bUseTensorCores, out CONV_FWD_ALGO algoFwd, out ulong lWsSizeFwd, out CONV_BWD_FILTER_ALGO algoBwdFilter, out ulong lWsSizeBwdFilter, out CONV_BWD_DATA_ALGO algoBwdData, out ulong lWsSizeBwdData, CONV_FWD_ALGO preferredFwdAlgo=CONV_FWD_ALGO.NONE)
 Queries the algorithms and workspace sizes used for a given convolution descriptor. More...
 
void ConvolutionForward (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
 
void ConvolutionForward (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
 
void ConvolutionBackwardBias (long hCuDnn, long hTopDesc, long hTopDiff, int nTopOffset, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
 
void ConvolutionBackwardBias (long hCuDnn, T fAlpha, long hTopDesc, long hTopDiff, int nTopOffset, T fBeta, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
 
void ConvolutionBackwardFilter (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream)
 Perform a convolution backward pass on the filter. More...
 
void ConvolutionBackwardFilter (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the filter. More...
 
void ConvolutionBackwardData (long hCuDnn, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
 
void ConvolutionBackwardData (long hCuDnn, T fAlpha, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
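
A forward-pass sketch stitching the convolution members together. The bottom/filter/top descriptors and data handles are assumed to be set up elsewhere; the workspace sizing (bytes to items of a float base type) is an assumption, not a documented contract:

```csharp
// 1x1 pad, 1x1 stride, 1x1 dilation, no tensor cores.
cuda.SetConvolutionDesc(hConvDesc, 1, 1, 1, 1, 1, 1, false);

// Ask cuDnn which algorithms fit within the workspace limit.
cuda.GetConvolutionInfo(hCuDnn, hBottomDesc, hFilterDesc, hConvDesc, hTopDesc,
    lWsLimitInBytes, false,
    out CONV_FWD_ALGO algoFwd, out ulong lWsFwd,
    out CONV_BWD_FILTER_ALGO algoBwdFilter, out ulong lWsBwdFilter,
    out CONV_BWD_DATA_ALGO algoBwdData, out ulong lWsBwdData);

// Workspace sizes are reported in bytes; convert to items of the
// base type before allocating (assumption: 4-byte float items).
long hWorkspace = cuda.AllocMemory((long)(lWsFwd / 4) + 1);

cuda.ConvolutionForward(hCuDnn, hBottomDesc, hBottomData, 0,
    hFilterDesc, hWeight, 0, hConvDesc, algoFwd,
    hWorkspace, 0, lWsFwd, hTopDesc, hTopData, 0);
```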
 
long CreatePoolingDesc ()
 Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn. More...
 
void FreePoolingDesc (long h)
 Free a pooling descriptor instance. More...
 
void SetPoolingDesc (long hHandle, PoolingMethod method, int h, int w, int hPad, int wPad, int hStride, int wStride)
 Set the values of a pooling descriptor. More...
 
void PoolingForward (long hCuDnn, long hPoolingDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform a pooling forward pass. More...
 
void PoolingBackward (long hCuDnn, long hPoolingDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a pooling backward pass. More...
 
void DeriveBatchNormDesc (long hFwdScaleBiasMeanVarDesc, long hFwdBottomDesc, long hBwdScaleBiasMeanVarDesc, long hBwdBottomDesc, BATCHNORM_MODE mode)
 Derive the batch norm descriptors for both the forward and backward passes. More...
 
void BatchNormForward (long hCuDnn, BATCHNORM_MODE mode, T fAlpha, T fBeta, long hFwdBottomDesc, long hBottomData, long hFwdTopDesc, long hTopData, long hFwdScaleBiasMeanVarDesc, long hScaleData, long hBiasData, double dfFactor, long hGlobalMean, long hGlobalVar, double dfEps, long hSaveMean, long hSaveInvVar, bool bTraining)
 Run the batch norm forward pass. More...
 
void BatchNormBackward (long hCuDnn, BATCHNORM_MODE mode, T fAlphaDiff, T fBetaDiff, T fAlphaParamDiff, T fBetaParamDiff, long hBwdBottomDesc, long hBottomData, long hTopDiffDesc, long hTopDiff, long hBottomDiffDesc, long hBottomDiff, long hBwdScaleBiasMeanVarDesc, long hScaleData, long hScaleDiff, long hBiasDiff, double dfEps, long hSaveMean, long hSaveInvVar)
 Run the batch norm backward pass. More...
 
long CreateDropoutDesc ()
 Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn. More...
 
void FreeDropoutDesc (long h)
 Free a dropout descriptor instance. More...
 
void SetDropoutDesc (long hCuDnn, long hDropoutDesc, double dfDropout, long hStates, long lSeed)
 Set the dropout descriptor values. More...
 
void GetDropoutInfo (long hCuDnn, long hBottomDesc, out ulong ulStateCount, out ulong ulReservedCount)
 Query the dropout state and reserved counts. More...
 
void DropoutForward (long hCuDnn, long hDropoutDesc, long hBottomDesc, long hBottomData, long hTopDesc, long hTopData, long hReserved)
 Performs a dropout forward pass. More...
 
void DropoutBackward (long hCuDnn, long hDropoutDesc, long hTopDesc, long hTop, long hBottomDesc, long hBottom, long hReserved)
 Performs a dropout backward pass. More...
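
A dropout setup sketch (the interpretation of the state/reserved counts as allocation sizes in items is an assumption; seed 1701 is arbitrary):

```csharp
long hDropoutDesc = cuda.CreateDropoutDesc();

// Query how much state and reserve memory this bottom shape needs.
cuda.GetDropoutInfo(hCuDnn, hBottomDesc, out ulong ulStates, out ulong ulReserved);
long hStates = cuda.AllocMemory((long)ulStates);
long hReserved = cuda.AllocMemory((long)ulReserved);

// 50% dropout with a fixed seed for reproducibility.
cuda.SetDropoutDesc(hCuDnn, hDropoutDesc, 0.5, hStates, 1701);
cuda.DropoutForward(hCuDnn, hDropoutDesc, hBottomDesc, hBottomData,
                    hTopDesc, hTopData, hReserved);
```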
 
long CreateLRNDesc ()
 Create a new instance of a LRN descriptor for use with NVIDIA's cuDnn. More...
 
void FreeLRNDesc (long h)
 Free a LRN descriptor instance. More...
 
void SetLRNDesc (long hHandle, uint nSize, double fAlpha, double fBeta, double fK)
 Set the LRN descriptor values. More...
 
void LRNCrossChannelForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform LRN cross channel forward pass. More...
 
void LRNCrossChannelBackward (long hCuDnn, long hNormDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform LRN cross channel backward pass. More...
 
void DivisiveNormalizationForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTemp1, long hTemp2, T fBeta, long hTopDataDesc, long hTopData)
 Performs a Divisive Normalization forward pass. More...
 
void DivisiveNormalizationBackward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTopDiff, long hTemp1, long hTemp2, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Performs a Divisive Normalization backward pass. More...
 
void TanhForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Tanh forward pass. More...
 
void TanhBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Tanh backward pass. More...
 
void EluForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform an Elu forward pass. More...
 
void EluBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform an Elu backward pass. More...
 
void SigmoidForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Sigmoid forward pass. More...
 
void SigmoidBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Sigmoid backward pass. More...
 
void ReLUForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a ReLU forward pass. More...
 
void ReLUBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a ReLU backward pass. More...
 
void SoftmaxForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Softmax forward pass. More...
 
void SoftmaxBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Softmax backward pass. More...
 
long CreateRnnDataDesc ()
 Create the RNN Data Descriptor. More...
 
void FreeRnnDataDesc (long h)
 Free an existing RNN Data descriptor. More...
 
void SetRnnDataDesc (long hRnnDataDesc, RNN_DATALAYOUT layout, int nMaxSeqLen, int nBatchSize, int nVectorSize, bool bBidirectional=false, int[] rgSeqLen=null)
 Sets the RNN Data Descriptor values. More...
 
long CreateRnnDesc ()
 Create the RNN Descriptor. More...
 
void FreeRnnDesc (long h)
 Free an existing RNN descriptor. More...
 
void SetRnnDesc (long hCuDnn, long hRnnDesc, int nHiddenCount, int nNumLayers, long hDropoutDesc, RNN_MODE mode, bool bUseTensorCores, RNN_DIRECTION direction=RNN_DIRECTION.RNN_UNIDIRECTIONAL)
 Sets the RNN Descriptor values. More...
 
int GetRnnParamCount (long hCuDnn, long hRnnDesc, long hXDesc)
 Returns the RNN parameter count. More...
 
ulong GetRnnWorkspaceCount (long hCuDnn, long hRnnDesc, long hXDesc, out ulong nReservedCount)
 Returns the workspace and reserved counts. More...
 
void GetRnnLinLayerParams (long hCuDnn, long hRnnDesc, int nLayer, long hXDesc, long hWtDesc, long hWtData, int nLinLayer, out int nWtCount, out long hWt, out int nBiasCount, out long hBias)
 Returns the linear layer parameters (weights). More...
 
void RnnForward (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hWtDesc, long hWtData, long hYDesc, long hYData, long hHyDesc, long hHyData, long hCyDesc, long hCyData, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount, bool bTraining)
 Run the RNN through a forward pass. More...
 
void RnnBackwardData (long hCuDnn, long hRnnDesc, long hYDesc, long hYData, long hYDiff, long hHyDesc, long hHyDiff, long hCyDesc, long hCyDiff, long hWtDesc, long hWtData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hXDesc, long hXDiff, long hdHxDesc, long hHxDiff, long hdCxDesc, long hCxDiff, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount)
 Run the RNN backward pass through the data. More...
 
void RnnBackwardWeights (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hYDesc, long hYData, long hWorkspace, ulong nWsCount, long hWtDesc, long hWtDiff, long hReserved, ulong nResCount)
 Run the RNN backward pass on the weights. More...
 
long AllocPCAData (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA Data. More...
 
long AllocPCAScores (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA scores. More...
 
long AllocPCALoads (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA loads. More...
 
long AllocPCAEigenvalues (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA eigenvalues. More...
 
long CreatePCA (int nMaxIterations, int nM, int nN, int nK, long hData, long hScoresResult, long hLoadsResult, long hResiduals=0, long hEigenvalues=0)
 Creates a new PCA instance and returns the handle to it. More...
 
bool RunPCA (long hPCA, int nSteps, out int nCurrentK, out int nCurrentIteration)
 Runs a number of steps of the iterative PCA algorithm. More...
 
void FreePCA (long hPCA)
 Free the PCA instance associated with handle. More...
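
An iterative-PCA sketch (nM/nN/nK are placeholders for the matrix dimensions and component count; the data is assumed to be loaded into hData via SetMemory elsewhere, and RunPCA is assumed to return true once the algorithm completes):

```csharp
// Project an nM x nN data matrix onto its first nK components.
long hData = cuda.AllocPCAData(nM, nN, nK, out int nDataCount);
long hScores = cuda.AllocPCAScores(nM, nN, nK, out int nScoreCount);
long hLoads = cuda.AllocPCALoads(nM, nN, nK, out int nLoadCount);

long hPCA = cuda.CreatePCA(100, nM, nN, nK, hData, hScores, hLoads);

// Step the iterative solver in chunks of 20 until it reports done.
bool bDone = false;
while (!bDone)
    bDone = cuda.RunPCA(hPCA, 20, out int nCurrentK, out int nIteration);

cuda.FreePCA(hPCA);
```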
 
long CreateSSD (int nNumClasses, bool bShareLocation, int nLocClasses, int nBackgroundLabelId, bool bUseDiffcultGt, SSD_MINING_TYPE miningType, SSD_MATCH_TYPE matchType, float fOverlapThreshold, bool bUsePriorForMatching, SSD_CODE_TYPE codeType, bool bEncodeVariantInTgt, bool bBpInside, bool bIgnoreCrossBoundaryBbox, bool bUsePriorForNms, SSD_CONF_LOSS_TYPE confLossType, SSD_LOC_LOSS_TYPE locLossType, float fNegPosRatio, float fNegOverlap, int nSampleSize, bool bMapObjectToAgnostic, bool bNmsParam, float? fNmsThreshold=null, int? nNmsTopK=null, float? fNmsEta=null)
 Create an instance of the SSD GPU support. More...
 
void SetupSSD (long hSSD, int nNum, int nNumPriors, int nNumGt)
 Setup the SSD GPU support. More...
 
void FreeSSD (long hSSD)
 Free the instance of SSD GPU support. More...
 
int SsdMultiBoxLossForward (long hSSD, int nLocDataCount, long hLocGpuData, int nConfDataCount, long hConfGpuData, int nPriorDataCount, long hPriorGpuData, int nGtDataCount, long hGtGpuData, out List< DictionaryMap< List< int > > > rgAllMatchIndices, out List< List< int > > rgrgAllNegIndices, out int nNumNegs)
 Performs the SSD MultiBoxLoss forward operation. More...
 
void SsdEncodeLocPrediction (long hSSD, int nLocPredCount, long hLocPred, int nLocGtCount, long hLocGt)
 Encodes the SSD data into the location prediction and location ground truths. More...
 
void SsdEncodeConfPrediction (long hSSD, int nConfPredCount, long hConfPred, int nConfGtCount, long hConfGt)
 Encodes the SSD data into the confidence prediction and confidence ground truths. More...
 
void set (int nCount, long hHandle, double fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type 'double'. More...
 
void set (int nCount, long hHandle, float fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type 'float'. More...
 
void set (int nCount, long hHandle, T fVal, int nIdx=-1, int nXOff=0)
 Set the values of GPU memory to a specified value of type 'T'. More...
 
double[] get_double (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of doubles. More...
 
float[] get_float (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of floats. More...
 
T[] get (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of type 'T'. More...
 
void copy (int nCount, long hSrc, long hDst, int nSrcOffset=0, int nDstOffset=0, long hStream=-1, bool? bSrcHalfSizeOverride=null, bool? bDstHalfSizeOverride=null)
 Copy data from one block of GPU memory to another. More...
 
void copy (int nCount, int nNum, int nDim, long hSrc1, long hSrc2, long hDst, long hSimilar, bool bInvert=false)
 Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0). More...
 
void copy_batch (int nCount, int nNum, int nDim, long hSrcData, long hSrcLbl, int nDstCount, long hDstCache, long hWorkDevData, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, long hWorkDataHost)
 Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data. More...
 
void copy_sequence (int nK, int nNum, int nDim, long hSrcData, long hSrcLbl, int nSrcCacheCount, long hSrcCache, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, bool bOutputLabels, List< long > rghTop, List< int > rgnTopCount, long hWorkDataHost, bool bCombinePositiveAndNegative=false, int nSeed=0)
 Copy a sequence of cached items, organized by label, into an anchor, positive (if nK > 0), and negative blobs. More...
 
void copy_sequence (int n, long hSrc, int nSrcStep, int nSrcStartIdx, int nCopyCount, int nCopyDim, long hDst, int nDstStep, int nDstStartIdx, int nSrcSpatialDim, int nDstSpatialDim, int nSrcSpatialDimStartIdx=0, int nDstSpatialDimStartIdx=0, int nSpatialDimCount=-1)
 Copy a sequence from a source to a destination and allow for skip steps. More...
 
void copy_expand (int n, int nNum, int nDim, long hX, long hA)
 Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row. More...
 
void fill (int n, int nDim, long hSrc, int nSrcOff, int nCount, long hDst)
 Fill the destination with 'n' copies of the source data. More...
 
void sort (int nCount, long hY)
 Sort the data in the GPU memory specified. More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0, int nGroups=1, int nGroupOffsetA=0, int nGroupOffsetB=0, int nGroupOffsetC=0)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC, uint lda, uint ldb, uint ldc)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void geam (bool bTransA, bool bTransB, int m, int n, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void geam (bool bTransA, bool bTransB, int m, int n, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void geam (bool bTransA, bool bTransB, int m, int n, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void gemv (bool bTransA, int m, int n, double fAlpha, long hA, long hX, double fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
 
void gemv (bool bTransA, int m, int n, float fAlpha, long hA, long hX, float fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
 
void gemv (bool bTransA, int m, int n, T fAlpha, long hA, long hX, T fBeta, long hY, int nAOffset=0, int nXOffset=0, int nYOffset=0)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
 
void ger (int m, int n, double fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
 
void ger (int m, int n, float fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
 
void ger (int m, int n, T fAlpha, long hX, long hY, long hA)
 Perform a vector-vector multiplication operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n Matrix) More...
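
A BLAS-style sketch of the level-2/3 members above (m/n/k and the hA/hB/hC/hX/hY handles are placeholders for data allocated elsewhere):

```csharp
// C = alpha transB(B) transA(A) + beta C, per the operation above;
// with no transposes, C (m x n) receives the product of A and B.
cuda.gemm(false, false, m, n, k, 1.0f, hA, hB, 0.0f, hC);

// y = alpha transA(A) x + beta y.
cuda.gemv(false, m, n, 1.0f, hA, hX, 0.0f, hY);

// A = x * (alpha * y), the rank-1 outer-product update.
cuda.ger(m, n, 1.0f, hX, hY, hA);
```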
 
void axpy (int n, double fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpy (int n, float fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpy (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpby (int n, double fAlpha, long hX, double fBeta, long hY)
 Scale the vector Y by Beta, then multiply the vector X by Alpha and add the result to Y (Y = Alpha * X + Beta * Y). More...
 
void axpby (int n, float fAlpha, long hX, float fBeta, long hY)
 Scale the vector Y by Beta, then multiply the vector X by Alpha and add the result to Y (Y = Alpha * X + Beta * Y). More...
 
void axpby (int n, T fAlpha, long hX, T fBeta, long hY)
 Scale the vector x by Alpha and scale vector y by Beta and then add both together. More...
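The axpy/axpby semantics reduce to elementwise scaled sums; a CPU reference sketch (illustrative, not the GPU implementation):

```python
def axpy(n, alpha, x, y):
    # y = alpha * x + y
    return [alpha * x[i] + y[i] for i in range(n)]

def axpby(n, alpha, x, beta, y):
    # y = alpha * x + beta * y
    return [alpha * x[i] + beta * y[i] for i in range(n)]
```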
 
void mulbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Multiply a matrix with a vector. More...
 
void divbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Divide a matrix by a vector. More...
 
void set_bounds (int n, double dfMin, double dfMax, long hX)
 Bound all items within the data to the range [dfMin, dfMax]. More...
 
void scal (int n, double fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
void scal (int n, float fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
void scal (int n, T fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
double dot_double (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
 
float dot_float (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
 
T dot (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the dot product of X and Y. More...
 
double asum_double (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
 
float asum_float (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
 
T asum (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
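Both reductions have one-line CPU equivalents, including the element offsets the T-overloads expose (illustrative sketch, not the GPU implementation):

```python
def dot(n, x, y, x_off=0, y_off=0):
    # Sum of elementwise products over n items, honoring the offsets.
    return sum(x[x_off + i] * y[y_off + i] for i in range(n))

def asum(n, x, x_off=0):
    # Sum of absolute values over n items.
    return sum(abs(x[x_off + i]) for i in range(n))
```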
 
void scale (int n, double fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
 
void scale (int n, float fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
 
void scale (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Scales the values in X and places them in Y. More...
 
void scale_to_range (int n, long hX, long hY, double fMin, double fMax)
 Scales the values in X and places the result in Y (can also run inline where X = Y). More...
 
double erf (double dfVal)
 Calculates the erf() function. More...
 
float erf (float fVal)
 Calculates the erf() function. More...
 
erf (T fVal)
 Calculates the erf() function. More...
 
void interp2 (int nChannels, long hData1, int nX1, int nY1, int nHeight1, int nWidth1, int nHeight1A, int nWidth1A, long hData2, int nX2, int nY2, int nHeight2, int nWidth2, int nHeight2A, int nWidth2A, bool bBwd=false)
 Interpolates between two sizes within the spatial dimensions. More...
 
void add_scalar (int n, double fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
 
void add_scalar (int n, float fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
 
void add_scalar (int n, T fAlpha, long hY, int nYOff=0)
 Adds a scalar value to each element of Y. More...
 
void add (int n, long hA, long hB, long hY)
 Adds A to B and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, double dfAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, float fAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, double dfAlphaA, double dfAlphaB, int nAOff=0, int nBOff=0, int nYOff=0)
 Adds A to (B times scalar) and places the result in Y. More...
 
void sub (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0, int nB=0)
 Subtracts B from A and places the result in Y. More...
 
void mul (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0)
 Multiplies each element of A with each element of B and places the result in Y. More...
 
void sub_and_dot (int n, int nN, int nInnerNum, long hA, long hB, long hY, int nAOff, int nBOff, int nYOff)
 Subtracts every nInnerNum element of B from A and performs a dot product on the result. More...
 
void mul_scalar (int n, double fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void mul_scalar (int n, float fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void mul_scalar (int n, T fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void div (int n, long hA, long hB, long hY)
 Divides each element of A by each element of B and places the result in Y. More...
 
void abs (int n, long hA, long hY)
 Calculates the absolute value of A and places the result in Y. More...
 
void exp (int n, long hA, long hY)
 Calculates the exponent value of A and places the result in Y. More...
 
void exp (int n, long hA, long hY, int nAOff, int nYOff, double dfBeta)
 Calculates the exponent value of A * beta and places the result in Y. More...
 
void log (int n, long hA, long hY)
 Calculates the log value of A and places the result in Y. More...
 
void log (int n, long hA, long hY, double dfBeta, double dfAlpha=0)
 Calculates the log value of (A * beta) + alpha, and places the result in Y. More...
 
void powx (int n, long hA, double fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void powx (int n, long hA, float fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void powx (int n, long hA, T fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void sign (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the sign of each element of X and places the result in Y. More...
 
void sqrt (int n, long hX, long hY)
 Computes the square root of each element of X and places the result in Y. More...
 
void sqrt_scale (int nCount, long hX, long hY)
 Scale the data by the sqrt of the data. y = sqrt(abs(x)) * sign(x) More...
 
void compare_signs (int n, long hA, long hB, long hY)
 Compares the signs of each value in A and B and places the result in Y. More...
 
double max (int n, long hA, out long lPos, int nAOff=0)
 Finds the maximum value of A. More...
 
double min (int n, long hA, out long lPos, int nAOff=0)
 Finds the minimum value of A. More...
 
Tuple< double, double, double, double > minmax (int n, long hA, long hWork1, long hWork2, bool bDetectNans=false, int nAOff=0)
 Finds the minimum and maximum values within A. More...
 
void minmax (int n, long hA, long hWork1, long hWork2, int nK, long hMin, long hMax, bool bNonZeroOnly)
 Finds up to 'nK' minimum and maximum values within A. More...
 
void transpose (int n, long hX, long hY, long hXCounts, long hYCounts, long hMapping, int nNumAxes, long hBuffer)
 Perform a transpose on X producing Y, similar to the numpy.transpose operation. More...
 
double sumsq (int n, long hW, long hA, int nAOff=0)
 Calculates the sum of squares of A. More...
 
double sumsqdiff (int n, long hW, long hA, long hB, int nAOff=0, int nBOff=0)
 Calculates the sum of squares of differences between A and B More...
 
void width (int n, long hMean, long hMin, long hMax, double dfAlpha, long hWidth)
 Calculates the width values. More...
 
bool contains_point (int n, long hMean, long hWidth, long hX, long hWork, int nXOff=0)
 Returns true if the point is contained within the bounds. More...
 
void denan (int n, long hX, double dfReplacement)
 Replaces all NAN values within X with a replacement value. More...
 
void im2col (long hDataIm, int nDataImOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
 
void im2col_nd (long hDataIm, int nDataImOffset, int nNumSpatialAxes, int nImCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
 
void col2im (long hDataCol, int nDataColOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
 
void col2im_nd (long hDataCol, int nDataColOffset, int nNumSpatialAxes, int nColCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
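im2col unrolls each kernel-sized patch into a column so that convolution becomes a single matrix multiply, and col2im is its inverse. A single-channel CPU sketch of the layout (rows ordered by kernel offset, columns by output position, following the Caffe convention; illustrative only):

```python
def im2col(im, h, w, kh, kw, pad, stride):
    # Single-channel im2col: row index = kernel offset (ki, kj),
    # column index = output position (oi, oj); zero padding outside the image.
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    cols = []
    for ki in range(kh):
        for kj in range(kw):
            for oi in range(out_h):
                for oj in range(out_w):
                    i = oi * stride - pad + ki
                    j = oj * stride - pad + kj
                    inside = 0 <= i < h and 0 <= j < w
                    cols.append(im[i * w + j] if inside else 0)
    return cols, out_h, out_w
```

For multiple channels the same unrolling repeats per channel, giving a matrix of channels * kh * kw rows.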
 
void channel_min (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Calculates the minimum value within each channel of X and places the result in Y. More...
 
void channel_max (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Calculates the maximum value within each channel of X and places the result in Y. More...
 
void channel_compare (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Compares the values of the channels from X and places the result in Y where 1 is set if the values are equal otherwise 0 is set. More...
 
void channel_fill (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, int nLabelDim, long hLabels, long hY)
 Fills each channel item of Y with the data of X matching the label index specified by hLabels. More...
 
void channel_sub (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Subtracts the values across the channels from X and places the result in Y. More...
 
void channel_sum (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Calculates the sum of the values either across or within each channel (depending on the bSumAcrossChannels setting) of X and places the result in Y. More...
 
void channel_div (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Divides the values of the channels from X and places the result in Y. More...
 
void channel_mul (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Multiplies the values of the channels from X and places the result in Y. More...
 
void channel_mulv (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hA, long hX, long hC)
 Multiplies the values in vector X by each channel in matrix A and places the result in matrix C. More...
 
void channel_scale (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Multiplies the values of the channels from X with the scalar values in A and places the result in Y. More...
 
void channel_dot (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Calculates the dot product of the values within each channel of X and places the result in Y. More...
 
void sum (int nCount, int nOuterNum, int nInnerNum, long hX, long hY)
 Calculates the sum of inner values of X and places the result in Y. More...
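The channel_* routines all index a flattened (nOuterNum, nChannels, nInnerNum) tensor and reduce or transform along the channel axis. A CPU sketch of two of the reductions (illustrative only):

```python
def channel_max(outer, channels, inner, x):
    # x is flattened (outer, channels, inner); reduce over the channel axis.
    return [max(x[(o * channels + c) * inner + i] for c in range(channels))
            for o in range(outer) for i in range(inner)]

def channel_sum(outer, channels, inner, x):
    # Sum over the channel axis, producing an (outer, inner) result.
    return [sum(x[(o * channels + c) * inner + i] for c in range(channels))
            for o in range(outer) for i in range(inner)]
```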
 
void rng_setseed (long lSeed)
 Sets the random number generator seed used by random number operations. More...
 
void rng_uniform (int n, double fMin, double fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_uniform (int n, float fMin, float fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_uniform (int n, T fMin, T fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_gaussian (int n, double fMu, double fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_gaussian (int n, float fMu, float fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_gaussian (int n, T fMu, T fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_bernoulli (int n, double fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
 
void rng_bernoulli (int n, float fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
 
void rng_bernoulli (int n, T fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
 
void accuracy_fwd (int nCount, long hBottomData, long hBottomLabel, long hAccData, int nOuterNum, int nDim, int nInnerNum, int nNumLabels, int nTopK, long hCounts, bool bPerClass, int? nIgnoreLabel=null)
 Performs the forward pass for the accuracy layer More...
 
void batchreidx_fwd (int nCount, int nInnerDim, long hBottomData, long hPermutData, long hTopData)
 Performs the forward pass for batch re-index More...
 
void batchreidx_bwd (int nCount, int nInnerDim, long hTopDiff, long hTopIdx, long hBegins, long hCounts, long hBottomDiff)
 Performs the backward pass for batch re-index More...
 
void embed_fwd (int nCount, long hBottomData, long hWeight, int nM, int nN, int nK, long hTopData)
 Performs the forward pass for embed More...
 
void embed_bwd (int nCount, long hBottomData, long hTopDiff, int nM, int nN, int nK, long hWeightDiff)
 Performs the backward pass for embed More...
 
void pooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask, long hTopMask)
 Performs the forward pass for pooling using Cuda More...
 
void pooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask, long hTopMask)
 Performs the backward pass for pooling using Cuda More...
 
void unpooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask)
 Performs the forward pass for unpooling using Cuda More...
 
void unpooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask)
 Performs the backward pass for unpooling using Cuda More...
 
void clip_fwd (int nCount, long hBottomData, long hTopData, T fMin, T fMax)
 Performs a Clip forward pass in Cuda. More...
 
void clip_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff, T fMin, T fMax)
 Performs a Clip backward pass in Cuda. More...
 
void math_fwd (int nCount, long hBottomData, long hTopData, MATH_FUNCTION function)
 Performs a Math function forward pass in Cuda. More...
 
void math_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, MATH_FUNCTION function)
 Performs a Math function backward pass in Cuda. More...
 
void mean_error_loss_bwd (int nCount, long hPredicted, long hTarget, long hBottomDiff, MEAN_ERROR merr)
 Performs a Mean Error Loss backward pass in Cuda. More...
 
void mish_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Mish forward pass in Cuda. More...
 
void mish_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold, int nMethod=0)
 Performs a Mish backward pass in Cuda. More...
 
void serf_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Serf forward pass in Cuda. More...
 
void serf_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold)
 Performs a Serf backward pass in Cuda. More...
 
void tanh_fwd (int nCount, long hBottomData, long hTopData)
 Performs a TanH forward pass in Cuda. More...
 
void tanh_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a TanH backward pass in Cuda. More...
 
void sigmoid_fwd (int nCount, long hBottomData, long hTopData)
 Performs a Sigmoid forward pass in Cuda. More...
 
void sigmoid_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a Sigmoid backward pass in Cuda. More...
 
void swish_bwd (int nCount, long hTopDiff, long hTopData, long hSigmoidOutputData, long hBottomDiff, double dfBeta)
 Performs a Swish backward pass in Cuda. More...
 
void relu_fwd (int nCount, long hBottomData, long hTopData, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) forward pass in Cuda. More...
 
void relu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) backward pass in Cuda. More...
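The fNegativeSlope parameter generalizes ReLU to leaky ReLU: y = x for x > 0, else slope * x, with the matching gradient in the backward pass. A CPU sketch of both passes (illustrative only):

```python
def relu_fwd(x, negative_slope=0.0):
    # y = x if x > 0 else negative_slope * x (leaky ReLU when slope != 0)
    return [v if v > 0 else negative_slope * v for v in x]

def relu_bwd(top_diff, bottom_data, negative_slope=0.0):
    # dx = dy * (1 if x > 0 else negative_slope)
    return [d * (1.0 if v > 0 else negative_slope)
            for d, v in zip(top_diff, bottom_data)]
```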
 
void elu_fwd (int nCount, long hBottomData, long hTopData, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) forward pass in Cuda. More...
 
void elu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomData, long hBottomDiff, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) backward pass in Cuda. More...
 
void dropout_fwd (int nCount, long hBottomData, long hMask, uint uiThreshold, T fScale, long hTopData)
 Performs a dropout forward pass in Cuda. More...
 
void dropout_bwd (int nCount, long hTopDiff, long hMask, uint uiThreshold, T fScale, long hBottomDiff)
 Performs a dropout backward pass in Cuda. More...
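Dropout keeps an element when its random mask value exceeds uiThreshold and scales survivors by fScale (typically 1/(1-p)) so the expected magnitude is preserved. A CPU sketch of the forward semantics (the mask > threshold survival test follows the Caffe convention and is an assumption here):

```python
def dropout_fwd(x, mask, threshold, scale):
    # Keep element i when mask[i] > threshold; scale the survivors.
    return [v * scale if m > threshold else 0.0 for v, m in zip(x, mask)]
```

The backward pass applies the same mask and scale to the top gradient.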
 
void bnll_fwd (int nCount, long hBottomData, long hTopData)
 Performs a binomial normal log likelihood (BNLL) forward pass in Cuda. More...
 
void bnll_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff)
 Performs a binomial normal log likelihood (BNLL) backward pass in Cuda. More...
 
void prelu_fwd (int nCount, int nChannels, int nDim, long hBottomData, long hTopData, long hSlopeData, int nDivFactor)
 Performs Parameterized Rectified Linear Unit (PReLU) forward pass in Cuda. More...
 
void prelu_bwd_param (int nCDim, int nNum, int nTopOffset, long hTopDiff, long hBottomData, long hBackBuffDiff)
 Performs Parameterized Rectified Linear Unit (PReLU) backward param pass in Cuda. More...
 
void prelu_bwd (int nCount, int nChannels, int nDim, long hTopDiff, long hBottomData, long hBottomDiff, long hSlopeData, int nDivFactor)
 Performs Parameterized Rectified Linear Unit (PReLU) backward pass in Cuda. More...
 
void softmaxloss_fwd (int nCount, long hProbData, long hLabel, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss forward pass in Cuda. More...
 
void softmaxloss_bwd (int nCount, long hTopData, long hLabel, long hBottomDiff, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss backward pass in Cuda. More...
 
void max_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a max forward pass in Cuda. More...
 
void max_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a max backward pass in Cuda. More...
 
void min_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a min forward pass in Cuda. More...
 
void min_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a min backward pass in Cuda. More...
 
void crop_fwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomData, long hTopData)
 Performs the crop forward operation. More...
 
void crop_bwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomDiff, long hTopDiff)
 Performs the crop backward operation. More...
 
void concat_fwd (int nCount, long hBottomData, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hTopData)
 Performs a concat forward pass in Cuda. More...
 
void concat_bwd (int nCount, long hTopDiff, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hBottomDiff)
 Performs a concat backward pass in Cuda. More...
 
void slice_fwd (int nCount, long hBottomData, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hTopData)
 Performs a slice forward pass in Cuda. More...
 
void slice_bwd (int nCount, long hTopDiff, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hBottomDiff)
 Performs a slice backward pass in Cuda. More...
 
void tile_fwd (int nCount, long hBottomData, int nInnerDim, int nTiles, int nBottomTileAxis, long hTopData)
 Performs a tile forward pass in Cuda. More...
 
void tile_bwd (int nCount, long hTopDiff, int nTileSize, int nTiles, int nBottomTileAxis, long hBottomDiff)
 Performs a tile backward pass in Cuda. More...
 
void bias_fwd (int nCount, long hBottomData, long hBiasData, int nBiasDim, int nInnerDim, long hTopData)
 Performs a bias forward pass in Cuda. More...
 
void scale_fwd (int nCount, long hX, long hScaleData, int nScaleDim, int nInnerDim, long hY, long hBiasData=0)
 Performs a scale forward pass in Cuda. More...
 
void threshold_fwd (int nCount, double dfThreshold, long hX, long hY)
 Performs a threshold pass in Cuda. More...
 
void cll_bwd (int nCount, int nChannels, double dfMargin, bool bLegacyVersion, double dfAlpha, long hY, long hDiff, long hDistSq, long hBottomDiff)
 Performs a contrastive loss layer backward pass in Cuda. More...
 
void smoothl1_fwd (int nCount, long hX, long hY)
 Performs the forward operation for the SmoothL1 loss. More...
 
void smoothl1_bwd (int nCount, long hX, long hY)
 Performs the backward operation for the SmoothL1 loss. More...
 
void permute (int nCount, long hBottom, bool bFwd, long hPermuteOrder, long hOldSteps, long hNewSteps, int nNumAxes, long hTop)
 Performs data permutation on the input and reorders the data which is placed in the output. More...
 
void gather_fwd (int nCount, long hBottom, long hTop, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather forward pass where data at specified indexes along a given axis are copied to the output data. More...
 
void gather_bwd (int nCount, long hTop, long hBottom, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather backward pass where data at specified indexes along a given axis are copied to the output data. More...
 
void lrn_fillscale (int nCount, long hBottomData, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fAlphaOverSize, T fK, long hScaleData)
 Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda. More...
 
void lrn_computeoutput (int nCount, long hBottomData, long hScaleData, T fNegativeBeta, long hTopData)
 Computes the output used to calculate the LRN cross channel forward pass in Cuda. More...
 
void lrn_computediff (int nCount, long hBottomData, long hTopData, long hScaleData, long hTopDiff, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fNegativeBeta, T fCacheRatio, long hBottomDiff)
 Computes the diff used to calculate the LRN cross channel backward pass in Cuda. More...
 
void sgd_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Stochastic Gradient Descent (SGD) update More...
 
void nesterov_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Nesterov update More...
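Both solvers maintain a momentum history per parameter. In Caffe-style SGD the history becomes the applied update; Nesterov additionally extrapolates with the new and old history. A CPU sketch of the update rules (illustrative, following the Caffe solver formulas):

```python
def sgd_update(diff, history, momentum, local_rate):
    # history = momentum * history + local_rate * diff;
    # the history is then written back as the applied update.
    h = [momentum * hi + local_rate * g for hi, g in zip(history, diff)]
    return h, list(h)

def nesterov_update(diff, history, momentum, local_rate):
    # Nesterov momentum: update = (1 + momentum) * h_new - momentum * h_old.
    h_new = [momentum * hi + local_rate * g for hi, g in zip(history, diff)]
    upd = [(1 + momentum) * hn - momentum * ho
           for hn, ho in zip(h_new, history)]
    return h_new, upd
```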
 
void adagrad_update (int nCount, long hNetParamsDiff, long hHistoryData, T fDelta, T fLocalRate)
 Perform the AdaGrad update More...
 
void adadelta_update (int nCount, long hNetParamsDiff, long hHistoryData1, long hHistoryData2, T fMomentum, T fDelta, T fLocalRate)
 Perform the AdaDelta update More...
 
void adam_update (int nCount, long hNetParamsDiff, long hValM, long hValV, T fBeta1, T fBeta2, T fEpsHat, T fCorrectedLocalRate)
 Perform the Adam update More...
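Adam keeps running first and second moment estimates (hValM, hValV); the fCorrectedLocalRate parameter name suggests bias correction is folded into the rate by the caller, which is an assumption in this CPU sketch (illustrative, following the standard Adam formulas):

```python
import math

def adam_update(diff, m, v, beta1, beta2, eps_hat, corrected_lr):
    # m = beta1*m + (1-beta1)*g;  v = beta2*v + (1-beta2)*g^2
    m2 = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, diff)]
    v2 = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, diff)]
    # update = corrected_lr * m / (sqrt(v) + eps_hat)
    upd = [corrected_lr * mi / (math.sqrt(vi) + eps_hat)
           for mi, vi in zip(m2, v2)]
    return upd, m2, v2
```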
 
void rmsprop_update (int nCount, long hNetParamsDiff, long hHistoryData, T fRmsDecay, T fDelta, T fLocalRate)
 Perform the RMSProp update More...
 
void lstm_fwd (int t, int nN, int nH, int nI, long hWeight_h, long hWeight_i, long hClipData, int nClipOffset, long hTopData, int nTopOffset, long hCellData, int nCellOffset, long hPreGateData, int nPreGateOffset, long hGateData, int nGateOffset, long hHT1Data, int nHT1Offset, long hCT1Data, int nCT1Offset, long hHtoGateData, long hContext=0, long hWeight_c=0, long hCtoGetData=0)
 Performs the simple LSTM forward pass in Cuda. More...
 
void lstm_bwd (int t, int nN, int nH, int nI, double dfClippingThreshold, long hWeight_h, long hClipData, int nClipOffset, long hTopDiff, int nTopOffset, long hCellData, long hCellDiff, int nCellOffset, long hPreGateDiff, int nPreGateOffset, long hGateData, long hGateDiff, int nGateOffset, long hCT1Data, int nCT1Offset, long hDHT1Diff, int nDHT1Offset, long hDCT1Diff, int nDCT1Offset, long hHtoHData, long hContextDiff=0, long hWeight_c=0)
 Performs the simple LSTM backward pass in Cuda. More...
 
void lstm_unit_fwd (int nCount, int nHiddenDim, int nXCount, long hX, long hX_acts, long hC_prev, long hCont, long hC, long hH)
 Performs the simple LSTM forward pass in Cuda for a given LSTM unit. More...
 
void lstm_unit_bwd (int nCount, int nHiddenDim, int nXCount, long hC_prev, long hX_acts, long hC, long hH, long hCont, long hC_diff, long hH_diff, long hC_prev_diff, long hX_acts_diff, long hX_diff)
 Performs the simple LSTM backward pass in Cuda for a given LSTM unit. More...
 
void coeff_sum_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sum forward pass in Cuda. More...
 
void coeff_sum_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sum backward pass in Cuda. More...
 
void coeff_sub_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sub forward pass in Cuda. More...
 
void coeff_sub_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sub backward pass in Cuda. More...
 
void cross_entropy_fwd (int nCount, long hInput, long hTarget, long hLoss, bool bHasIgnoreLabel, int nIgnoreLabel, long hCountData)
 Performs a sigmoid cross entropy forward pass in Cuda. More...
 
void cross_entropy_ignore (int nCount, int nIgnoreLabel, long hTarget, long hBottomDiff)
 Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified. More...
 
void debug ()
 The debug function is used only when debugging with the debug version of the low-level DLL. More...
 
void matrix_meancenter_by_column (int nWidth, int nHeight, long hA, long hB, long hY, bool bNormalize=false)
 Mean center the data by columns, where the mean of each column is computed and then subtracted from each value in that column. More...
 
void gaussian_blur (int n, int nChannels, int nHeight, int nWidth, double dfSigma, long hX, long hY)
 The gaussian_blur runs a Gaussian blurring operation over each channel of the data using the sigma. More...
 
double hamming_distance (int n, double dfThreshold, long hA, long hB, long hY, int nOffA=0, int nOffB=0, int nOffY=0)
 The hamming_distance calculates the Hamming Distance between X and Y both of length n. More...
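The threshold parameter suggests both vectors are first binarized and the distance is the count of differing positions. A CPU sketch of that interpretation (an assumption about the binarization rule; illustrative only):

```python
def hamming_distance(n, threshold, a, b):
    # Binarize each vector at the threshold, then count differing positions.
    def bits(v):
        return [1 if x > threshold else 0 for x in v[:n]]
    return sum(x != y for x, y in zip(bits(a), bits(b)))
```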
 
void calc_dft_coefficients (int n, long hX, int m, long hY)
 Calculates the discrete Fourier Transform (DFT) coefficients across the frequencies 1...n/2 (Nyquist limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY. More...
 
double[] calculate_batch_distances (DistanceMethod distMethod, double dfThreshold, int nItemDim, long hSrc, long hTargets, long hWork, int[,] rgOffsets)
 The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified. More...
 

Static Public Member Functions

static string GetCudaDnnDllPath ()
 Returns the path to the CudaDnnDll module to use for low level CUDA processing. More...
 
static void SetDefaultCudaPath (string strPath)
 Used to optionally set the default path to the Low-Level Cuda Dnn DLL file. More...
 
static ulong basetype_size (bool bUseHalfSize)
 Returns the base type size in bytes. More...
 

Protected Member Functions

virtual void Dispose (bool bDisposing)
 Disposes this instance freeing up all of its host and GPU memory. More...
 

Properties

ulong TotalMemoryUsed [get]
 Returns the total amount of GPU memory used by this instance. More...
 
string TotalMemoryUsedAsText [get]
 Returns the total amount of memory used. More...
 
long KernelHandle [get]
 Returns the Low-Level kernel handle used for this instance. Each Low-Level kernel maintains its own set of look-up tables for memory, streams, cuDnn constructs, etc. More...
 
string Path [get]
 Specifies the file path used to load the Low-Level Cuda DNN Dll file. More...
 
static string DefaultPath [get]
 Specifies the default path used to load the Low-Level Cuda DNN Dll file. More...
 
int OriginalDeviceID [get]
 Returns the original device ID used to create the instance of CudaDnn. More...
 

Detailed Description

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL.

This is the transition location where C# meets C++.

Template Parameters
T: Specifies the base type (float or double). Using float is recommended to conserve GPU memory.

Definition at line 828 of file CudaDnn.cs.

Constructor & Destructor Documentation

◆ CudaDnn() [1/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( int  nDeviceID,
DEVINIT  flags = (DEVINIT.CUBLAS | DEVINIT.CURAND),
long?  lSeed = null,
string  strPath = "",
bool  bResetFirst = false,
bool  bEnableMemoryTrace = false 
)

The CudaDnn constructor.

Parameters
nDeviceID: Specifies the zero-based device (GPU) ID. Note, if there are 5 GPUs in the system, the device IDs will be numbered 0, 1, 2, 3, 4.
flags: Specifies the flags under which to initialize the Low-Level Cuda system.
lSeed: Optionally specifies the random number generator seed. Typically this is only used during testing.
strPath: Specifies the file path of the Low-Level Cuda DNN Dll file. When NULL or empty, the Low-Level CudaDNNDll.dll file in the directory of the currently executing process (that is using the CudaDnn object) is used.
bResetFirst: Specifies to reset the device before initializing. IMPORTANT: It is only recommended to set this to true when testing.
bEnableMemoryTrace: Optionally, specifies to enable memory tracing (only supported in debug mode; dramatically slows down processing).

Definition at line 1293 of file CudaDnn.cs.

◆ CudaDnn() [2/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( CudaDnn< T >  cuda,
bool  bEnableGhostMemory 
)

Alternate CudaDnn constructor.

Parameters
cuda: Specifies an already created CudaDnn instance. The internal Cuda Control of this instance is used by the new instance.
bEnableGhostMemory: Specifies to enable the ghost memory used to estimate GPU memory usage without allocating any GPU memory.

Definition at line 1392 of file CudaDnn.cs.

Member Function Documentation

◆ abs()

void MyCaffe.common.CudaDnn< T >.abs ( int  n,
long  hA,
long  hY 
)

Calculates the absolute value of A and places the result in Y.

Y = abs(A)

Parameters
n: Specifies the number of items (not bytes) in the vectors A and Y.
hA: Specifies a handle to the vector A in GPU memory.
hY: Specifies a handle to the vector Y in GPU memory.

Definition at line 6697 of file CudaDnn.cs.

◆ accuracy_fwd()

void MyCaffe.common.CudaDnn< T >.accuracy_fwd ( int  nCount,
long  hBottomData,
long  hBottomLabel,
long  hAccData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
int  nNumLabels,
int  nTopK,
long  hCounts,
bool  bPerClass,
int?  nIgnoreLabel = null 
)

Performs the forward pass for the accuracy layer

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomLabelSpecifies a handle to the bottom labels in GPU memory.
hAccDataSpecifies a handle to temporary accuracy data in GPU memory.
nOuterNumSpecifies the outer count.
nDimSpecifies the dimension.
nInnerNumSpecifies the inner count.
nNumLabelsSpecifies the number of labels.
nTopKSpecifies the top items to include in the accuracy.
hCountsSpecifies a handle to the counts data in GPU memory.
bPerClassSpecifies whether (true) to calculate the accuracy for each class, or (false) globally.
nIgnoreLabelOptionally, specifies a label to ignore, or null to ignore none.

Definition at line 7661 of file CudaDnn.cs.
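The metric computed by accuracy_fwd is easy to state even though the function operates on GPU handles. The following is a hedged CPU sketch in Python of top-k accuracy (the global, non per-class variant); the function name and signature are illustrative, not part of the MyCaffe API.

```python
def topk_accuracy(scores, labels, top_k=1, ignore_label=None):
    """CPU sketch of top-k accuracy: an item counts as correct when its true
    label is among the top_k highest-scoring classes. Items whose label
    equals ignore_label are excluded from the count."""
    correct = 0
    counted = 0
    for row, label in zip(scores, labels):
        if ignore_label is not None and label == ignore_label:
            continue
        # Indices of the top_k highest scores for this item.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:top_k]
        counted += 1
        if label in topk:
            correct += 1
    return correct / counted if counted else 0.0
```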

◆ adadelta_update()

void MyCaffe.common.CudaDnn< T >.adadelta_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData1,
long  hHistoryData2,
T  fMomentum,
T  fDelta,
T  fLocalRate 
)

Perform the AdaDelta update

See ADADELTA: An Adaptive Learning Rate Method by Zeiler, 2012

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryData1Specifies a handle to history data in GPU memory.
hHistoryData2Specifies a handle to history data in GPU memory.
fMomentumSpecifies the momentum to use.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 8967 of file CudaDnn.cs.
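The update itself follows Zeiler's paper. Below is a CPU reference sketch assuming the conventional Caffe-style AdaDelta kernel, in which one history buffer accumulates squared gradients and the other squared updates; the exact buffer roles are an assumption, and this is not the GPU code.

```python
import math

def adadelta_update(diff, h1, h2, momentum, delta, local_rate):
    """In-place CPU sketch of a Caffe-style AdaDelta update (Zeiler, 2012).
    diff holds the gradient on input and the scaled update on output;
    h1 accumulates squared gradients, h2 accumulates squared updates."""
    for i in range(len(diff)):
        g = diff[i]
        h1[i] = momentum * h1[i] + (1.0 - momentum) * g * g
        g = g * math.sqrt((h2[i] + delta) / (h1[i] + delta))
        h2[i] = momentum * h2[i] + (1.0 - momentum) * g * g
        diff[i] = local_rate * g
```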

◆ adagrad_update()

void MyCaffe.common.CudaDnn< T >.adagrad_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fDelta,
T  fLocalRate 
)

Perform the AdaGrad update

See Adaptive Subgradient Methods for Online Learning and Stochastic Optimization by Duchi, et al., 2011

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 8946 of file CudaDnn.cs.
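As a rough CPU reference (a sketch of the standard AdaGrad rule from Duchi et al., assuming the usual Caffe-style kernel; not the GPU code), the history accumulates squared gradients and the gradient is scaled by the adaptive rate:

```python
import math

def adagrad_update(diff, hist, delta, local_rate):
    """In-place CPU sketch of a Caffe-style AdaGrad update (Duchi et al., 2011).
    hist accumulates squared gradients; diff is rescaled by the adaptive rate."""
    for i in range(len(diff)):
        g = diff[i]
        hist[i] = hist[i] + g * g
        diff[i] = local_rate * g / (math.sqrt(hist[i]) + delta)
```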

◆ adam_update()

void MyCaffe.common.CudaDnn< T >.adam_update ( int  nCount,
long  hNetParamsDiff,
long  hValM,
long  hValV,
T  fBeta1,
T  fBeta2,
T  fEpsHat,
T  fCorrectedLocalRate 
)

Perform the Adam update

See Adam: A Method for Stochastic Optimization by Kingma, et al., 2014

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hValMSpecifies a handle to the first moment estimate (m) in GPU memory.
hValVSpecifies a handle to the second moment estimate (v) in GPU memory.
fBeta1Specifies the exponential decay rate for the first moment estimate.
fBeta2Specifies the exponential decay rate for the second moment estimate.
fEpsHatSpecifies the numerical stability factor.
fCorrectedLocalRateSpecifies the bias-corrected local learning rate.

Definition at line 8989 of file CudaDnn.cs.
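A CPU sketch of the standard Adam rule from Kingma et al., assuming (as in Caffe-style solvers) that the bias correction has been folded into the corrected local rate by the caller; names and signature are illustrative, not the MyCaffe API.

```python
import math

def adam_update(diff, m, v, beta1, beta2, eps_hat, corrected_rate):
    """In-place CPU sketch of a Caffe-style Adam update (Kingma et al., 2014).
    m and v are the running first and second moment estimates; diff holds the
    gradient on input and the scaled update on output."""
    for i in range(len(diff)):
        g = diff[i]
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        diff[i] = corrected_rate * m[i] / (math.sqrt(v[i]) + eps_hat)
```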

◆ add() [1/4]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY 
)

Adds A to B and places the result in Y.

Y = A + B

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6487 of file CudaDnn.cs.

◆ add() [2/4]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlpha 
)

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaSpecifies a scalar of type double.

Definition at line 6506 of file CudaDnn.cs.

◆ add() [3/4]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlphaA,
double  dfAlphaB,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 
)

Adds (A times scalar dfAlphaA) to (B times scalar dfAlphaB) and places the result in Y.

Y = (A * alphaA) + (B * alphaB)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaASpecifies a scalar of type 'T' applied to A.
dfAlphaBSpecifies a scalar of type 'T' applied to B.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6548 of file CudaDnn.cs.
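The item-wise arithmetic of this overload, including the offsets, can be sketched on the CPU as follows (illustrative names, not the MyCaffe API):

```python
def add_scaled(n, A, B, Y, alpha_a, alpha_b, a_off=0, b_off=0, y_off=0):
    """CPU sketch of Y = (A * alphaA) + (B * alphaB) over n items,
    with optional per-vector offsets given in items, not bytes."""
    for i in range(n):
        Y[y_off + i] = A[a_off + i] * alpha_a + B[b_off + i] * alpha_b
```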

◆ add() [4/4]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
float  fAlpha 
)

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
fAlphaSpecifies a scalar of type float.

Definition at line 6525 of file CudaDnn.cs.

◆ add_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
double  fAlpha,
long  hY 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value of type double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6440 of file CudaDnn.cs.

◆ add_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
float  fAlpha,
long  hY 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value of type float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6454 of file CudaDnn.cs.

◆ add_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
T  fAlpha,
long  hY,
int  nYOff = 0 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nYOffOptionally, specifies an offset into Y. The default is 0.

Definition at line 6469 of file CudaDnn.cs.

◆ AddTensor() [1/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
long  hDstDesc,
long  hDst,
int  nDstOffset 
)

Add two tensors together.

Parameters
hCuDnnSpecifies a handle to the cuDnn instance.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3382 of file CudaDnn.cs.

◆ AddTensor() [2/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
T  fAlpha,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
T  fBeta,
long  hDstDesc,
long  hDst,
int  nDstOffset 
)

Add two tensors together.

Parameters
hCuDnnSpecifies a handle to the cuDnn instance.
fAlphaSpecifies a scaling factor applied to the source GPU memory before the add.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
fBetaSpecifies a scaling factor applied to the destination GPU memory before the add.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3399 of file CudaDnn.cs.

◆ AllocHostBuffer()

long MyCaffe.common.CudaDnn< T >.AllocHostBuffer ( long  lCapacity)

Allocate a block of host memory with a specified capacity.

Parameters
lCapacitySpecifies the capacity to allocate (in items, not bytes).
Returns
The handle to the host memory is returned.

Definition at line 2333 of file CudaDnn.cs.

◆ AllocMemory() [1/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( double[]  rgSrc,
long  hStream = 0 
)

Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSrcSpecifies an array of doubles to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
Returns
The handle to the GPU memory is returned.

Definition at line 2092 of file CudaDnn.cs.

◆ AllocMemory() [2/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( float[]  rgSrc,
long  hStream = 0 
)

Allocate a block of GPU memory and copy an array of float to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSrcSpecifies an array of float to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
Returns
The handle to the GPU memory is returned.

Definition at line 2104 of file CudaDnn.cs.

◆ AllocMemory() [3/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< double >  rg)

Allocate a block of GPU memory and copy a list of doubles to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSpecifies a list of doubles to copy to the GPU.
Returns
The handle to the GPU memory is returned.

Definition at line 2069 of file CudaDnn.cs.

◆ AllocMemory() [4/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< float >  rg)

Allocate a block of GPU memory and copy a list of floats to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSpecifies a list of floats to copy to the GPU.
Returns
The handle to the GPU memory is returned.

Definition at line 2080 of file CudaDnn.cs.

◆ AllocMemory() [5/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( long  lCapacity,
bool  bHalfSize = false 
)

Allocate a block of GPU memory with a specified capacity.

Parameters
lCapacitySpecifies the capacity to allocate (in items, not bytes).
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
Returns
The handle to the GPU memory is returned.

Definition at line 2201 of file CudaDnn.cs.

◆ AllocMemory() [6/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( T[]  rgSrc,
long  hStream = 0,
bool  bHalfSize = false 
)

Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy.

Parameters
rgSrcSpecifies an array of 'T' to copy to the GPU.
hStreamOptionally, specifies a stream to use for the copy.
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
Returns
The handle to the GPU memory is returned.

Definition at line 2116 of file CudaDnn.cs.

◆ AllocPCAData()

long MyCaffe.common.CudaDnn< T >.AllocPCAData ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA Data.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 4887 of file CudaDnn.cs.

◆ AllocPCAEigenvalues()

long MyCaffe.common.CudaDnn< T >.AllocPCAEigenvalues ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA eigenvalues.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 4938 of file CudaDnn.cs.

◆ AllocPCALoads()

long MyCaffe.common.CudaDnn< T >.AllocPCALoads ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA loads.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 4921 of file CudaDnn.cs.

◆ AllocPCAScores()

long MyCaffe.common.CudaDnn< T >.AllocPCAScores ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA scores.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 4904 of file CudaDnn.cs.

◆ asum()

T MyCaffe.common.CudaDnn< T >.asum ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
The absolute value sum is returned as type 'T'.

Definition at line 6279 of file CudaDnn.cs.

◆ asum_double()

double MyCaffe.common.CudaDnn< T >.asum_double ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
The absolute sum is returned as type double.

Definition at line 6249 of file CudaDnn.cs.

◆ asum_float()

float MyCaffe.common.CudaDnn< T >.asum_float ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
The absolute sum is returned as type float.

Definition at line 6264 of file CudaDnn.cs.

◆ axpby() [1/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
double  fAlpha,
long  hX,
double  fBeta,
long  hY 
)

Scales the vector X by fAlpha, scales the vector Y by fBeta, and places the sum in Y: Y = (X * fAlpha) + (Y * fBeta).

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vectors X and Y.
fAlphaSpecifies the scaling factor of type double applied to vector X.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor of type double applied to vector Y.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6019 of file CudaDnn.cs.

◆ axpby() [2/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
float  fAlpha,
long  hX,
float  fBeta,
long  hY 
)

Scales the vector X by fAlpha, scales the vector Y by fBeta, and places the sum in Y: Y = (X * fAlpha) + (Y * fBeta).

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vectors X and Y.
fAlphaSpecifies the scaling factor of type float applied to vector X.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor of type float applied to vector Y.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6035 of file CudaDnn.cs.

◆ axpby() [3/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
T  fAlpha,
long  hX,
T  fBeta,
long  hY 
)

Scales the vector X by fAlpha, scales the vector Y by fBeta, and places the sum in Y.

Y = (X * fAlpha) + (Y * fBeta)

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor of type 'T' applied to vector Y.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6053 of file CudaDnn.cs.
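The cuBLAS-backed axpby operation reduces to one line per item; a CPU sketch (illustrative name, not the MyCaffe API):

```python
def axpby(n, alpha, X, beta, Y):
    """CPU sketch of axpby: Y = (X * alpha) + (Y * beta), in place over n items."""
    for i in range(n):
        Y[i] = alpha * X[i] + beta * Y[i]
```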

◆ axpy() [1/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
double  fAlpha,
long  hX,
long  hY 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type
double
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 5968 of file CudaDnn.cs.

◆ axpy() [2/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
float  fAlpha,
long  hX,
long  hY 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type
float
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 5983 of file CudaDnn.cs.

◆ axpy() [3/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
T  fAlpha,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6000 of file CudaDnn.cs.

◆ basetype_size()

static ulong MyCaffe.common.CudaDnn< T >.basetype_size ( bool  bUseHalfSize)

Returns the base type size in bytes.

Parameters
bUseHalfSizeSpecifies whether to use the half size or the base size.

Definition at line 1677 of file CudaDnn.cs.

◆ BatchNormBackward()

void MyCaffe.common.CudaDnn< T >.BatchNormBackward ( long  hCuDnn,
BATCHNORM_MODE  mode,
T  fAlphaDiff,
T  fBetaDiff,
T  fAlphaParamDiff,
T  fBetaParamDiff,
long  hBwdBottomDesc,
long  hBottomData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDiffDesc,
long  hBottomDiff,
long  hBwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hScaleDiff,
long  hBiasDiff,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar 
)

Run the batch norm backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaDiffSpecifies the alpha value applied to the diff.
fBetaDiffSpecifies the beta value applied to the diff.
fAlphaParamDiffSpecifies the alpha value applied to the param diff.
fBetaParamDiffSpecifies the beta value applied to the param diff.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff tensor.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff tensor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the backward scale bias mean var descriptor.
hScaleDataSpecifies a handle to the scale data tensor.
hScaleDiffSpecifies a handle to the scale diff tensor.
hBiasDiffSpecifies a handle to the bias diff tensor.
dfEpsSpecifies the epsilon value.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved inverse variance tensor.

Definition at line 3932 of file CudaDnn.cs.

◆ BatchNormForward()

void MyCaffe.common.CudaDnn< T >.BatchNormForward ( long  hCuDnn,
BATCHNORM_MODE  mode,
T  fAlpha,
T  fBeta,
long  hFwdBottomDesc,
long  hBottomData,
long  hFwdTopDesc,
long  hTopData,
long  hFwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hBiasData,
double  dfFactor,
long  hGlobalMean,
long  hGlobalVar,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar,
bool  bTraining 
)

Run the batch norm forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaSpecifies the alpha value.
fBetaSpecifies the beta value.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hFwdTopDescSpecifies a handle to the forward top tensor descriptor.
hTopDataSpecifies a handle to the top tensor.
hFwdScaleBiasMeanVarDescSpecifies a handle to the forward scale bias mean variance descriptor.
hScaleDataSpecifies a handle to the scale tensor.
hBiasDataSpecifies a handle to the bias tensor.
dfFactorSpecifies a scaling factor.
hGlobalMeanSpecifies a handle to the global mean tensor.
hGlobalVarSpecifies a handle to the global variance tensor.
dfEpsSpecifies the epsilon value to avoid dividing by zero.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved inverse variance tensor.
bTrainingSpecifies that this is a training pass when true, and a testing pass when false.

Definition at line 3902 of file CudaDnn.cs.

◆ batchreidx_bwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_bwd ( int  nCount,
int  nInnerDim,
long  hTopDiff,
long  hTopIdx,
long  hBegins,
long  hCounts,
long  hBottomDiff 
)

Performs the backward pass for batch re-index

Parameters
nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopIdxSpecifies a handle to the top indexes in GPU memory.
hBeginsSpecifies a handle to the begin data in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 7706 of file CudaDnn.cs.

◆ batchreidx_fwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_fwd ( int  nCount,
int  nInnerDim,
long  hBottomData,
long  hPermutData,
long  hTopData 
)

Performs the forward pass for batch re-index

Parameters
nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hPermutDataSpecifies a handle to the permutation data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 7688 of file CudaDnn.cs.

◆ bias_fwd()

void MyCaffe.common.CudaDnn< T >.bias_fwd ( int  nCount,
long  hBottomData,
long  hBiasData,
int  nBiasDim,
int  nInnerDim,
long  hTopData 
)

Performs a bias forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hBiasDataSpecifies a handle to the bias data in GPU memory.
nBiasDimSpecifies the bias dimension.
nInnerDimSpecifies the inner dimension.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8661 of file CudaDnn.cs.

◆ bnll_bwd()

void MyCaffe.common.CudaDnn< T >.bnll_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff 
)

Performs a binomial normal log likelihood (BNLL) backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8287 of file CudaDnn.cs.

◆ bnll_fwd()

void MyCaffe.common.CudaDnn< T >.bnll_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs a binomial normal log likelihood (BNLL) forward pass in Cuda.

Computes $ f(x) = ln(1 + e^x) $

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8272 of file CudaDnn.cs.
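The forward formula f(x) = ln(1 + e^x) overflows for large positive x if computed naively. A CPU sketch in the numerically stable form commonly used by Caffe-style implementations (illustrative name, not the MyCaffe API):

```python
import math

def bnll_forward(bottom):
    """CPU sketch of the BNLL forward pass f(x) = ln(1 + e^x).
    For x > 0 the identity ln(1 + e^x) = x + ln(1 + e^-x) avoids overflow."""
    top = []
    for x in bottom:
        if x > 0:
            top.append(x + math.log1p(math.exp(-x)))
        else:
            top.append(math.log1p(math.exp(x)))
    return top
```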

◆ calc_dft_coefficients()

void MyCaffe.common.CudaDnn< T >.calc_dft_coefficients ( int  n,
long  hX,
int  m,
long  hY 
)

Calculates the discrete Fourier Transform (DFT) coefficients across the frequencies 1...n/2 (the Nyquist limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY.

Parameters
nSpecifies the number of items.
hXSpecifies a handle to the host memory holding the input values.
mSpecifies the number of items in hY, which must equal n/2 (the Nyquist limit).
hYSpecifies a handle to the host memory holding the n/2 output values.
See also
Implement the Spectrogram from scratch in python by Yumi, Yumi's Blog, 2018

Definition at line 9650 of file CudaDnn.cs.
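As a CPU sketch of what a DFT coefficient per frequency bin 1..n/2 looks like: the code below computes the naive O(n^2) DFT and returns the magnitude of each bin. Whether the GPU function returns magnitudes or some other normalization is an assumption; the names are illustrative.

```python
import math

def dft_coefficients(x):
    """CPU sketch: naive DFT magnitudes for frequencies k = 1..n/2
    (the Nyquist limit). For a pure cosine at bin k the magnitude is n/2."""
    n = len(x)
    out = []
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n))
        out.append(math.hypot(re, im))
    return out
```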

◆ calculate_batch_distances()

double[] MyCaffe.common.CudaDnn< T >.calculate_batch_distances ( DistanceMethod  distMethod,
double  dfThreshold,
int  nItemDim,
long  hSrc,
long  hTargets,
long  hWork,
int[,]  rgOffsets 
)

The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified.

Parameters
distMethodSpecifies the DistanceMethod to use (i.e. HAMMING or EUCLIDEAN).
dfThresholdSpecifies the threshold used when binarizing the values for the HAMMING distance. This parameter is ignored when calculating the EUCLIDEAN distance.
nItemDimSpecifies the dimension of a single item.
hSrcSpecifies the GPU memory containing the source items.
hTargetsSpecifies the GPU memory containing the target items that are compared against the source items.
hWorkSpecifies the GPU memory containing the work memory - this must be the same size as the maximum size of the src or targets.
rgOffsetsSpecifies the array of offset pairs where the first offset is into the source and the second is into the target.
Returns
The array distances corresponding to each offset pair is returned.

Definition at line 9669 of file CudaDnn.cs.
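A CPU sketch of the per-pair computation as described above: each offset pair selects one item of nItemDim values from the source and target, HAMMING binarizes both sides with the threshold first, and EUCLIDEAN compares raw values. Names and signature are illustrative, not the MyCaffe API.

```python
import math

def batch_distances(method, threshold, item_dim, src, targets, offsets):
    """CPU sketch of calculate_batch_distances: one distance per
    (src_offset, target_offset) pair over items of item_dim values."""
    results = []
    for src_off, tgt_off in offsets:
        a = src[src_off:src_off + item_dim]
        b = targets[tgt_off:tgt_off + item_dim]
        if method == "HAMMING":
            # Binarize with the threshold, then count mismatched positions.
            d = sum((x > threshold) != (y > threshold) for x, y in zip(a, b))
        else:  # EUCLIDEAN
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        results.append(float(d))
    return results
```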

◆ channel_compare()

void MyCaffe.common.CudaDnn< T >.channel_compare ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Compares the values of the channels from X and places the result in Y, where 1 is set if the values are equal, otherwise 0 is set.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory of length nOuterNum.

Definition at line 7283 of file CudaDnn.cs.

◆ channel_div()

void MyCaffe.common.CudaDnn< T >.channel_div ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 
)

Divides the values of the channels from X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 7362 of file CudaDnn.cs.

◆ channel_dot()

void MyCaffe.common.CudaDnn< T >.channel_dot ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 
)

Calculates the dot product of the values within each channel of X and A and places the result in Y.

Parameters
nCountSpecifies the number of elements.
nOuterNumSpecifies the number of images.
nChannelsSpecifies the number of channels per image.
nInnerNumSpecifies the dimension of each image.
hXSpecifies a handle to the vector X in GPU memory.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7434 of file CudaDnn.cs.

◆ channel_fill()

void MyCaffe.common.CudaDnn< T >.channel_fill ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
int  nLabelDim,
long  hLabels,
long  hY 
)

Fills each channel with the channel item of Y with the data of X matching the label index specified by hLabels.

Parameters
nCountSpecifies the number of items in Y.
nOuterNumSpecifies the num of Y and Labels.
nChannelsSpecifies the channel size of Y and X.
nInnerNumSpecifies the spatial dimension of X and Y, but is normally 1.
hXSpecifies the GPU memory containing the encodings (usually centroids) of each label 0, ... max label.
nLabelDimSpecifies the dimension of the label channels. A value > 1 indicates that more than one label are stored per channel in which case only the first label is used.
hLabelsSpecifies the label ordering that determines how Y is filled using data from X.
hYSpecifies the GPU memory of the output data.

This function is used to fill a blob with data matching a set of labels. For example, in an encoding-based system with 4 labels and 3 items per encoding, X = 4 channels of 3 items each (one encoding per label). The values of hLabels give the order in which Y is filled with the labeled encodings. So if hLabels = 0, 2, 1, 3, 1, then Y has size { 5, 3, 1, 1 }: 5 items, each an encoding of 3 items, filled with the encoding at position 0 (for label 0), followed by the encodings for labels 2, 1 and 3, and ending with the encoding for label 1.

Definition at line 7310 of file CudaDnn.cs.
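The example above can be reproduced on the CPU in a few lines (illustrative name, not the MyCaffe API): each label simply selects a row of the encoding table X.

```python
def channel_fill(encodings, labels):
    """CPU sketch of channel_fill: Y is built by copying, for each label in
    order, the encoding (row of X) stored at that label's index."""
    return [list(encodings[label]) for label in labels]
```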

◆ channel_max()

void MyCaffe.common.CudaDnn< T >.channel_max ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Calculates the maximum value within each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7266 of file CudaDnn.cs.

◆ channel_min()

void MyCaffe.common.CudaDnn< T >.channel_min ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Calculates the minimum value within each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7249 of file CudaDnn.cs.

◆ channel_mul()

void MyCaffe.common.CudaDnn< T >.channel_mul ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 
)

Multiplies the values of the channels from X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 7380 of file CudaDnn.cs.

◆ channel_mulv()

void MyCaffe.common.CudaDnn< T >.channel_mulv ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hA,
long  hX,
long  hC 
)

Multiplies the values in vector X by each channel in matrix A and places the result in matrix C.

Parameters
nCount - Specifies the number of elements in X.
nOuterNum - Specifies the number of images within X.
nChannels - Specifies the number of channels per image of X.
nInnerNum - Specifies the dimension of each image in X.
hA - Specifies a handle to the matrix A in GPU memory.
hX - Specifies a handle to the vector X in GPU memory (must be of length nInnerNum).
hC - Specifies a handle to the matrix C in GPU memory where the results are placed.

Definition at line 7398 of file CudaDnn.cs.
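A CPU sketch of the broadcasting described above (hypothetical pure-Python reference, not the CUDA kernel; the [nOuterNum, nChannels, nInnerNum] layout is assumed from the parameter descriptions):

```python
# Hypothetical CPU sketch of channel_mulv: the vector x (length
# n_inner) is multiplied element-wise into every (image, channel)
# row of A, producing C of the same shape as A.
def channel_mulv(n_outer, n_channels, n_inner, a, x):
    c_out = []
    for o in range(n_outer):
        for ch in range(n_channels):
            base = (o * n_channels + ch) * n_inner
            for i in range(n_inner):
                c_out.append(a[base + i] * x[i])
    return c_out

print(channel_mulv(1, 2, 2, [1, 2, 3, 4], [10, 100]))  # [10, 200, 30, 400]
```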

◆ channel_scale()

void MyCaffe.common.CudaDnn< T >.channel_scale ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 
)

Multiplies the values of the channels from X with the scalar values in A and places the result in Y.

Parameters
nCount - Specifies the number of elements in X.
nOuterNum - Specifies the number of items within X and A.
nChannels - Specifies the number of channels per item of X and A.
nInnerNum - Specifies the dimension of each data item in X (A should have data dimension = 1).
hX - Specifies a handle to the vector X in GPU memory.
hA - Specifies a handle to the vector A containing the scalar values, one per num * channel.
hY - Specifies a handle to the vector Y in GPU memory.

Definition at line 7416 of file CudaDnn.cs.
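A CPU sketch of this per-channel scaling (hypothetical pure-Python reference based on the parameter descriptions, not the CUDA source):

```python
# Hypothetical CPU sketch of channel_scale: one scalar per
# (item, channel) pair scales the whole inner dimension of X.
def channel_scale(n_outer, n_channels, n_inner, x, a):
    y = []
    for o in range(n_outer):
        for c in range(n_channels):
            s = a[o * n_channels + c]  # scalar for this (item, channel)
            base = (o * n_channels + c) * n_inner
            y.extend(v * s for v in x[base:base + n_inner])
    return y

print(channel_scale(1, 2, 2, [1, 2, 3, 4], [2, 10]))  # [2, 4, 30, 40]
```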

◆ channel_sub()

void MyCaffe.common.CudaDnn< T >.channel_sub ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Subtracts the values across the channels from X and places the result in Y.

Parameters
nCount - Specifies the number of elements in X.
nOuterNum - Specifies the number of images within X.
nChannels - Specifies the number of channels per image of X.
nInnerNum - Specifies the dimension of each image in X.
hX - Specifies a handle to the vector X in GPU memory.
hY - Specifies a handle to the vector Y in GPU memory.

Definition at line 7327 of file CudaDnn.cs.

◆ channel_sum()

void MyCaffe.common.CudaDnn< T >.channel_sum ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Calculates the sum of the values either across or within each channel of X (depending on the bSumAcrossChannels setting) and places the result in Y.

Parameters
nCount - Specifies the number of elements in X.
nOuterNum - Specifies the number of images within X.
nChannels - Specifies the number of channels per image of X.
nInnerNum - Specifies the dimension of each image in X.
hX - Specifies a handle to the vector X in GPU memory.
hY - Specifies a handle to the vector Y in GPU memory.

Definition at line 7344 of file CudaDnn.cs.
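The within-channel variant can be sketched on the CPU as follows. This is a hypothetical pure-Python reference for the within-channel case only (one sum per image/channel pair); the across-channel behavior governed by bSumAcrossChannels is not shown and the layout is assumed from the parameter descriptions.

```python
# Hypothetical CPU sketch of channel_sum (within-channel case):
# each output is the sum over the inner dimension of one channel.
def channel_sum(n_outer, n_channels, n_inner, x):
    y = []
    for o in range(n_outer):
        for c in range(n_channels):
            base = (o * n_channels + c) * n_inner
            y.append(sum(x[base:base + n_inner]))
    return y

print(channel_sum(1, 2, 3, [1.0, 5.0, 2.0, 4.0, 0.0, 3.0]))  # [8.0, 7.0]
```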

◆ CheckMemoryAttributes()

bool MyCaffe.common.CudaDnn< T >.CheckMemoryAttributes ( long  hSrc,
int  nSrcDeviceID,
long  hDst,
int  nDstDeviceID 
)

Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers.

Parameters
hSrc - Specifies the handle to the source memory.
nSrcDeviceID - Specifies the device id where the source memory resides.
hDst - Specifies the handle to the destination memory.
nDstDeviceID - Specifies the device id where the destination memory resides.
Returns
This function returns true when both devices support peer-to-peer communication, false otherwise.

Definition at line 1938 of file CudaDnn.cs.

◆ clip_bwd()

void MyCaffe.common.CudaDnn< T >.clip_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,
T  fMin,
T  fMax 
)

Performs a Clip backward pass in Cuda.

Parameters
nCount - Specifies the number of items.
hTopDiff - Specifies a handle to the top diff in GPU memory.
hBottomData - Specifies a handle to the bottom data in GPU memory.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.
fMin - Specifies the bottom value to clip to.
fMax - Specifies the top value to clip to.

Definition at line 7892 of file CudaDnn.cs.
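A CPU sketch of the backward pass, assuming the usual Clip layer gradient (the top gradient passes through only where the bottom data lies inside [fMin, fMax]); this mirrors standard Clip layer behavior and is not taken from the MyCaffe source:

```python
# Hypothetical CPU sketch of clip_bwd: gradients flow only where
# the forward input was inside the clipping range.
def clip_bwd(top_diff, bottom_data, f_min, f_max):
    return [g if f_min <= v <= f_max else 0.0
            for g, v in zip(top_diff, bottom_data)]

print(clip_bwd([1.0, 1.0, 1.0], [-2.0, 0.5, 3.0], -1.0, 1.0))  # [0.0, 1.0, 0.0]
```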

◆ clip_fwd()

void MyCaffe.common.CudaDnn< T >.clip_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
T  fMin,
T  fMax 
)

Performs a Clip forward pass in Cuda.

Calculation $ Y[i] = \max(min, \min(max,X[i])) $

Parameters
nCount - Specifies the number of items in the bottom and top data.
hBottomData - Specifies a handle to the bottom data in GPU memory.
hTopData - Specifies a handle to the top data in GPU memory.
fMin - Specifies the bottom value to clip to.
fMax - Specifies the top value to clip to.

Definition at line 7875 of file CudaDnn.cs.
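The stated calculation can be expressed directly as a CPU reference (a plain-Python sketch of the formula above, not the CUDA kernel itself):

```python
# Reference implementation of the stated formula:
#   Y[i] = max(fMin, min(fMax, X[i]))
def clip_fwd(x, f_min, f_max):
    return [max(f_min, min(f_max, v)) for v in x]

print(clip_fwd([-2.0, 0.5, 3.0], -1.0, 1.0))  # [-1.0, 0.5, 1.0]
```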

◆ cll_bwd()

void MyCaffe.common.CudaDnn< T >.cll_bwd ( int  nCount,
int  nChannels,
double  dfMargin,
bool  bLegacyVersion,
double  dfAlpha,
long  hY,
long  hDiff,
long  hDistSq,
long  hBottomDiff 
)

Performs a contrastive loss layer backward pass in Cuda.

See Dimensionality Reduction by Learning an Invariant Mapping by Hadsell, et al., 2006

Parameters
nCount - Specifies the number of items.
nChannels - Specifies the number of channels.
dfMargin - Specifies the margin to use. The default is 1.0.
bLegacyVersion - When false (the default), the calculation proposed by Hadsell, et al., 2006 is used, where $ (margin - d)^2 $; otherwise the legacy version is used, where $ (margin - d^2) $.
dfAlpha - Specifies a scaling factor applied to the computed gradient.
hY - Specifies the Y data in GPU memory used to determine similar pairs.
hDiff - Specifies the diff in GPU memory.
hDistSq - Specifies the distance squared data in GPU memory.
hBottomDiff - Specifies the bottom diff in GPU memory.

Definition at line 8728 of file CudaDnn.cs.

◆ coeff_sub_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 
)

Performs a coefficient sub backward pass in Cuda.

Parameters
nCount - Specifies the number of items.
nDim - Specifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffset - Specifies the offset applied to the coefficient indexing.
dfCoeff - Specifies a primary coefficient value applied to each input before summing.
hCoeffData - Optionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiff - Specifies a handle to the top diff in GPU memory.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.

Definition at line 9213 of file CudaDnn.cs.

◆ coeff_sub_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 
)

Performs a coefficient sub forward pass in Cuda.

Parameters
nCount - Specifies the number of items.
nDim - Specifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffset - Specifies the offset applied to the coefficient indexing.
dfCoeff - Specifies a primary coefficient value applied to each input before summing.
hCoeffData - Optionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottom - Specifies a handle to the bottom data in GPU memory.
hTop - Specifies a handle to the top data in GPU memory.

Definition at line 9194 of file CudaDnn.cs.

◆ coeff_sum_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 
)

Performs a coefficient sum backward pass in Cuda.

Parameters
nCount - Specifies the number of items.
nDim - Specifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffset - Specifies the offset applied to the coefficient indexing.
dfCoeff - Specifies a primary coefficient value applied to each input before summing.
hCoeffData - Optionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiff - Specifies a handle to the top diff in GPU memory.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.

Definition at line 9176 of file CudaDnn.cs.

◆ coeff_sum_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 
)

Performs a coefficient sum forward pass in Cuda.

Parameters
nCount - Specifies the number of items.
nDim - Specifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffset - Specifies the offset applied to the coefficient indexing.
dfCoeff - Specifies a primary coefficient value applied to each input before summing.
hCoeffData - Optionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottom - Specifies a handle to the bottom data in GPU memory.
hTop - Specifies a handle to the top data in GPU memory.

Definition at line 9157 of file CudaDnn.cs.
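One plausible reading of "coefficient sum" (a weighted sum over the 'num' items, with each item's weight being dfCoeff optionally scaled by its entry in the coefficient data) can be sketched as follows. This interpretation is an assumption from the parameter descriptions, not verified against the CUDA source.

```python
# Hypothetical CPU sketch of a coefficient-weighted sum across the
# 'num' axis: top[d] = sum_n (dfCoeff * coeff_data[n]) * bottom[n, d].
def coeff_sum_fwd(n_num, n_dim, df_coeff, coeff_data, bottom):
    top = [0.0] * n_dim
    for n in range(n_num):
        c = df_coeff * (coeff_data[n] if coeff_data is not None else 1.0)
        for d in range(n_dim):
            top[d] += c * bottom[n * n_dim + d]
    return top

print(coeff_sum_fwd(2, 2, 1.0, [1.0, 2.0], [1, 2, 3, 4]))  # [7.0, 10.0]
```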

◆ col2im()

void MyCaffe.common.CudaDnn< T >.col2im ( long  hDataCol,
int  nDataColOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataIm,
int  nDataImOffset 
)

Rearranges the columns into image blocks.

Parameters
hDataCol - Specifies a handle to the column data in GPU memory.
nDataColOffset - Specifies an offset into the column memory.
nChannels - Specifies the number of channels in the image.
nHeight - Specifies the height of the image.
nWidth - Specifies the width of the image.
nKernelH - Specifies the kernel height.
nKernelW - Specifies the kernel width.
nPadH - Specifies the pad applied to the height.
nPadW - Specifies the pad applied to the width.
nStrideH - Specifies the stride along the height.
nStrideW - Specifies the stride along the width.
nDilationH - Specifies the dilation along the height.
nDilationW - Specifies the dilation along the width.
hDataIm - Specifies a handle to the image block in GPU memory.
nDataImOffset - Specifies an offset into the image block memory.

Definition at line 7208 of file CudaDnn.cs.
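col2im is the inverse of im2col: it scatter-adds column entries back into image positions. The spatial geometry implied by the kernel, pad, stride, and dilation parameters follows standard convolution arithmetic, which can be checked with a small generic helper (a sketch of the standard formula, not MyCaffe code):

```python
# Standard convolution output-size arithmetic, applied per spatial
# axis (height uses nKernelH/nPadH/nStrideH/nDilationH, etc.).
def conv_out_size(n_in, kernel, pad, stride, dilation):
    k_eff = dilation * (kernel - 1) + 1   # dilated kernel extent
    return (n_in + 2 * pad - k_eff) // stride + 1

print(conv_out_size(5, 3, 1, 1, 1))  # 5  ("same" padding case)
print(conv_out_size(7, 3, 0, 2, 1))  # 3
```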

◆ col2im_nd()

void MyCaffe.common.CudaDnn< T >.col2im_nd ( long  hDataCol,
int  nDataColOffset,
int  nNumSpatialAxes,
int  nColCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataIm,
int  nDataImOffset 
)

Rearranges the columns into image blocks.

Parameters
hDataCol - Specifies a handle to the column data in GPU memory.
nDataColOffset - Specifies an offset into the column memory.
nNumSpatialAxes - Specifies the number of spatial axes.
nColCount - Specifies the number of kernels.
nChannelAxis - Specifies the axis containing the channel.
hImShape - Specifies a handle to the image shape data in GPU memory.
hColShape - Specifies a handle to the column shape data in GPU memory.
hKernelShape - Specifies a handle to the kernel shape data in GPU memory.
hPad - Specifies a handle to the pad data in GPU memory.
hStride - Specifies a handle to the stride data in GPU memory.
hDilation - Specifies a handle to the dilation data in GPU memory.
hDataIm - Specifies a handle to the image block in GPU memory.
nDataImOffset - Specifies an offset into the image block memory.

Definition at line 7232 of file CudaDnn.cs.

◆ compare_signs()

void MyCaffe.common.CudaDnn< T >.compare_signs ( int  n,
long  hA,
long  hB,
long  hY 
)

Compares the signs of each value in A and B and places the result in Y.

Parameters
n - Specifies the number of items (not bytes) in the vectors A, B and Y.
hA - Specifies a handle to the vector A in GPU memory.
hB - Specifies a handle to the vector B in GPU memory.
hY - Specifies a handle to the vector Y in GPU memory.

Definition at line 6913 of file CudaDnn.cs.
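A plausible CPU sketch, assuming the comparison yields 1 when A[i] and B[i] are both strictly positive or both strictly negative and 0 otherwise (including when either value is zero). This exact treatment of zero is an assumption, not confirmed from the CUDA source.

```python
# Hypothetical CPU sketch of compare_signs under the stated assumption.
def compare_signs(a, b):
    return [1.0 if (x > 0 and y > 0) or (x < 0 and y < 0) else 0.0
            for x, y in zip(a, b)]

print(compare_signs([1.0, -2.0, 0.0], [3.0, -4.0, 5.0]))  # [1.0, 1.0, 0.0]
```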

◆ concat_bwd()

void MyCaffe.common.CudaDnn< T >.concat_bwd ( int  nCount,
long  hTopDiff,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hBottomDiff 
)

Performs a concat backward pass in Cuda.

Parameters
nCount - Specifies the number of items.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nNumConcats - Specifies the number of concatenations.
nConcatInputSize - Specifies the concatenation input size.
nTopConcatAxis - Specifies the size of the concatenation axis in the top blob.
nBottomConcatAxis - Specifies the size of the concatenation axis in the bottom blob.
nOffsetConcatAxis - Specifies the offset of this bottom blob along the top's concatenation axis.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.

Definition at line 8572 of file CudaDnn.cs.

◆ concat_fwd()

void MyCaffe.common.CudaDnn< T >.concat_fwd ( int  nCount,
long  hBottomData,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hTopData 
)

Performs a concat forward pass in Cuda.

Parameters
nCount - Specifies the number of items.
hBottomData - Specifies a handle to the bottom data in GPU memory.
nNumConcats - Specifies the number of concatenations.
nConcatInputSize - Specifies the concatenation input size.
nTopConcatAxis - Specifies the size of the concatenation axis in the top blob.
nBottomConcatAxis - Specifies the size of the concatenation axis in the bottom blob.
nOffsetConcatAxis - Specifies the offset of this bottom blob along the top's concatenation axis.
hTopData - Specifies a handle to the top data in GPU memory.

Definition at line 8552 of file CudaDnn.cs.
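The index mapping can be sketched following the concat kernel in BVLC Caffe, from which MyCaffe is derived. Treat this as an illustration of that indexing scheme under assumed parameter semantics, not the verified MyCaffe kernel:

```python
# CPU sketch of the Caffe-style concat forward index mapping: each
# bottom element lands in the top at an offset along the concat axis.
def concat_fwd(bottom, concat_input_size,
               top_concat_axis, bottom_concat_axis,
               offset_concat_axis, top):
    total = concat_input_size * bottom_concat_axis
    for index in range(len(bottom)):
        concat_num = index // total          # which outer item
        concat_idx = index % total           # position within the slab
        top_index = concat_idx + (concat_num * top_concat_axis
                                  + offset_concat_axis) * concat_input_size
        top[top_index] = bottom[index]

# Two 2-channel bottoms concatenated into one 4-channel top (num=1,
# spatial size 1): the second bottom is written at channel offset 2.
top = [0] * 4
concat_fwd([1, 2], 1, 4, 2, 0, top)
concat_fwd([3, 4], 1, 4, 2, 2, top)
print(top)  # [1, 2, 3, 4]
```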

◆ contains_point()

bool MyCaffe.common.CudaDnn< T >.contains_point ( int  n,
long  hMean,
long  hWidth,
long  hX,
long  hWork,
int  nXOff = 0 
)

Returns true if the point is contained within the bounds.

Parameters
n - Specifies the number of items.
hMean - Specifies a handle to the mean values in GPU memory.
hWidth - Specifies a handle to the width values in GPU memory.
hX - Specifies a handle to the X values in GPU memory.
hWork - Specifies a handle to the work data in GPU memory.
nXOff - Optionally, specifies an offset into the X vector (default = 0).
Returns
If the X values are within the bounds, true is returned, otherwise false.

Definition at line 7112 of file CudaDnn.cs.

◆ ConvolutionBackwardBias() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the bias.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
hBiasDesc - Specifies a handle to the bias tensor descriptor.
hBiasDiff - Specifies a handle to the bias diff in GPU memory.
nBiasOffset - Specifies an offset into the diff memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3642 of file CudaDnn.cs.

◆ ConvolutionBackwardBias() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
T  fAlpha,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
T  fBeta,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the bias.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
fAlpha - Specifies a scaling factor applied to the result.
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
fBeta - Specifies a scaling factor applied to the prior destination value.
hBiasDesc - Specifies a handle to the bias tensor descriptor.
hBiasDiff - Specifies a handle to the bias diff in GPU memory.
nBiasOffset - Specifies an offset into the diff memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3660 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_DATA_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the data.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
hFilterDesc - Specifies a handle to the filter descriptor.
hWeight - Specifies a handle to the weight data in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoBwd - Specifies the algorithm to use when performing the backward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3740 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
T  fAlpha,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_DATA_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
T  fBeta,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the data.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
fAlpha - Specifies a scaling factor applied to the result.
hFilterDesc - Specifies a handle to the filter descriptor.
hWeight - Specifies a handle to the weight data in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoBwd - Specifies the algorithm to use when performing the backward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
fBeta - Specifies a scaling factor applied to the prior destination value.
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomDiff - Specifies a handle to the bottom diff in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3766 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_FILTER_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream 
)

Perform a convolution backward pass on the filter.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomData - Specifies a handle to the bottom data in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoBwd - Specifies the algorithm to use when performing the backward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
hFilterDesc - Specifies a handle to the filter descriptor.
hWeightDiff - Specifies a handle to the weight diff in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
bSyncStream - Specifies whether or not to synchronize the stream.

Definition at line 3687 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_FILTER_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
T  fBeta,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the filter.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
fAlpha - Specifies a scaling factor applied to the result.
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomData - Specifies a handle to the bottom data in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopDiff - Specifies a handle to the top diff in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoBwd - Specifies the algorithm to use when performing the backward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
fBeta - Specifies a scaling factor applied to the prior destination value.
hFilterDesc - Specifies a handle to the filter descriptor.
hWeightDiff - Specifies a handle to the weight diff in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3713 of file CudaDnn.cs.

◆ ConvolutionForward() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
CONV_FWD_ALGO  algoFwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 
)

Perform a convolution forward pass.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomData - Specifies a handle to the bottom data in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
hFilterDesc - Specifies a handle to the filter descriptor.
hWeight - Specifies a handle to the weight data in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoFwd - Specifies the algorithm to use for the forward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopData - Specifies a handle to the top data in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3597 of file CudaDnn.cs.

◆ ConvolutionForward() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
CONV_FWD_ALGO  algoFwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
T  fBeta,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 
)

Perform a convolution forward pass.

Parameters
hCuDnn - Specifies a handle to the instance of cuDnn.
fAlpha - Specifies a scaling factor applied to the result.
hBottomDesc - Specifies a handle to the bottom tensor descriptor.
hBottomData - Specifies a handle to the bottom data in GPU memory.
nBottomOffset - Specifies an offset into the bottom memory (in items, not bytes).
hFilterDesc - Specifies a handle to the filter descriptor.
hWeight - Specifies a handle to the weight data in GPU memory.
nWeightOffset - Specifies an offset into the weight memory (in items, not bytes).
hConvDesc - Specifies a handle to the convolution descriptor.
algoFwd - Specifies the algorithm to use for the forward operation.
hWorkspace - Specifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffset - Specifies an offset into the workspace memory.
lWorkspaceSize - Specifies the size of the workspace memory (in bytes).
fBeta - Specifies a scaling factor applied to the prior destination value.
hTopDesc - Specifies a handle to the top tensor descriptor.
hTopData - Specifies a handle to the top data in GPU memory.
nTopOffset - Specifies an offset into the top memory (in items, not bytes).
bSyncStream - Optionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3623 of file CudaDnn.cs.
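The fAlpha/fBeta overloads of the convolution functions follow cuDNN's standard blending convention, dst = alpha * result + beta * prior_dst. A tiny generic sketch of that convention (a hypothetical helper, not part of MyCaffe):

```python
# cuDNN-style alpha/beta blending: each output is alpha times the
# freshly computed result plus beta times the value already stored
# in the destination buffer.
def apply_alpha_beta(op_result, prior_dst, f_alpha, f_beta):
    return [f_alpha * r + f_beta * p for r, p in zip(op_result, prior_dst)]

# beta = 0 overwrites the destination; beta = 1 accumulates into it.
print(apply_alpha_beta([1.0, 2.0], [10.0, 20.0], 1.0, 0.5))  # [6.0, 12.0]
```

The overloads without fAlpha/fBeta presumably behave as alpha = 1, beta = 0 (overwrite the destination).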

◆ copy() [1/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
int  nNum,
int  nDim,
long  hSrc1,
long  hSrc2,
long  hDst,
long  hSimilar,
bool  bInvert = false 
)

Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0).

Parameters
nCount - Specifies the total data length of hSrc1, hSrc2 and hDst.
nNum - Specifies the number of outer items in hSrc1, hSrc2, hDst, and the number of elements in hSimilar.
nDim - Specifies the inner dimension of hSrc1, hSrc2 and hDst.
hSrc1 - Specifies a handle to the GPU memory of source 1.
hSrc2 - Specifies a handle to the GPU memory of source 2.
hDst - Specifies a handle to the GPU memory of the destination.
hSimilar - Specifies a handle to the GPU memory of the similar data.
bInvert - Optionally, specifies whether or not to invert the similar values (e.g. copy when similar = 0 instead of similar = 1).

Definition at line 5488 of file CudaDnn.cs.
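A CPU sketch of this selection (hypothetical pure-Python reference based on the description, not the CUDA kernel):

```python
# Hypothetical CPU sketch of the similar/dissimilar row copy: rows of
# length n_dim come from src1 when the flag is 1 and from src2 when
# it is 0; invert=True swaps that choice.
def copy_similar(n_num, n_dim, src1, src2, similar, invert=False):
    dst = []
    for i in range(n_num):
        take_first = (similar[i] == 1) != invert
        src = src1 if take_first else src2
        dst.extend(src[i * n_dim:(i + 1) * n_dim])
    return dst

print(copy_similar(2, 2, [1, 2, 3, 4], [5, 6, 7, 8], [1, 0]))  # [1, 2, 7, 8]
```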

◆ copy() [2/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
long  hSrc,
long  hDst,
int  nSrcOffset = 0,
int  nDstOffset = 0,
long  hStream = -1,
bool?  bSrcHalfSizeOverride = null,
bool?  bDstHalfSizeOverride = null 
)

Copy data from one block of GPU memory to another.

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
nCount - Specifies the number of items (not bytes) to copy.
hSrc - Specifies a handle to GPU memory containing the source data.
hDst - Specifies a handle to GPU memory containing the destination data.
nSrcOffset - Optionally specifies the offset into the source data where the copying starts.
nDstOffset - Optionally specifies the offset into the destination data where the copying starts.
hStream - Optionally, specifies a handle to a stream to use for the operation.
bSrcHalfSizeOverride - Optionally, specifies an override for the half size state of the source (default = null, which is ignored).
bDstHalfSizeOverride - Optionally, specifies an override for the half size state of the destination (default = null, which is ignored).

Definition at line 5460 of file CudaDnn.cs.

◆ copy_batch()

void MyCaffe.common.CudaDnn< T >.copy_batch ( int  nCount,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nDstCount,
long  hDstCache,
long  hWorkDevData,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
long  hWorkDataHost 
)

Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data.

Parameters
nCount - Specifies the total data length of hSrcData.
nNum - Specifies the number of outer items in hSrcData and the number of elements in hSrcLbl.
nDim - Specifies the inner dimension of each item in hSrcData.
hSrcData - Specifies a handle to the GPU memory of source data.
hSrcLbl - Specifies a handle to the GPU memory of source labels.
nDstCount - Specifies the total data length of the hDstCache.
hDstCache - Specifies a handle to the GPU memory of the destination cache.
hWorkDevData - Specifies a handle to the GPU memory of the device work data that is the same size as the hDstCache.
nLabelStart - Specifies the first label of all possible labels.
nLabelCount - Specifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSize - Specifies the size of each labeled data cache.
hCacheHostCursors - Specifies a handle to host memory (allocated using AllocateHostBuffer) containing the label cursors - there should be 'nLabelCount' cursors.
hWorkDataHost - Specifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length.

NOTE: The cache size must be set at a sufficient size to cover the maximum number of items for any given label within a batch, otherwise cached items will be overwritten by items in the current batch.

Definition at line 5515 of file CudaDnn.cs.

◆ copy_expand()

void MyCaffe.common.CudaDnn< T >.copy_expand ( int  n,
int  nNum,
int  nDim,
long  hX,
long  hA 
)

Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row.

Parameters
n - Specifies the total number of items in the matrix 'A'.
nNum - Specifies the total number of rows in the matrix 'A' and the total number of items in the vector 'X'.
nDim - Specifies the total number of columns in the matrix 'A'.
hX - Specifies the 'nNum' length vector to expand.
hA - Specifies the 'nNum' x 'nDim' matrix.

Definition at line 5635 of file CudaDnn.cs.
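The expansion described above can be sketched on the CPU (a plain-Python reference of the described behavior, not the CUDA kernel):

```python
# CPU sketch of copy_expand: each x[i] fills an entire row of the
# resulting nNum x nDim matrix (stored row-major, flattened).
def copy_expand(n_num, n_dim, x):
    a = []
    for v in x[:n_num]:
        a.extend([v] * n_dim)
    return a

print(copy_expand(2, 3, [1.0, 2.0]))  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```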

◆ copy_sequence() [1/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  n,
long  hSrc,
int  nSrcStep,
int  nSrcStartIdx,
int  nCopyCount,
int  nCopyDim,
long  hDst,
int  nDstStep,
int  nDstStartIdx,
int  nSrcSpatialDim,
int  nDstSpatialDim,
int  nSrcSpatialDimStartIdx = 0,
int  nDstSpatialDimStartIdx = 0,
int  nSpatialDimCount = -1 
)

Copy a sequence from a source to a destination and allow for skip steps.

Parameters
n - Specifies the total number of items in src.
hSrc - Specifies a handle to the source GPU memory.
nSrcStep - Specifies the stepping used across the source.
nSrcStartIdx - Specifies the starting index into the source.
nCopyCount - Specifies the number of items to copy.
nCopyDim - Specifies the dimension to copy (which x spatial dim = total copy amount).
hDst - Specifies a handle to the destination GPU memory.
nDstStep - Specifies the stepping used across the destination.
nDstStartIdx - Specifies the starting index where data is to be copied in the destination.
nSrcSpatialDim - Specifies the src spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nDstSpatialDim - Specifies the dst spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nSrcSpatialDimStartIdx - Optionally, specifies the start index within the source spatial dim to start the copy (default = 0).
nDstSpatialDimStartIdx - Optionally, specifies the start index within the destination spatial dim to start the copy (default = 0).
nSpatialDimCount - Optionally, specifies the number of items to copy from within the spatial dim (default = -1, copy all).

Definition at line 5618 of file CudaDnn.cs.

◆ copy_sequence() [2/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  nK,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nSrcCacheCount,
long  hSrcCache,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
bool  bOutputLabels,
List< long >  rghTop,
List< int >  rgnTopCount,
long  hWorkDataHost,
bool  bCombinePositiveAndNegative = false,
int  nSeed = 0 
)

Copy a sequence of cached items, organized by label, into an anchor, positive (if nK > 0), and negative blobs.

Parameters
nKSpecifies the output type expected where: nK = 0, outputs to 2 tops (anchor and one negative), or nK > 0, outputs to 2 + nK tops (anchor, positive, nK negatives). The rghTop and rgnTopCount must be sized accordingly.
nNumSpecifis the number of outer items in hSrc1, hSrc2, hDst, and the number of elements in hSimilar.
nDimSpecifies the inner dimension of hSrc1, hSrc2 and hDst.
hSrcDataSpecifies a handle to the GPU memory of source data.
hSrcLblSpecifies a handle to the GPU memory of source labels.
nSrcCacheCountSpecifies the number of items in hSrcCache (nCacheSize * nLabelCount).
hSrcCacheSpecifies a handle to the cached labeled data.
nLabelStartSpecifies the first label of all possible labels.
nLabelCountSpecifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSizeSpecifies the size of each labeled data cache.
hCacheHostCursorsSpecifies a handle to host memory containing the label cursors - there should be 'nLabelCount' cursors.
bOutputLabelsSpecifies whether or not to output labels. When true, one additional top is expected for the labels.
rghTopSpecifies a list of the GPU memory for each top item. The number of top items expected depends on the 'nK' value.
rgnTopCountSpecifies a list of the item count for each top item. The number of top items expected depends on the 'nK' value.
hWorkDataHostSpecifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length and must be the same hWorkDataHost passed to 'copy_batch'.
bCombinePositiveAndNegativeOptionally, specifies to combine the positive and negative items by alternating between each and placing both in Top[1], while also making sure the output labels reflect the alternation.
nSeedOptionally, specifies a seed for the random number generator (default = 0, which ignores this parameter).

Receiving an error ERROR_BATCH_TOO_SMALL indicates that the batch size is too small and does not have enough labels to choose from. Each batch should have at least two instances of each labeled item.

NOTE: When 'nK' = 1 and 'bCombinePositiveAndNegative' = true, the label output has a dimension of 2, and the tops used are as follows: top(0) = anchor; top(1) = alternating negative/positive, top(2) = labels if 'bOutputLabels' = true.

Definition at line 5548 of file CudaDnn.cs.

◆ CopyDeviceToHost()

void MyCaffe.common.CudaDnn< T >.CopyDeviceToHost ( long  lCount,
long  hGpuSrc,
long  hHostDst 
)

Copy from GPU memory to Host memory.

Parameters
lCountSpecifies the number of items (of base type each) to copy.
hGpuSrcSpecifies the GPU memory containing the source data.
hHostDstSpecifies the Host memory containing the host destination.

Definition at line 2306 of file CudaDnn.cs.

◆ CopyHostToDevice()

void MyCaffe.common.CudaDnn< T >.CopyHostToDevice ( long  lCount,
long  hHostSrc,
long  hGpuDst 
)

Copy from Host memory to GPU memory.

Parameters
lCountSpecifies the number of items (of base type each) to copy.
hHostSrcSpecifies the Host memory containing the host source data.
hGpuDstSpecifies the GPU memory containing the destination.

Definition at line 2320 of file CudaDnn.cs.

◆ CreateConvolutionDesc()

long MyCaffe.common.CudaDnn< T >.CreateConvolutionDesc ( )

Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn.

Returns
The convolution descriptor handle is returned.

Definition at line 3491 of file CudaDnn.cs.

◆ CreateCuDNN()

long MyCaffe.common.CudaDnn< T >.CreateCuDNN ( long  hStream = 0)

Create a new instance of NVIDIA's cuDnn.

Parameters
hStreamSpecifies a stream used by cuDnn.
Returns
The handle to cuDnn is returned.

Definition at line 3007 of file CudaDnn.cs.

◆ CreateDropoutDesc()

long MyCaffe.common.CudaDnn< T >.CreateDropoutDesc ( )

Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn.

Returns
The dropout descriptor handle is returned.

Definition at line 3944 of file CudaDnn.cs.

◆ CreateExtension()

long MyCaffe.common.CudaDnn< T >.CreateExtension ( string  strExtensionDllPath)

Create an instance of an Extension DLL.

Parameters
strExtensionDllPathSpecifies the file path to the extension DLL.
Returns
The handle to a new instance of Extension is returned.

Definition at line 3200 of file CudaDnn.cs.

◆ CreateFilterDesc()

long MyCaffe.common.CudaDnn< T >.CreateFilterDesc ( )

Create a new instance of a filter descriptor for use with NVIDIA's cuDnn.

Returns
The filter descriptor handle is returned.

Definition at line 3412 of file CudaDnn.cs.

◆ CreateImageOp()

long MyCaffe.common.CudaDnn< T >.CreateImageOp ( int  nNum,
double  dfBrightnessProb,
double  dfBrightnessDelta,
double  dfContrastProb,
double  dfContrastLower,
double  dfContrastUpper,
double  dfSaturationProb,
double  dfSaturationLower,
double  dfSaturationUpper,
long  lRandomSeed = 0 
)

Create a new ImageOp used to perform image operations on the GPU.

Parameters
nNumSpecifies the number of items (usually the blob.num).
dfBrightnessProbSpecifies the brightness probability [0,1].
dfBrightnessDeltaSpecifies the brightness delta.
dfContrastProbSpecifies the contrast probability [0,1]
dfContrastLowerSpecifies the contrast lower bound value.
dfContrastUpperSpecifies the contrast upper bound value.
dfSaturationProbSpecifies the saturation probability [0,1]
dfSaturationLowerSpecifies the saturation lower bound value.
dfSaturationUpperSpecifies the saturation upper bound value.
lRandomSeedOptionally, specifies the random seed or 0 to ignore (default = 0).
Returns
A handle to the ImageOp is returned.

Definition at line 2897 of file CudaDnn.cs.

◆ CreateLRNDesc()

long MyCaffe.common.CudaDnn< T >.CreateLRNDesc ( )

Create a new instance of a LRN descriptor for use with NVIDIA's cuDnn.

Returns
The LRN descriptor handle is returned.

Definition at line 4049 of file CudaDnn.cs.

◆ CreateMemoryPointer()

long MyCaffe.common.CudaDnn< T >.CreateMemoryPointer ( long  hData,
long  lOffset,
long  lCount 
)

Creates a memory pointer into an already existing block of GPU memory.

Parameters
hDataSpecifies a handle to the GPU memory.
lOffsetSpecifies the offset into the GPU memory (in items, not bytes), where the pointer is to start.
lCountSpecifies the number of items (not bytes) in the 'virtual' memory block pointed to by the memory pointer.
Returns
A handle to the memory pointer is returned. Handles to memory pointers can be used like any other handle to GPU memory.

Definition at line 2772 of file CudaDnn.cs.
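A memory pointer never copies data - it aliases a sub-range of an existing block, with the offset and count measured in items rather than bytes. As an illustration only (not the MyCaffe API), a Python memoryview slice over a typed array has the same aliasing semantics; the function name here is hypothetical.

```python
from array import array

# Illustration of CreateMemoryPointer's semantics: the returned "pointer"
# aliases an existing block, starting at an offset in items (not bytes)
# and spanning 'count' items.  A memoryview slice aliases the same way.
def memory_pointer(data, offset, count):
    return memoryview(data)[offset:offset + count]

block = array('f', range(10))
ptr = memory_pointer(block, 4, 3)
for i in range(len(ptr)):
    ptr[i] = 0.0          # writes through to the underlying block
```

Writes through the view modify the original block, just as writes through a memory-pointer handle modify the underlying GPU allocation.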

◆ CreateMemoryTest()

long MyCaffe.common.CudaDnn< T >.CreateMemoryTest ( out ulong  ulTotalNumBlocks,
out double  dfMemAllocatedInGB,
out ulong  ulMemStartAddr,
out ulong  ulBlockSize,
double  dfPctToAllocate = 1.0 
)

Creates a new memory test on the current GPU.

Parameters
ulTotalNumBlocksReturns the total number of blocks available to test.
dfMemAllocatedInGBReturns the total amount of allocated memory, specified in GB.
ulMemStartAddrReturns the start address of the memory test.
ulBlockSizeReturns the block size of the memory to be tested.
dfPctToAllocateSpecifies the percentage of available memory to test, where 1.0 = 100%.
Returns
A handle to the memory test is returned.

Definition at line 2813 of file CudaDnn.cs.

◆ CreateNCCL()

long MyCaffe.common.CudaDnn< T >.CreateNCCL ( int  nDeviceId,
int  nCount,
int  nRank,
Guid  guid 
)

Create an instance of NVIDIA's NCCL 'Nickel'.

Parameters
nDeviceIdSpecifies the device where this instance of NCCL is going to run.
nCountSpecifies the total number of NCCL instances used.
nRankSpecifies the zero-based rank of this instance of NCCL.
guidSpecifies the unique Guid for this instance of NCCL.
Returns
The handle to a new instance of NCCL is returned.

Definition at line 3041 of file CudaDnn.cs.

◆ CreatePCA()

long MyCaffe.common.CudaDnn< T >.CreatePCA ( int  nMaxIterations,
int  nM,
int  nN,
int  nK,
long  hData,
long  hScoresResult,
long  hLoadsResult,
long  hResiduals = 0,
long  hEigenvalues = 0 
)

Creates a new PCA instance and returns the handle to it.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMaxIterationsSpecifies the number of iterations to run.
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K less than or equal to N).
hDataSpecifies a handle to the data allocated using AllocatePCAData.
hScoresResultSpecifies a handle to the data allocated using AllocatePCAScores.
hLoadsResultSpecifies a handle to the data allocated using AllocatePCALoads.
hResidualsSpecifies a handle to the data allocated using AllocatePCAData.
hEigenvaluesSpecifies a handle to the data allocated using AllocatePCAEigenvalues.
Returns
A handle to the new PCA instance is returned.

Definition at line 4960 of file CudaDnn.cs.

◆ CreatePoolingDesc()

long MyCaffe.common.CudaDnn< T >.CreatePoolingDesc ( )

Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn.

Returns
The pooling descriptor handle is returned.

Definition at line 3778 of file CudaDnn.cs.

◆ CreateRnnDataDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDataDesc ( )

Create the RNN Data Descriptor.

Returns
A handle to the RNN Data descriptor is returned.

Definition at line 4389 of file CudaDnn.cs.

◆ CreateRnnDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDesc ( )

Create the RNN Descriptor.

Returns
A handle to the RNN descriptor is returned.

Definition at line 4470 of file CudaDnn.cs.

◆ CreateSSD()

long MyCaffe.common.CudaDnn< T >.CreateSSD ( int  nNumClasses,
bool  bShareLocation,
int  nLocClasses,
int  nBackgroundLabelId,
bool  bUseDiffcultGt,
SSD_MINING_TYPE  miningType,
SSD_MATCH_TYPE  matchType,
float  fOverlapThreshold,
bool  bUsePriorForMatching,
SSD_CODE_TYPE  codeType,
bool  bEncodeVariantInTgt,
bool  bBpInside,
bool  bIgnoreCrossBoundaryBbox,
bool  bUsePriorForNms,
SSD_CONF_LOSS_TYPE  confLossType,
SSD_LOC_LOSS_TYPE  locLossType,
float  fNegPosRatio,
float  fNegOverlap,
int  nSampleSize,
bool  bMapObjectToAgnostic,
bool  bNmsParam,
float?  fNmsThreshold = null,
int?  nNmsTopK = null,
float?  fNmsEta = null 
)

Create an instance of the SSD GPU support.

Parameters
nNumClassesSpecifies the number of classes.
bShareLocationSpecifies whether or not to share the location.
nLocClassesSpecifies the number of location classes.
nBackgroundLabelIdSpecifies the background label ID.
bUseDiffcultGtSpecifies whether or not to use difficult ground truths.
miningTypeSpecifies the mining type to use.
matchTypeSpecifies the matching method to use.
fOverlapThresholdSpecifies the overlap threshold for each box.
bUsePriorForMatchingSpecifies whether or not to use priors for matching.
codeTypeSpecifies the code type to use.
bEncodeVariantInTgtSpecifies whether or not to encode the variant in the target.
bBpInsideSpecifies whether or not the BP is inside.
bIgnoreCrossBoundaryBboxSpecifies whether or not to ignore cross boundary boxes.
bUsePriorForNmsSpecifies whether or not to use priors for NMS.
confLossTypeSpecifies the confidence loss type.
locLossTypeSpecifies the location loss type.
fNegPosRatioSpecifies the negative/positive ratio to use.
fNegOverlapSpecifies the negative overlap to use.
nSampleSizeSpecifies the sample size.
bMapObjectToAgnosticSpecifies whether or not to map objects to agnostic.
bNmsParamSpecifies whether or not the NMS parameters are specified.
fNmsThresholdSpecifies the NMS threshold, which is only used when the 'bNmsParam' = true.
nNmsTopKSpecifies the NMS top-k selection, which is only used when the 'bNmsParam' = true.
fNmsEtaSpecifies the NMS eta, which is only used when the 'bNmsParam' = true.
Returns
A handle to the SSD instance is returned.

Definition at line 5050 of file CudaDnn.cs.

◆ CreateStream()

long MyCaffe.common.CudaDnn< T >.CreateStream ( bool  bNonBlocking = false,
int  nIndex = -1 
)

Create a new stream on the current GPU.

Parameters
bNonBlockingWhen false (the default), the created stream is a 'blocking' stream; otherwise it is an asynchronous, non-blocking stream.
nIndexSpecifies an index for the stream; streams with an index of 0 or greater are shared.
Returns
The handle to the stream is returned.

Definition at line 2953 of file CudaDnn.cs.

◆ CreateTensorDesc()

long MyCaffe.common.CudaDnn< T >.CreateTensorDesc ( )

Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn.

Returns
The tensor descriptor handle is returned.

Definition at line 3262 of file CudaDnn.cs.

◆ crop_bwd()

void MyCaffe.common.CudaDnn< T >.crop_bwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomDiff,
long  hTopDiff 
)

Performs the crop backward operation.

Parameters
nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDiffSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top data in GPU memory.

Definition at line 8533 of file CudaDnn.cs.

◆ crop_fwd()

void MyCaffe.common.CudaDnn< T >.crop_fwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomData,
long  hTopData 
)

Performs the crop forward operation.

Parameters
nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8515 of file CudaDnn.cs.
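The strides and offsets describe which window of the bottom blob lands in the top blob. As a sketch of those semantics (an assumption about the kernel, shown in pure Python for the 2-D case rather than the GPU implementation):

```python
# Semantics sketch: crop_fwd copies a window of the bottom blob into the
# top blob; the offsets give the starting index along each axis and the
# top shape fixes the window extent.
def crop_fwd(bottom, top_shape, offsets):
    h, w = top_shape
    oy, ox = offsets
    return [row[ox:ox + w] for row in bottom[oy:oy + h]]

bottom = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 input
top = crop_fwd(bottom, (2, 2), (1, 1))                      # 2x2 window at (1,1)
```

The backward pass (crop_bwd) scatters the top diff back into the same window of the bottom diff.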

◆ cross_entropy_fwd()

void MyCaffe.common.CudaDnn< T >.cross_entropy_fwd ( int  nCount,
long  hInput,
long  hTarget,
long  hLoss,
bool  bHasIgnoreLabel,
int  nIgnoreLabel,
long  hCountData 
)

Performs a sigmoid cross entropy forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hInputSpecifies a handle to the input data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hLossSpecifies a handle to the loss data in GPU memory.
bHasIgnoreLabelSpecifies whether or not an ignore label is used.
nIgnoreLabelSpecifies the ignore label which is used when bHasIgnoreLabel is true.
hCountDataSpecifies a handle to the count data in GPU memory.

Definition at line 9232 of file CudaDnn.cs.
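A CPU reference for the per-element loss may help here. The sketch below uses the numerically stable form of sigmoid cross entropy common to Caffe-style kernels (an assumption about this implementation), with ignored elements contributing zero loss and being excluded from the count:

```python
import math

# Per-element sigmoid cross entropy in the numerically stable form
#   loss = max(x, 0) - x*t + log(1 + exp(-|x|))
# Elements whose target equals the ignore label are skipped.
def sigmoid_cross_entropy_fwd(inputs, targets, ignore_label=None):
    losses, count = [], 0
    for x, t in zip(inputs, targets):
        if ignore_label is not None and t == ignore_label:
            losses.append(0.0)
            continue
        losses.append(max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x))))
        count += 1
    return losses, count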

◆ cross_entropy_ignore()

void MyCaffe.common.CudaDnn< T >.cross_entropy_ignore ( int  nCount,
int  nIgnoreLabel,
long  hTarget,
long  hBottomDiff 
)

Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified.

Parameters
nCountSpecifies the number of items.
nIgnoreLabelSpecifies the label to ignore.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9247 of file CudaDnn.cs.
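The effect on the gradient is simply to zero every entry whose target carries the ignore label, so those positions contribute nothing to the backward pass. A minimal CPU sketch (illustration only, not the GPU kernel):

```python
# Zero the gradient wherever the target equals the ignore label.
def cross_entropy_ignore(ignore_label, targets, bottom_diff):
    return [0.0 if t == ignore_label else d
            for t, d in zip(targets, bottom_diff)]
```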

◆ debug()

void MyCaffe.common.CudaDnn< T >.debug ( )

The debug function is used only when debugging the debug version of the low-level DLL.

Definition at line 9260 of file CudaDnn.cs.

◆ denan()

void MyCaffe.common.CudaDnn< T >.denan ( int  n,
long  hX,
double  dfReplacement 
)

Replaces all NaN values within X with a replacement value.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
dfReplacementSpecifies the replacement value.

Definition at line 7132 of file CudaDnn.cs.
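The semantics are a straightforward in-place scrub. A CPU sketch for illustration (the GPU kernel operates on the handle's memory directly):

```python
import math

# Replace every NaN in x with the replacement value, in place;
# finite values pass through untouched.
def denan(x, replacement):
    for i, v in enumerate(x):
        if math.isnan(v):
            x[i] = replacement
    return x
```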

◆ DeriveBatchNormDesc()

void MyCaffe.common.CudaDnn< T >.DeriveBatchNormDesc ( long  hFwdScaleBiasMeanVarDesc,
long  hFwdBottomDesc,
long  hBwdScaleBiasMeanVarDesc,
long  hBwdBottomDesc,
BATCHNORM_MODE  mode 
)

Derive the batch norm descriptors for both the forward and backward passes.

Parameters
hFwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the forward pass.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the backward pass.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.
modeSpecifies the batch normalization mode to use.

Definition at line 3873 of file CudaDnn.cs.

◆ DeviceCanAccessPeer()

bool MyCaffe.common.CudaDnn< T >.DeviceCanAccessPeer ( int  nSrcDeviceID,
int  nPeerDeviceID 
)

Query whether or not two devices can access each other via peer-to-peer memory copies.

Parameters
nSrcDeviceIDSpecifies the device id of the source.
nPeerDeviceIDSpecifies the device id of the peer to the source device.
Returns
true is returned if the source device can access the peer device via peer-to-peer communication, false otherwise.

Definition at line 2018 of file CudaDnn.cs.

◆ DeviceDisablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceDisablePeerAccess ( int  nPeerDeviceID)

Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

Parameters
nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2048 of file CudaDnn.cs.

◆ DeviceEnablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceEnablePeerAccess ( int  nPeerDeviceID)

Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

Parameters
nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2036 of file CudaDnn.cs.

◆ DisableGhostMemory()

void MyCaffe.common.CudaDnn< T >.DisableGhostMemory ( )

Disables the ghost memory, if enabled.

Definition at line 1553 of file CudaDnn.cs.

◆ Dispose() [1/2]

void MyCaffe.common.CudaDnn< T >.Dispose ( )

Disposes this instance freeing up all of its host and GPU memory.

Definition at line 1434 of file CudaDnn.cs.

◆ Dispose() [2/2]

virtual void MyCaffe.common.CudaDnn< T >.Dispose ( bool  bDisposing)
protectedvirtual

Disposes this instance freeing up all of its host and GPU memory.

Parameters
bDisposingWhen true, specifies that the call is from a Dispose call.

Definition at line 1417 of file CudaDnn.cs.

◆ DistortImage()

void MyCaffe.common.CudaDnn< T >.DistortImage ( long  h,
int  nCount,
int  nNum,
int  nDim,
long  hX,
long  hY 
)

Distort an image using the ImageOp handle provided.

Parameters
hSpecifies a handle to the ImageOp that defines how the image will be distorted.
nCountSpecifies the total number of data elements within 'hX' and 'hY'.
nNumSpecifies the number of items to be distorted (typically blob.num) in 'hX' and 'hY'.
nDimSpecifies the dimension of each item.
hXSpecifies a handle to the GPU memory containing the source data to be distorted.
hYSpecifies a handle to the GPU memory containing the destination of the distortion.

Definition at line 2932 of file CudaDnn.cs.

◆ div()

void MyCaffe.common.CudaDnn< T >.div ( int  n,
long  hA,
long  hB,
long  hY 
)

Divides each element of A by each element of B and places the result in Y.

Y = A / B (element by element)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6680 of file CudaDnn.cs.

◆ divbsx()

void MyCaffe.common.CudaDnn< T >.divbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 
)

Divide a matrix by a vector.

Parameters
nSpecifies the number of items.
hASpecifies the matrix to divide.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the divisor vector.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6095 of file CudaDnn.cs.
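Viewed as a (channels x spatial_dim) matrix, the divide broadcasts the vector along one axis; which axis is selected by bTranspose. The orientation below is an assumption for illustration, written in pure Python rather than the GPU kernel:

```python
# Broadcast sketch: divide a matrix, viewed as (channels x spatial_dim),
# by a vector.  With transpose False the divisor holds one entry per
# channel; with transpose True, one entry per spatial position.
def divbsx(a, x, channels, spatial_dim, transpose):
    out = []
    for c in range(channels):
        for s in range(spatial_dim):
            d = x[s] if transpose else x[c]
            out.append(a[c * spatial_dim + s] / d)
    return out

per_channel = divbsx([2.0, 4.0, 6.0, 8.0], [2.0, 4.0], 2, 2, False)
```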

◆ DivisiveNormalizationBackward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationBackward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDiff,
long  hTemp1,
long  hTemp2,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Performs a Divisive Normalization backward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4174 of file CudaDnn.cs.

◆ DivisiveNormalizationForward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationForward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
long  hTemp1,
long  hTemp2,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Performs a Divisive Normalization forward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4149 of file CudaDnn.cs.

◆ dot()

T MyCaffe.common.CudaDnn< T >.dot ( int  n,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.
Returns
The dot product is returned as a type 'T'.

Definition at line 6225 of file CudaDnn.cs.
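Since the offsets are in items rather than bytes, the reduction starts nXOff and nYOff items into each vector. A CPU reference of that contract (illustration only; the real call runs through cuBlas):

```python
# Dot product over n items, starting x_off items into x and y_off items
# into y (offsets in items, not bytes).
def dot(n, x, y, x_off=0, y_off=0):
    return sum(x[x_off + i] * y[y_off + i] for i in range(n))
```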

◆ dot_double()

double MyCaffe.common.CudaDnn< T >.dot_double ( int  n,
long  hX,
long  hY 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
Returns
The dot product is returned as a type double.

Definition at line 6193 of file CudaDnn.cs.

◆ dot_float()

float MyCaffe.common.CudaDnn< T >.dot_float ( int  n,
long  hX,
long  hY 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
Returns
The dot product is returned as a type float.

Definition at line 6208 of file CudaDnn.cs.

◆ dropout_bwd()

void MyCaffe.common.CudaDnn< T >.dropout_bwd ( int  nCount,
long  hTopDiff,
long  hMask,
uint  uiThreshold,
T  fScale,
long  hBottomDiff 
)

Performs a dropout backward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when mask values are less than the threshold, the data item is 'dropped out' by setting the data item to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8255 of file CudaDnn.cs.

◆ dropout_fwd()

void MyCaffe.common.CudaDnn< T >.dropout_fwd ( int  nCount,
long  hBottomData,
long  hMask,
uint  uiThreshold,
T  fScale,
long  hTopData 
)

Performs a dropout forward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when mask values are less than the threshold, the data item is 'dropped out' by setting the data item to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8235 of file CudaDnn.cs.
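Putting the threshold and scale semantics together: items whose mask value falls below the threshold are zeroed, and survivors are multiplied by the scale (typically 1 / (1 - dropout_ratio)). A CPU sketch for illustration:

```python
# Dropout forward: zero items whose mask value is below the threshold,
# scale the survivors.
def dropout_fwd(bottom, mask, threshold, scale):
    return [0.0 if m < threshold else v * scale
            for v, m in zip(bottom, mask)]
```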

◆ DropoutBackward()

void MyCaffe.common.CudaDnn< T >.DropoutBackward ( long  hCuDnn,
long  hDropoutDesc,
long  hTopDesc,
long  hTop,
long  hBottomDesc,
long  hBottom,
long  hReserved 
)

Performs a dropout backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopSpecifies a handle to the top data in GPU memory.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomSpecifies a handle to the bottom data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4037 of file CudaDnn.cs.

◆ DropoutForward()

void MyCaffe.common.CudaDnn< T >.DropoutForward ( long  hCuDnn,
long  hDropoutDesc,
long  hBottomDesc,
long  hBottomData,
long  hTopDesc,
long  hTopData,
long  hReserved 
)

Performs a dropout forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4019 of file CudaDnn.cs.

◆ elu_bwd()

void MyCaffe.common.CudaDnn< T >.elu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomData,
long  hBottomDiff,
double  dfAlpha 
)

Performs an Exponential Linear Unit (ELU) backward pass in Cuda.

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 8215 of file CudaDnn.cs.

◆ elu_fwd()

void MyCaffe.common.CudaDnn< T >.elu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfAlpha 
)

Performs an Exponential Linear Unit (ELU) forward pass in Cuda.

Calculates $ f(x) = (x > 0) ? x : \alpha * (e^x - 1) $

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 8195 of file CudaDnn.cs.
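The documented formula transcribes directly to a CPU reference (illustration only; the real pass runs on the GPU):

```python
import math

# f(x) = x if x > 0 else alpha * (e^x - 1)
def elu_fwd(bottom, alpha):
    return [x if x > 0 else alpha * (math.exp(x) - 1.0) for x in bottom]
```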

◆ EluBackward()

void MyCaffe.common.CudaDnn< T >.EluBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Performs an Elu backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4254 of file CudaDnn.cs.

◆ EluForward()

void MyCaffe.common.CudaDnn< T >.EluForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Performs an Elu forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4232 of file CudaDnn.cs.

◆ embed_bwd()

void MyCaffe.common.CudaDnn< T >.embed_bwd ( int  nCount,
long  hBottomData,
long  hTopDiff,
int  nM,
int  nN,
int  nK,
long  hWeightDiff 
)

Performs the backward pass for embed.

Parameters
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nMNEEDS REVIEW
nNNEEDS REVIEW
nKNEEDS REVIEW
hWeightDiffSpecifies a handle to the weight diff in GPU memory.

Definition at line 7742 of file CudaDnn.cs.

◆ embed_fwd()

void MyCaffe.common.CudaDnn< T >.embed_fwd ( int  nCount,
long  hBottomData,
long  hWeight,
int  nM,
int  nN,
int  nK,
long  hTopData 
)

Performs the forward pass for embed.

Parameters
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hWeightSpecifies a handle to the weight data in GPU memory.
nMNEEDS REVIEW
nNNEEDS REVIEW
nKNEEDS REVIEW
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 7724 of file CudaDnn.cs.

◆ erf() [1/3]

double MyCaffe.common.CudaDnn< T >.erf ( double  dfVal)

Calculates the erf() function.

Parameters
dfValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 6364 of file CudaDnn.cs.

◆ erf() [2/3]

float MyCaffe.common.CudaDnn< T >.erf ( float  fVal)

Calculates the erf() function.

Parameters
fValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 6374 of file CudaDnn.cs.

◆ erf() [3/3]

T MyCaffe.common.CudaDnn< T >.erf ( T  fVal)

Calculates the erf() function.

Parameters
fValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 6384 of file CudaDnn.cs.

◆ exp() [1/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY 
)

Calculates the exponent value of A and places the result in Y.

$ f(x) = exp(x) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6714 of file CudaDnn.cs.

◆ exp() [2/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY,
int  nAOff,
int  nYOff,
double  dfBeta 
)

Calculates the exponent value of A * beta and places the result in Y.

$ f(x) = exp(x * \beta) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffSpecifies an offset (in items, not bytes) into the memory of A.
nYOffSpecifies an offset (in items, not bytes) into the memory of Y.
dfBetaSpecifies the scalar as type
double

Definition at line 6731 of file CudaDnn.cs.
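
The arithmetic of the two exp() overloads can be mirrored on the CPU with NumPy. This is a sketch of the documented formula only, not the GPU kernel; the item-based offsets are mimicked directly:

```python
import numpy as np

def exp_ref(a, beta=1.0, a_off=0, y_off=0, n=None, y=None):
    """CPU sketch of CudaDnn.exp(): y[yOff : yOff+n] = exp(a[aOff : aOff+n] * beta)."""
    if n is None:
        n = len(a) - a_off
    if y is None:
        y = np.zeros(y_off + n)
    y[y_off:y_off + n] = np.exp(np.asarray(a[a_off:a_off + n]) * beta)
    return y

a = np.array([0.0, 1.0, 2.0])
print(exp_ref(a))             # [1.0, e, e^2]
print(exp_ref(a, beta=0.5))   # [1.0, e^0.5, e]
```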

◆ fill()

void MyCaffe.common.CudaDnn< T >.fill ( int  n,
int  nDim,
long  hSrc,
int  nSrcOff,
int  nCount,
long  hDst 
)

Fills the destination by copying the source data 'n' times.

Parameters
nSpecifies the number of times to copy the source data.
nDimSpecifies the number of source items to copy.
hSrcSpecifies a handle to the GPU memory of the source data.
nSrcOffSpecifies an offset into the GPU memory where the source data copy starts.
nCountSpecifies the total number of items in the destination. This value must be >= n * nDim.
hDstSpecifies the handle to the GPU memory where the data is to be copied.

Definition at line 5652 of file CudaDnn.cs.
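
A CPU sketch of the fill() contract may help clarify the parameter roles: a block of nDim source items, starting at the source offset, is repeated n times back-to-back in the destination.

```python
import numpy as np

def fill_ref(n, n_dim, src, src_off, n_count, dst):
    """CPU sketch of CudaDnn.fill(): copy nDim items starting at src_off
    into dst, n times back-to-back. Requires n_count >= n * n_dim."""
    assert n_count >= n * n_dim
    block = src[src_off:src_off + n_dim]
    for i in range(n):
        dst[i * n_dim:(i + 1) * n_dim] = block
    return dst

dst = np.zeros(6)
print(fill_ref(3, 2, np.array([5.0, 7.0, 9.0]), 1, 6, dst))
# [7. 9. 7. 9. 7. 9.]
```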

◆ FreeConvolutionDesc()

void MyCaffe.common.CudaDnn< T >.FreeConvolutionDesc ( long  h)

Free a convolution descriptor instance.

Parameters
hSpecifies the handle to the convolution descriptor instance.

Definition at line 3509 of file CudaDnn.cs.

◆ FreeCuDNN()

void MyCaffe.common.CudaDnn< T >.FreeCuDNN ( long  h)

Free an instance of cuDnn.

Parameters
hSpecifies the handle to cuDnn.

Definition at line 3025 of file CudaDnn.cs.

◆ FreeDropoutDesc()

void MyCaffe.common.CudaDnn< T >.FreeDropoutDesc ( long  h)

Free a dropout descriptor instance.

Parameters
hSpecifies the handle to the dropout descriptor instance.

Definition at line 3962 of file CudaDnn.cs.

◆ FreeExtension()

void MyCaffe.common.CudaDnn< T >.FreeExtension ( long  hExtension)

Free an instance of an Extension.

Parameters
hExtensionSpecifies the handle to the Extension.

Definition at line 3218 of file CudaDnn.cs.

◆ FreeFilterDesc()

void MyCaffe.common.CudaDnn< T >.FreeFilterDesc ( long  h)

Free a filter descriptor instance.

Parameters
hSpecifies the handle to the filter descriptor instance.

Definition at line 3430 of file CudaDnn.cs.

◆ FreeHostBuffer()

void MyCaffe.common.CudaDnn< T >.FreeHostBuffer ( long  hMem)

Free previously allocated host memory.

Parameters
hMemSpecifies the handle to the host memory.

Definition at line 2354 of file CudaDnn.cs.

◆ FreeImageOp()

void MyCaffe.common.CudaDnn< T >.FreeImageOp ( long  h)

Free an image op, freeing up all GPU memory used.

Parameters
hSpecifies the handle to the image op.

Definition at line 2915 of file CudaDnn.cs.

◆ FreeLRNDesc()

void MyCaffe.common.CudaDnn< T >.FreeLRNDesc ( long  h)

Free a LRN descriptor instance.

Parameters
hSpecifies the handle to the LRN descriptor instance.

Definition at line 4067 of file CudaDnn.cs.

◆ FreeMemory()

void MyCaffe.common.CudaDnn< T >.FreeMemory ( long  hMem)

Free previously allocated GPU memory.

Parameters
hMemSpecifies the handle to the GPU memory.

Definition at line 2269 of file CudaDnn.cs.

◆ FreeMemoryPointer()

void MyCaffe.common.CudaDnn< T >.FreeMemoryPointer ( long  hData)

Frees a memory pointer.

Parameters
hDataSpecifies the handle to the memory pointer.

Definition at line 2790 of file CudaDnn.cs.

◆ FreeMemoryTest()

void MyCaffe.common.CudaDnn< T >.FreeMemoryTest ( long  h)

Free a memory test, freeing up all GPU memory used.

Parameters
hSpecifies the handle to the memory test.

Definition at line 2839 of file CudaDnn.cs.

◆ FreeNCCL()

void MyCaffe.common.CudaDnn< T >.FreeNCCL ( long  hNccl)

Free an instance of NCCL.

Parameters
hNcclSpecifies the handle to NCCL.

Definition at line 3099 of file CudaDnn.cs.

◆ FreePCA()

void MyCaffe.common.CudaDnn< T >.FreePCA ( long  hPCA)

Free the PCA instance associated with handle.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
hPCASpecifies a handle to the PCA instance to free.

Definition at line 5014 of file CudaDnn.cs.

◆ FreePoolingDesc()

void MyCaffe.common.CudaDnn< T >.FreePoolingDesc ( long  h)

Free a pooling descriptor instance.

Parameters
hSpecifies the handle to the pooling descriptor instance.

Definition at line 3796 of file CudaDnn.cs.

◆ FreeRnnDataDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDataDesc ( long  h)

Free an existing RNN Data descriptor.

Parameters
hSpecifies the handle to the RNN Data descriptor created with CreateRnnDataDesc

Definition at line 4409 of file CudaDnn.cs.

◆ FreeRnnDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDesc ( long  h)

Free an existing RNN descriptor.

Parameters
hSpecifies the handle to the RNN descriptor created with CreateRnnDesc

Definition at line 4488 of file CudaDnn.cs.

◆ FreeSSD()

void MyCaffe.common.CudaDnn< T >.FreeSSD ( long  hSSD)

Free the instance of SSD GPU support.

Parameters
hSSDSpecifies the handle to the SSD instance.

Definition at line 5155 of file CudaDnn.cs.

◆ FreeStream()

void MyCaffe.common.CudaDnn< T >.FreeStream ( long  h)

Free a stream.

Parameters
hSpecifies the handle to the stream.

Definition at line 2971 of file CudaDnn.cs.

◆ FreeTensorDesc()

void MyCaffe.common.CudaDnn< T >.FreeTensorDesc ( long  h)

Free a tensor descriptor instance.

Parameters
hSpecifies the handle to the tensor descriptor instance.

Definition at line 3280 of file CudaDnn.cs.

◆ gather_bwd()

void MyCaffe.common.CudaDnn< T >.gather_bwd ( int  nCount,
long  hTop,
long  hBottom,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 
)

Performs the gather backward pass, where data at the specified indexes along a given axis is copied to the output data.

Parameters
nCountSpecifies the number of items.
hTopSpecifies the input data.
hBottomSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 8825 of file CudaDnn.cs.

◆ gather_fwd()

void MyCaffe.common.CudaDnn< T >.gather_fwd ( int  nCount,
long  hBottom,
long  hTop,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 
)

Performs the gather forward pass, where data at the specified indexes along a given axis is copied to the output data.

Parameters
nCountSpecifies the number of items.
hBottomSpecifies the input data.
hTopSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 8805 of file CudaDnn.cs.
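
The gather forward pass is comparable to NumPy's take_along_axis. The sketch below is an assumption about the per-axis semantics; the nM/nN/nDim flattening performed by the GPU kernel is not mirrored here.

```python
import numpy as np

# Gather along axis 1: out[i, j] = data[i, idx[i, j]]
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
idx = np.array([[2, 0],
                [1, 1]])
out = np.take_along_axis(data, idx, axis=1)
print(out)  # [[3. 1.] [5. 5.]]
```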

◆ gaussian_blur()

void MyCaffe.common.CudaDnn< T >.gaussian_blur ( int  n,
int  nChannels,
int  nHeight,
int  nWidth,
double  dfSigma,
long  hX,
long  hY 
)

The gaussian_blur runs a Gaussian blurring operation over each channel of the data using the sigma.

The gaussian blur operation runs a 3x3 patch, initialized with the gaussian distribution using the formula $ G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\left(x^2 + y^2\right) / \left(2\sigma^2\right)} $

See also
Gaussian Blur on Wikipedia for more information.
Parameters
nSpecifies the number of items in the memory of 'X'.
nChannelsSpecifies the number of channels (i.e. 3 for RGB, 1 for B/W).
nHeightSpecifies the height of each item.
nWidthSpecifies the width of each item.
dfSigmaSpecifies the sigma used in the gaussian blur.
hXSpecifies a handle to GPU memory containing the source data to blur.
hYSpecifies a handle to GPU memory where the blurred information is placed.

Definition at line 9603 of file CudaDnn.cs.
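
A CPU sketch of a 3x3 blur using the standard 2-D Gaussian G(x, y) proportional to exp(-(x^2 + y^2) / (2*sigma^2)), with the taps normalized to sum to 1. The border handling (zero padding here) is an assumption; the GPU kernel's edge behavior is not documented.

```python
import numpy as np

def gauss3x3(sigma):
    """Build the normalized 3x3 Gaussian kernel."""
    xs = np.arange(-1, 2)
    xx, yy = np.meshgrid(xs, xs)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_channel(img, sigma):
    """Blur one 2-D channel; zero padding at the border (an assumption)."""
    k = gauss3x3(sigma)
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = (p[y:y + 3, x:x + 3] * k).sum()
    return out

img = np.zeros((5, 5))
img[2, 2] = 1.0
print(blur_channel(img, 1.0).sum())  # 1.0 (mass preserved away from borders)
```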

◆ geam() [1/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
double
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
double
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 5788 of file CudaDnn.cs.

◆ geam() [2/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
float
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
float
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 5808 of file CudaDnn.cs.

◆ geam() [3/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
T  fAlpha,
long  hA,
long  hB,
T  fBeta,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.

Definition at line 5831 of file CudaDnn.cs.
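
Ignoring handles and item offsets, the geam contract C = alpha transA (A) + beta transB (B) reduces to a one-line NumPy reference:

```python
import numpy as np

def geam_ref(trans_a, trans_b, alpha, A, B, beta):
    """CPU sketch of geam: C = alpha * op(A) + beta * op(B),
    where op() optionally transposes its argument."""
    opA = A.T if trans_a else A
    opB = B.T if trans_b else B
    return alpha * opA + beta * opB

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)
print(geam_ref(True, False, 2.0, A, B, 1.0))
# 2*A^T + I = [[3. 6.] [4. 9.]]
```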

◆ gemm() [1/4]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
double
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
double
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 5689 of file CudaDnn.cs.

◆ gemm() [2/4]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC,
uint  lda,
uint  ldb,
uint  ldc 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'double'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'double'.
hCSpecifies a handle to the data for matrix C in GPU memory.
ldaSpecifies the leading dimension of A.
ldbSpecifies the leading dimension of B.
ldcSpecifies the leading dimension of C.

Definition at line 5765 of file CudaDnn.cs.

◆ gemm() [3/4]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
float
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
float
hCSpecifies a handle to the data for matrix C in GPU memory.

Definition at line 5710 of file CudaDnn.cs.

◆ gemm() [4/4]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
T  fAlpha,
long  hA,
long  hB,
T  fBeta,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0,
int  nGroups = 1,
int  nGroupOffsetA = 0,
int  nGroupOffsetB = 0,
int  nGroupOffsetC = 0 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.
nGroupsOptionally, specifies the number of groups (default = 1).
nGroupOffsetAOptionally, specifies an offset multiplied by the current group 'g' and added to the AOffset (default = 0).
nGroupOffsetBOptionally, specifies an offset multiplied by the current group 'g' and added to the BOffset (default = 0).
nGroupOffsetCOptionally, specifies an offset multiplied by the current group 'g' and added to the COffset (default = 0).

Definition at line 5738 of file CudaDnn.cs.
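
Ignoring handles, offsets and groups, the gemm arithmetic can be sketched in NumPy. The reading C = alpha * op(A) @ op(B) + beta * C used below is an assumption: the documented formula lists the operands in the swapped order used by the underlying column-major cuBLAS call, and only the arithmetic is mirrored here.

```python
import numpy as np

def gemm_ref(trans_a, trans_b, alpha, A, B, beta, C):
    """CPU sketch of the gemm arithmetic (row-major reading, an assumption)."""
    opA = A.T if trans_a else A
    opB = B.T if trans_b else B
    return alpha * (opA @ opB) + beta * C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)
C = np.ones((2, 2))
print(gemm_ref(False, False, 1.0, A, B, 0.5, C))
# A @ I + 0.5 = [[1.5 2.5] [3.5 4.5]]
```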

◆ gemv() [1/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hX,
double  fBeta,
long  hY 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
double
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type
double
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 5855 of file CudaDnn.cs.

◆ gemv() [2/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hX,
float  fBeta,
long  hY 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
float
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type
float
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 5874 of file CudaDnn.cs.

◆ gemv() [3/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
T  fAlpha,
long  hA,
long  hX,
T  fBeta,
long  hY,
int  nAOffset = 0,
int  nXOffset = 0,
int  nYOffset = 0 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector X in GPU memory.
fBetaSpecifies a scalar multiplied by Y where the scalar is of type 'T'.
hYSpecifies a handle to the data for vector Y in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nXOffsetSpecifies an offset (in items, not bytes) into the memory of X.
nYOffsetSpecifies an offset (in items, not bytes) into the memory of Y.

Definition at line 5896 of file CudaDnn.cs.
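
Similarly, the gemv arithmetic reduces to y = alpha * op(A) @ x + beta * y; a NumPy reference, with handles and offsets omitted:

```python
import numpy as np

def gemv_ref(trans_a, alpha, A, x, beta, y):
    """CPU sketch of gemv: y = alpha * op(A) @ x + beta * y."""
    opA = A.T if trans_a else A
    return alpha * (opA @ x) + beta * y

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, 1.0])
y = np.zeros(2)
print(gemv_ref(False, 1.0, A, x, 0.0, y))  # [3. 7.]
```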

◆ ger() [1/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
double  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector multiplication (outer product) operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 5916 of file CudaDnn.cs.

◆ ger() [2/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
float  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector multiplication (outer product) operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 5933 of file CudaDnn.cs.

◆ ger() [3/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
T  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector multiplication (outer product) operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector X (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 5950 of file CudaDnn.cs.
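
The ger operation is the classic BLAS rank-1 outer product; in NumPy terms:

```python
import numpy as np

def ger_ref(alpha, x, y):
    """CPU sketch of ger: A = outer(x, alpha * y), an m x n matrix
    built from an m-vector x and an n-vector y."""
    return np.outer(x, alpha * y)

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0])
print(ger_ref(0.5, x, y))
# [[ 5. 10.] [10. 20.] [15. 30.]]
```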

◆ get()

T[] MyCaffe.common.CudaDnn< T >.get ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of type 'T'.

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of
T
is returned.

Definition at line 5438 of file CudaDnn.cs.

◆ get_double()

double[] MyCaffe.common.CudaDnn< T >.get_double ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of

double

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of
double
is returned.

Definition at line 5414 of file CudaDnn.cs.

◆ get_float()

float[] MyCaffe.common.CudaDnn< T >.get_float ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of

float

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of
float
is returned.

Definition at line 5426 of file CudaDnn.cs.

◆ GetConvolutionInfo()

void MyCaffe.common.CudaDnn< T >.GetConvolutionInfo ( long  hCuDnn,
long  hBottomDesc,
long  hFilterDesc,
long  hConvDesc,
long  hTopDesc,
ulong  lWorkspaceSizeLimitInBytes,
bool  bUseTensorCores,
out CONV_FWD_ALGO  algoFwd,
out ulong  lWsSizeFwd,
out CONV_BWD_FILTER_ALGO  algoBwdFilter,
out ulong  lWsSizeBwdFilter,
out CONV_BWD_DATA_ALGO  algoBwdData,
out ulong  lWsSizeBwdData,
CONV_FWD_ALGO  preferredFwdAlgo = CONV_FWD_ALGO.NONE 
)

Queries the algorithms and workspace sizes used for a given convolution descriptor.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hFilterDescSpecifies a handle to the filter descriptor.
hConvDescSpecifies a handle to the convolution descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
lWorkspaceSizeLimitInBytesSpecifies the workspace limits (in bytes).
bUseTensorCoresSpecifies whether or not to use tensor cores (this parameter must match the setting of the 'bUseTensorCores' specified in the 'SetConvolutionDesc' method).
algoFwdReturns the algorithm used for the convolution forward.
lWsSizeFwdReturns the workspace size (in bytes) for the convolution forward.
algoBwdFilterReturns the algorithm used for the backward filter.
lWsSizeBwdFilterReturns the workspace size (in bytes) for the backward filter.
algoBwdDataReturns the algorithm for the backward data.
lWsSizeBwdDataReturns the workspace size (in bytes) for the backward data.
preferredFwdAlgoOptionally, specifies a preferred forward algo to attempt to use for forward convolution. The new algo is only used if the current device supports it.

Definition at line 3554 of file CudaDnn.cs.

◆ GetCudaDnnDllPath()

static string MyCaffe.common.CudaDnn< T >.GetCudaDnnDllPath ( )
static

Returns the path to the CudaDnnDll module to use for low level CUDA processing.

Returns
The CudaDnnDll path is returned.

Definition at line 1443 of file CudaDnn.cs.

◆ GetDeviceCount()

int MyCaffe.common.CudaDnn< T >.GetDeviceCount ( )

Query the number of devices (GPUs) installed.

Returns
The number of GPUs is returned.

Definition at line 1905 of file CudaDnn.cs.

◆ GetDeviceID()

int MyCaffe.common.CudaDnn< T >.GetDeviceID ( )

Returns the current device id set within Cuda.

Returns
The device id.

Definition at line 1791 of file CudaDnn.cs.

◆ GetDeviceInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceInfo ( int  nDeviceID,
bool  bVerbose = false 
)

Query the device information of a device.

Parameters
nDeviceIDSpecifies the device id.
bVerboseWhen true, more detailed information is returned.
Returns
The device information string is returned.

Definition at line 1842 of file CudaDnn.cs.

◆ GetDeviceMemory()

double MyCaffe.common.CudaDnn< T >.GetDeviceMemory ( out double  dfFree,
out double  dfUsed,
out bool  bCudaCallUsed,
int  nDeviceID = -1 
)

Queries the amount of total, free and used memory on a given GPU.

Parameters
dfFreeSpecifies the amount of free memory in GB.
dfUsedSpecifies the amount of used memory in GB.
bCudaCallUsedSpecifies whether or not the used memory is an estimate calculated using the Low-Level Cuda DNN Dll handle table.
nDeviceIDSpecifies the specific device id to query, or if -1, calculates an estimate of the memory used from the current low-level Cuda DNN Dll handle table.
Returns
The device's total amount of memory in GB is returned.

Definition at line 1960 of file CudaDnn.cs.

◆ GetDeviceName()

string MyCaffe.common.CudaDnn< T >.GetDeviceName ( int  nDeviceID)

Query the name of a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The name of the GPU at the device id is returned.

Definition at line 1813 of file CudaDnn.cs.

◆ GetDeviceP2PInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceP2PInfo ( int  nDeviceID)

Query the peer-to-peer information of a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The peer-to-peer information of the GPU at the device id is returned.

Definition at line 1827 of file CudaDnn.cs.

◆ GetDropoutInfo()

void MyCaffe.common.CudaDnn< T >.GetDropoutInfo ( long  hCuDnn,
long  hBottomDesc,
out ulong  ulStateCount,
out ulong  ulReservedCount 
)

Query the dropout state and reserved counts.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
ulStateCountReturns the state count.
ulReservedCountReturns the reserved count.

Definition at line 3993 of file CudaDnn.cs.

◆ GetHostBufferCapacity()

long MyCaffe.common.CudaDnn< T >.GetHostBufferCapacity ( long  hMem)

Returns the host memory capacity.

Parameters
hMemSpecifies the host memory.
Returns
The current host memory capacity is returned.

Definition at line 2373 of file CudaDnn.cs.

◆ GetHostMemory()

T[] MyCaffe.common.CudaDnn< T >.GetHostMemory ( long  hMem)

Retrieves the host memory as an array of type 'T'

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of type 'T' is returned.

Definition at line 2414 of file CudaDnn.cs.

◆ GetHostMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetHostMemoryDouble ( long  hMem)

Retrieves the host memory as an array of doubles.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of doubles is returned.

Definition at line 2393 of file CudaDnn.cs.

◆ GetHostMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetHostMemoryFloat ( long  hMem)

Retrieves the host memory as an array of floats.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of floats is returned.

Definition at line 2404 of file CudaDnn.cs.

◆ GetMemory()

T[] MyCaffe.common.CudaDnn< T >.GetMemory ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of type 'T'

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of type 'T' is returned.

Definition at line 2452 of file CudaDnn.cs.

◆ GetMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetMemoryDouble ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of doubles.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of double is returned.

Definition at line 2429 of file CudaDnn.cs.

◆ GetMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetMemoryFloat ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of float.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of float is returned.

Definition at line 2441 of file CudaDnn.cs.

◆ GetMultiGpuBoardGroupID()

int MyCaffe.common.CudaDnn< T >.GetMultiGpuBoardGroupID ( int  nDeviceID)

Query the multi-GPU board group id for a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The multi-GPU board group id is returned.

Definition at line 1887 of file CudaDnn.cs.

◆ GetRequiredCompute()

string MyCaffe.common.CudaDnn< T >.GetRequiredCompute ( out int  nMinMajor,
out int  nMinMinor 
)

The GetRequiredCompute function returns the Major and Minor compute values required by the current CudaDNN DLL used.

Parameters
nMinMajorSpecifies the minimum required major compute value.
nMinMinorSpecifies the minimum required minor compute value.

Together the Major.Minor compute values define the minimum required compute for the CudaDNN DLL used.

Returns
The path to the CudaDNN dll in use is returned.

Definition at line 1994 of file CudaDnn.cs.

◆ GetRnnLinLayerParams()

void MyCaffe.common.CudaDnn< T >.GetRnnLinLayerParams ( long  hCuDnn,
long  hRnnDesc,
int  nLayer,
long  hXDesc,
long  hWtDesc,
long  hWtData,
int  nLinLayer,
out int  nWtCount,
out long  hWt,
out int  nBiasCount,
out long  hBias 
)

Returns the linear layer parameters (weights).

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
nLayerSpecifies the current layer index.
hXDescSpecifies the input data element descriptor.
hWtDescSpecifies the weight descriptor.
hWtDataSpecifies the weight memory containing all weights.
nLinLayerSpecifies the linear layer index (e.g. LSTM has 8 linear layers, RNN has 2)
nWtCountReturns the number of weight items.
hWtReturns a handle to the weight GPU memory.
nBiasCountReturns the number of bias items.
hBiasReturns a handle to the bias GPU memory.

Definition at line 4574 of file CudaDnn.cs.

◆ GetRnnParamCount()

int MyCaffe.common.CudaDnn< T >.GetRnnParamCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc 
)

Returns the RNN parameter count.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies the handle to the first X descriptor.
Returns
The number of parameters (weights) is returned.

Definition at line 4522 of file CudaDnn.cs.

◆ GetRnnWorkspaceCount()

ulong MyCaffe.common.CudaDnn< T >.GetRnnWorkspaceCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
out ulong  nReservedCount 
)

Returns the workspace and reserved counts.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies a handle to the data descriptor created with CreateRnnDataDesc.
nReservedCountReturns the reserved count needed.
Returns
Returns the workspace count needed.

Definition at line 4544 of file CudaDnn.cs.

◆ hamming_distance()

double MyCaffe.common.CudaDnn< T >.hamming_distance ( int  n,
double  dfThreshold,
long  hA,
long  hB,
long  hY,
int  nOffA = 0,
int  nOffB = 0,
int  nOffY = 0 
)

The hamming_distance calculates the Hamming distance between A and B, both of length n.

To calculate the Hamming distance, A and B are first 'bitified': each element is converted to 1 if greater than the threshold, or 0 otherwise. The bitified versions of A and B are then subtracted from one another, and the Asum of the result is returned; this is the number of bits that differ, and thus the Hamming distance.

Parameters
nSpecifies the number of elements to compare in both X and Y.
dfThresholdSpecifies the threshold used to 'bitify' both X and Y
hASpecifies the handle to the GPU memory containing the first vector to compare.
hBSpecifies the handle to the GPU memory containing the second vector to compare.
hYSpecifies the handle to the GPU memory where the hamming difference (bitified A - bitified B) is placed.
nOffAOptionally, specifies an offset into the GPU memory of A, the default is 0.
nOffBOptionally, specifies an offset into the GPU memory of B, the default is 0.
nOffYOptionally, specifies an offset into the GPU memory of Y, the default is 0.
Returns
The hamming distance is returned.

Definition at line 9628 of file CudaDnn.cs.
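The bitify-subtract-Asum steps above can be mirrored on the CPU. The helper below is a hypothetical reference sketch (it is not part of the MyCaffe API, which operates on GPU memory handles rather than arrays):

```csharp
using System;

public static class HammingRef
{
    // CPU mirror of hamming_distance: bitify A and B against the threshold,
    // store bitified A - bitified B in Y, and return the Asum of Y.
    public static double Distance(double[] rgA, double[] rgB, double dfThreshold, double[] rgY)
    {
        double dfDist = 0;
        for (int i = 0; i < rgA.Length; i++)
        {
            double dfBitA = (rgA[i] > dfThreshold) ? 1.0 : 0.0;
            double dfBitB = (rgB[i] > dfThreshold) ? 1.0 : 0.0;
            rgY[i] = dfBitA - dfBitB;     // the hamming difference placed in Y
            dfDist += Math.Abs(rgY[i]);   // Asum counts the differing bits
        }
        return dfDist;
    }
}
```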

◆ im2col()

void MyCaffe.common.CudaDnn< T >.im2col ( long  hDataIm,
int  nDataImOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataCol,
int  nDataColOffset 
)

Rearranges image blocks into columns.

Parameters
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nChannelsSpecifies the number of channels in the image.
nHeightSpecifies the height of the image.
nWidthSpecifies the width of the image.
nKernelHSpecifies the kernel height.
nKernelWSpecifies the kernel width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nDilationHSpecifies the dilation along the height.
nDilationWSpecifies the dilation along the width.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 7158 of file CudaDnn.cs.
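The rearrangement can be sketched on the CPU. The helper below is a hypothetical reference, assuming a single image in Caffe-style row-major (channel, height, width) layout and zero padding; the real method operates on GPU memory handles:

```csharp
using System;

public static class Im2ColRef
{
    // CPU sketch of the im2col rearrangement for one image.
    public static double[] Im2Col(double[] rgIm, int nChannels, int nHeight, int nWidth,
        int nKernelH, int nKernelW, int nPadH, int nPadW,
        int nStrideH, int nStrideW, int nDilationH, int nDilationW)
    {
        int nOutH = (nHeight + 2 * nPadH - (nDilationH * (nKernelH - 1) + 1)) / nStrideH + 1;
        int nOutW = (nWidth + 2 * nPadW - (nDilationW * (nKernelW - 1) + 1)) / nStrideW + 1;
        double[] rgCol = new double[nChannels * nKernelH * nKernelW * nOutH * nOutW];
        int nIdx = 0;

        for (int c = 0; c < nChannels; c++)
        for (int kh = 0; kh < nKernelH; kh++)
        for (int kw = 0; kw < nKernelW; kw++)
        for (int oh = 0; oh < nOutH; oh++)
        for (int ow = 0; ow < nOutW; ow++)
        {
            int h = oh * nStrideH - nPadH + kh * nDilationH;
            int w = ow * nStrideW - nPadW + kw * nDilationW;
            // Out-of-bounds reads fall in the zero padding.
            rgCol[nIdx++] = (h >= 0 && h < nHeight && w >= 0 && w < nWidth)
                ? rgIm[(c * nHeight + h) * nWidth + w] : 0;
        }

        return rgCol;
    }
}
```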

◆ im2col_nd()

void MyCaffe.common.CudaDnn< T >.im2col_nd ( long  hDataIm,
int  nDataImOffset,
int  nNumSpatialAxes,
int  nImCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataCol,
int  nDataColOffset 
)

Rearranges image blocks into columns.

Parameters
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nNumSpatialAxesSpecifies the number of spatial axes.
nImCountSpecifies the number of kernels.
nChannelAxisSpecifies the axis containing the channel.
hImShapeSpecifies a handle to the image shape data in GPU memory.
hColShapeSpecifies a handle to the column shape data in GPU memory.
hKernelShapeSpecifies a handle to the kernel shape data in GPU memory.
hPadSpecifies a handle to the pad data in GPU memory.
hStrideSpecifies a handle to the stride data in GPU memory.
hDilationSpecifies a handle to the dilation data in GPU memory.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 7182 of file CudaDnn.cs.

◆ interp2()

void MyCaffe.common.CudaDnn< T >.interp2 ( int  nChannels,
long  hData1,
int  nX1,
int  nY1,
int  nHeight1,
int  nWidth1,
int  nHeight1A,
int  nWidth1A,
long  hData2,
int  nX2,
int  nY2,
int  nHeight2,
int  nWidth2,
int  nHeight2A,
int  nWidth2A,
bool  bBwd = false 
)

Interpolates between two sizes within the spatial dimensions.

Parameters
nChannelsSpecifies the channels (usually num * channels)
hData1Specifies the input data when bBwd=false and the output data when bBwd=true.
nX1Specifies the offset along the x axis for data1.
nY1Specifies the offset along the y axis for data1.
nHeight1Specifies the effective height for data1.
nWidth1Specifies the effective width for data1.
nHeight1ASpecifies the input height for data1.
nWidth1ASpecifies the input width for data1.
hData2Specifies the output data when bBwd=false and the input data when bBwd=true.
nX2Specifies the offset along the x axis for data2.
nY2Specifies the offset along the y axis for data2.
nHeight2Specifies the effective height for data2.
nWidth2Specifies the effective width for data2.
nHeight2ASpecifies the output height for data2.
nWidth2ASpecifies the output width for data2.
bBwdOptionally, specifies to perform the backward operation from data2 to data1; otherwise the operation is performed from data1 to data2 (default = false).

Definition at line 6417 of file CudaDnn.cs.

◆ KernelAdd()

void MyCaffe.common.CudaDnn< T >.KernelAdd ( int  nCount,
long  hA,
long  hDstKernel,
long  hB,
long  hC 
)

Add memory from one kernel to memory residing on another kernel.

Parameters
nCountSpecifies the number of items within both A and B.
hASpecifies the handle to the memory A.
hDstKernelSpecifies the kernel where the memory B and the destination memory C reside.
hBSpecifies the handle to the memory B (for which A will be added).
hCSpecifies the destination data where A+B will be placed.

Definition at line 1626 of file CudaDnn.cs.

◆ KernelCopy()

void MyCaffe.common.CudaDnn< T >.KernelCopy ( int  nCount,
long  hSrc,
int  nSrcOffset,
long  hDstKernel,
long  hDst,
int  nDstOffset,
long  hHostBuffer,
long  hHostKernel = -1,
long  hStream = -1,
long  hSrcKernel = -1 
)

Copy memory from the look-up tables in one kernel to another.

Parameters
nCountSpecifies the number of items to copy.
hSrcSpecifies the handle to the source memory.
nSrcOffsetSpecifies the offset (in items, not bytes) from which to start the copy in the source memory.
hDstKernelSpecifies the destination kernel holding the look-up table and memory where the data is to be copied.
hDstSpecifies the handle to the destination memory where the data is to be copied.
nDstOffsetSpecifies the offset (in items, not bytes) at which the copy is to be placed within the destination data.
hHostBufferSpecifies the handle to the host buffer to be used when transferring the data from one kernel to another.
hHostKernelOptionally, specifies the handle to the kernel holding the look-up table for the host buffer.
hStreamOptionally, specifies the handle to the CUDA stream to use for the transfer.
hSrcKernelOptionally, specifies the handle to the source kernel.

Definition at line 1607 of file CudaDnn.cs.

◆ KernelCopyNccl()

long MyCaffe.common.CudaDnn< T >.KernelCopyNccl ( long  hSrcKernel,
long  hSrcNccl 
)

Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance.

Nccl handles are created on the main kernel, but before use must be transferred to the destination kernel (running on a different thread), where the secondary Nccl handle is used.

Parameters
hSrcKernelSpecifies the source kernel (typically where the Nccl handle was created).
hSrcNcclSpecifies the source Nccl handle to be copied.
Returns
A handle to the Nccl instance copied to the current kernel is returned.

Definition at line 1644 of file CudaDnn.cs.

◆ log() [1/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY 
)

Calculates the log value of A and places the result in Y.

$ f(x) = log(x) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6748 of file CudaDnn.cs.

◆ log() [2/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY,
double  dfBeta,
double  dfAlpha = 0 
)

Calculates the log of (A * beta) + alpha, and places the result in Y.

$ f(x) = \ln((x * \beta) + \alpha) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfBetaSpecifies the scalar as type
double
that is multiplied with the log.
dfAlphaOptionally, specifies a scalar added to the value before taking the log.

Definition at line 6764 of file CudaDnn.cs.
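The order of operations (scale by beta first, then add alpha, then take the log) can be sketched on the CPU; the array-based helper below is hypothetical, since the real method operates on GPU memory handles:

```csharp
using System;

public static class LogRef
{
    // CPU mirror of the second log() overload: Y[i] = ln((A[i] * beta) + alpha).
    public static void Log(double[] rgA, double[] rgY, double dfBeta, double dfAlpha = 0)
    {
        for (int i = 0; i < rgA.Length; i++)
            rgY[i] = Math.Log(rgA[i] * dfBeta + dfAlpha);
    }
}
```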

◆ lrn_computediff()

void MyCaffe.common.CudaDnn< T >.lrn_computediff ( int  nCount,
long  hBottomData,
long  hTopData,
long  hScaleData,
long  hTopDiff,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
T  fNegativeBeta,
T  fCacheRatio,
long  hBottomDiff 
)

Computes the diff used to calculate the LRN cross channel backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
nSizeSpecifies the local size of the normalization region.
fNegativeBetaSpecifies the negative beta value.
fCacheRatioSpecifies the cache ratio value (typically 2 * alpha * beta / size).
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8887 of file CudaDnn.cs.

◆ lrn_computeoutput()

void MyCaffe.common.CudaDnn< T >.lrn_computeoutput ( int  nCount,
long  hBottomData,
long  hScaleData,
T  fNegativeBeta,
long  hTopData 
)

Computes the output used to calculate the LRN cross channel forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
fNegativeBetaSpecifies the negative beta value.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8862 of file CudaDnn.cs.

◆ lrn_fillscale()

void MyCaffe.common.CudaDnn< T >.lrn_fillscale ( int  nCount,
long  hBottomData,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
T  fAlphaOverSize,
T  fK,
long  hScaleData 
)

Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
nSizeSpecifies the local size of the normalization region.
fAlphaOverSizeSpecifies the alpha value over the size.
fKSpecifies the k value.
hScaleDataSpecifies a handle to the scale data in GPU memory.

Definition at line 8846 of file CudaDnn.cs.

◆ LRNCrossChannelBackward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelBackward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform LRN cross channel backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4125 of file CudaDnn.cs.

◆ LRNCrossChannelForward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelForward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
T  fBeta,
long  hTopDesc,
long  hTopData 
)

Perform LRN cross channel forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4102 of file CudaDnn.cs.

◆ lstm_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_bwd ( int  t,
int  nN,
int  nH,
int  nI,
double  dfClippingThreshold,
long  hWeight_h,
long  hClipData,
int  nClipOffset,
long  hTopDiff,
int  nTopOffset,
long  hCellData,
long  hCellDiff,
int  nCellOffset,
long  hPreGateDiff,
int  nPreGateOffset,
long  hGateData,
long  hGateDiff,
int  nGateOffset,
long  hCT1Data,
int  nCT1Offset,
long  hDHT1Diff,
int  nDHT1Offset,
long  hDCT1Diff,
int  nDCT1Offset,
long  hHtoHData,
long  hContextDiff = 0,
long  hWeight_c = 0 
)

Performs the simple LSTM backward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
dfClippingThresholdSpecifies the clipping threshold.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top diff memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
hCellDiffSpecifies a handle to the GPU memory holding the 'c_t' gradients.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDiffSpecifies a handle to the GPU memory holding the pre-gate gradients.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
hGateDiffSpecifies a handle to the GPU memory holding the gate gradients.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hDHT1DiffSpecifies a handle to the GPU DHT1 gradients.
nDHT1OffsetSpecifies the DHT1 offset for this step within the sequence.
hDCT1DiffSpecifies a handle to the DCT1 gradients.
nDCT1OffsetSpecifies the DCT1 offset for this step within the sequence.
hHtoHDataSpecifies a handle to the GPU memory holding the H to H data.
hContextDiffOptionally, specifies the handle to the GPU memory holding the context diff, or 0 when not used.
hWeight_cOptionally, specifies the handle to the GPU memory holding the 'c' weights, or 0 when not used.

Definition at line 9089 of file CudaDnn.cs.

◆ lstm_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_fwd ( int  t,
int  nN,
int  nH,
int  nI,
long  hWeight_h,
long  hWeight_i,
long  hClipData,
int  nClipOffset,
long  hTopData,
int  nTopOffset,
long  hCellData,
int  nCellOffset,
long  hPreGateData,
int  nPreGateOffset,
long  hGateData,
int  nGateOffset,
long  hHT1Data,
int  nHT1Offset,
long  hCT1Data,
int  nCT1Offset,
long  hHtoGateData,
long  hContext = 0,
long  hWeight_c = 0,
long  hCtoGetData = 0 
)

Performs the simple LSTM forward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hWeight_iSpecifies a handle to the GPU memory holding the 'i' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top data memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDataSpecifies a handle to the GPU memory holding the pre-gate data.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hHT1DataSpecifies a handle to the GPU memory holding the HT1 data.
nHT1OffsetSpecifies the HT1 offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hHtoGateDataSpecifies a handle to the GPU memory holding the H to Gate data.
hContextOptionally, specifies the attention context, or 0 when not used.
hWeight_cOptionally, specifies the attention context weights, or 0 when not used.
hCtoGetDataOptionally, specifies the attention context to gate data, or 0 when not used.

Definition at line 9048 of file CudaDnn.cs.

◆ lstm_unit_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_bwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hC_prev,
long  hX_acts,
long  hC,
long  hH,
long  hCont,
long  hC_diff,
long  hH_diff,
long  hC_prev_diff,
long  hX_acts_diff,
long  hX_diff 
)

Performs the simple LSTM backward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
nCountSpecifies the number of items.
nHiddenDimSpecifies the hidden state dimension.
nXCountSpecifies the number of items in the input X.
hC_prevSpecifies a handle to the previous cell state data in GPU memory.
hX_actsSpecifies a handle to the gate activation data in GPU memory.
hCSpecifies a handle to the cell state data in GPU memory.
hHSpecifies a handle to the hidden state data in GPU memory.
hContSpecifies a handle to the sequence continuation indicator data in GPU memory.
hC_diffSpecifies a handle to the cell state gradients in GPU memory.
hH_diffSpecifies a handle to the hidden state gradients in GPU memory.
hC_prev_diffSpecifies a handle to the previous cell state gradients in GPU memory.
hX_acts_diffSpecifies a handle to the gate activation gradients in GPU memory.
hX_diffSpecifies a handle to the input gradients in GPU memory.

Definition at line 9139 of file CudaDnn.cs.

◆ lstm_unit_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_fwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hX,
long  hX_acts,
long  hC_prev,
long  hCont,
long  hC,
long  hH 
)

Performs the simple LSTM forward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
nCountSpecifies the number of items.
nHiddenDimSpecifies the hidden state dimension.
nXCountSpecifies the number of items in the input X.
hXSpecifies a handle to the input data in GPU memory.
hX_actsSpecifies a handle to the gate activation data in GPU memory.
hC_prevSpecifies a handle to the previous cell state data in GPU memory.
hContSpecifies a handle to the sequence continuation indicator data in GPU memory.
hCSpecifies a handle to the cell state data in GPU memory.
hHSpecifies a handle to the hidden state data in GPU memory.

Definition at line 9112 of file CudaDnn.cs.

◆ math_bwd()

void MyCaffe.common.CudaDnn< T >.math_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
MATH_FUNCTION  function 
)

Performs a Math function backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 7927 of file CudaDnn.cs.

◆ math_fwd()

void MyCaffe.common.CudaDnn< T >.math_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
MATH_FUNCTION  function 
)

Performs a Math function forward pass in Cuda.

Calculation $ Y[i] = function(X[i]) $

Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 7910 of file CudaDnn.cs.

◆ matrix_meancenter_by_column()

void MyCaffe.common.CudaDnn< T >.matrix_meancenter_by_column ( int  nWidth,
int  nHeight,
long  hA,
long  hB,
long  hY,
bool  bNormalize = false 
)

Mean-centers the data by column: each column is summed and the sum is then subtracted from each value in that column.

Parameters
nWidthNumber of columns in the matrix (dimension D)
nHeightNumber of rows in the matrix (dimension N)
hAInput data matrix - N x D matrix (N rows, D columns)
hBColumn sums vector - D x 1 vector containing the sum of each column.
hYOutput data matrix - N x D matrix (N rows, D columns) containing mean centering of the input data matrix.
bNormalizeWhen true, each data item is divided by N to normalize each row item by column.

Definition at line 9348 of file CudaDnn.cs.
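The description above is terse; the following CPU sketch shows one consistent reading of the N x D layout. It is a hypothetical reference, and the interpretation of bNormalize as dividing the column sum by N (so that the column mean is subtracted) is an assumption:

```csharp
using System;

public static class MeanCenterRef
{
    // CPU sketch of matrix_meancenter_by_column on an N x D row-major matrix:
    // sum each column into B, then subtract the column sum (divided by N when
    // bNormalize is true, i.e. the column mean) from each value in that column.
    public static void MeanCenter(int nWidth, int nHeight, double[] rgA, double[] rgB, double[] rgY, bool bNormalize)
    {
        for (int d = 0; d < nWidth; d++)
        {
            rgB[d] = 0;
            for (int n = 0; n < nHeight; n++)
                rgB[d] += rgA[n * nWidth + d];

            double dfSub = bNormalize ? rgB[d] / nHeight : rgB[d];
            for (int n = 0; n < nHeight; n++)
                rgY[n * nWidth + d] = rgA[n * nWidth + d] - dfSub;
        }
    }
}
```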

◆ max()

double MyCaffe.common.CudaDnn< T >.max ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0 
)

Finds the maximum value of A.

This function uses NVIDIA's Thrust.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the maximum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
Returns
The maximum value is returned as type
double

Definition at line 6932 of file CudaDnn.cs.

◆ max_bwd()

void MyCaffe.common.CudaDnn< T >.max_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 
)

Performs a max backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8461 of file CudaDnn.cs.

◆ max_fwd()

void MyCaffe.common.CudaDnn< T >.max_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 
)

Performs a max forward pass in Cuda.

Calculation: $ Y[i] = max(A[i], B[i]) $

Parameters
nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 8445 of file CudaDnn.cs.
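The mask semantics of max_fwd and max_bwd can be sketched on the CPU. The convention below (mask = nIdx when A wins, nIdx + 1 when B wins, and the backward pass routing gradient only where the mask matches nIdx) follows the Caffe eltwise-max kernels this API appears to mirror, and is an assumption here:

```csharp
using System;

public static class MaxEltwiseRef
{
    // CPU sketch of max_fwd: element-wise max with a winner mask.
    public static void MaxFwd(double[] rgA, double[] rgB, int nIdx, double[] rgTop, double[] rgMask)
    {
        for (int i = 0; i < rgA.Length; i++)
        {
            bool bAWins = rgA[i] > rgB[i];
            rgTop[i] = bAWins ? rgA[i] : rgB[i];
            rgMask[i] = bAWins ? nIdx : nIdx + 1;
        }
    }

    // CPU sketch of max_bwd: route the top gradient only to the blob whose
    // index matches the mask.
    public static void MaxBwd(double[] rgTopDiff, int nIdx, double[] rgMask, double[] rgBottomDiff)
    {
        for (int i = 0; i < rgTopDiff.Length; i++)
            rgBottomDiff[i] = (rgMask[i] == nIdx) ? rgTopDiff[i] : 0;
    }
}
```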

◆ mean_error_loss_bwd()

void MyCaffe.common.CudaDnn< T >.mean_error_loss_bwd ( int  nCount,
long  hPredicted,
long  hTarget,
long  hBottomDiff,
MEAN_ERROR  merr 
)

Performs a Mean Error Loss backward pass in Cuda.

When propagate_down[1] == true, the gradient is set to +1 when the prediction is greater than the target, -1 when it is less than the target, and 0 when they are equal.

See also
Mean Absolute Error (MAE) derivative
Parameters
nCountSpecifies the number of items.
hPredictedSpecifies a handle to the predicted data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
merrSpecifies the type of mean error to run.

Definition at line 7952 of file CudaDnn.cs.
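The sign-based gradient described above reduces to a one-liner per element; the array-based helper below is a hypothetical CPU reference:

```csharp
using System;

public static class MeanErrorLossRef
{
    // CPU sketch of the MAE gradient: +1 when predicted > target,
    // -1 when predicted < target, and 0 when equal.
    public static void Backward(double[] rgPredicted, double[] rgTarget, double[] rgBottomDiff)
    {
        for (int i = 0; i < rgPredicted.Length; i++)
            rgBottomDiff[i] = Math.Sign(rgPredicted[i] - rgTarget[i]);
    }
}
```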

◆ min()

double MyCaffe.common.CudaDnn< T >.min ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0 
)

Finds the minimum value of A.

This function uses NVIDIA's Thrust.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the minimum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
Returns
The minimum value is returned as type
double

Definition at line 6959 of file CudaDnn.cs.

◆ min_bwd()

void MyCaffe.common.CudaDnn< T >.min_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 
)

Performs a min backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8497 of file CudaDnn.cs.

◆ min_fwd()

void MyCaffe.common.CudaDnn< T >.min_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 
)

Performs a min forward pass in Cuda.

Calculation: $ Y[i] = min(A[i], B[i]) $

Parameters
nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 8481 of file CudaDnn.cs.

◆ minmax() [1/2]

Tuple< double, double, double, double > MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
bool  bDetectNans = false,
int  nAOff = 0 
)

Finds the minimum and maximum values within A.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
bDetectNansOptionally, specifies whether or not to detect Nans.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
Returns
A four element tuple is returned where the first item contains the minimum, the second item contains the maximum, the third contains the number of NaN values and the fourth contains the number of Infinity values.
When calling this function with
hA = 0
the function instead returns the required size of hWork1, hWork2, 0, 0 (in items, not bytes).

Definition at line 6987 of file CudaDnn.cs.
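The four-element result can be sketched on the CPU. The helper below is hypothetical, and the assumption that NaN and Infinity values are counted but excluded from the min/max themselves is the author's reading of the description:

```csharp
using System;

public static class MinMaxRef
{
    // CPU sketch of the minmax() tuple: (min, max, NaN count, Infinity count),
    // with NaN and Infinity excluded from the min/max comparisons.
    public static Tuple<double, double, double, double> MinMax(double[] rgA)
    {
        double dfMin = double.MaxValue;
        double dfMax = -double.MaxValue;
        int nNan = 0;
        int nInf = 0;

        foreach (double df in rgA)
        {
            if (double.IsNaN(df)) { nNan++; continue; }
            if (double.IsInfinity(df)) { nInf++; continue; }
            if (df < dfMin) dfMin = df;
            if (df > dfMax) dfMax = df;
        }

        return Tuple.Create(dfMin, dfMax, (double)nNan, (double)nInf);
    }
}
```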

◆ minmax() [2/2]

void MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
int  nK,
long  hMin,
long  hMax,
bool  bNonZeroOnly 
)

Finds up to 'nK' minimum and maximum values within A.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
nKSpecifies the number of min and max values to find.
hMinSpecifies a handle to host memory allocated with AllocHostBuffer of length 'nK' where the min values are placed.
hMaxSpecifies a handle to host memory allocated with AllocHostBuffer of length 'nK' where the max values are placed.
bNonZeroOnlySpecifies whether or not to exclude zero from the min and max calculations.

Definition at line 7012 of file CudaDnn.cs.

◆ mish_bwd()

void MyCaffe.common.CudaDnn< T >.mish_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
double  dfThreshold,
int  nMethod = 0 
)

Performs a Mish backward pass in Cuda.

Computes the mish gradient $ f'(x) = \frac{e^x (4 e^x x + 4x + 6 e^x + 4 e^{2x} + e^{3x} + 4)}{(2 e^x + e^{2x} + 2)^2} $. Note: see Wolfram Alpha with 'derivative of x * tanh(ln(1 + e^x))'.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
dfThresholdSpecifies the threshold value.
nMethodOptionally, specifies to run the new implementation when > 0.

Definition at line 7996 of file CudaDnn.cs.

◆ mish_fwd()

void MyCaffe.common.CudaDnn< T >.mish_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfThreshold 
)

Performs a Mish forward pass in Cuda.

Computes the mish non-linearity $ f(x) = x * tanh(ln( 1 + e^x )) $.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 7972 of file CudaDnn.cs.
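The non-linearity itself is straightforward to mirror on the CPU. The sketch below is hypothetical and ignores dfThreshold, which presumably guards the exponential against overflow in the GPU kernel:

```csharp
using System;

public static class MishRef
{
    // CPU sketch of the mish non-linearity f(x) = x * tanh(ln(1 + e^x)).
    // The dfThreshold overflow guard used by the GPU kernel is not modeled.
    public static double Mish(double dfX)
    {
        return dfX * Math.Tanh(Math.Log(1.0 + Math.Exp(dfX)));
    }
}
```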

◆ mul()

void MyCaffe.common.CudaDnn< T >.mul ( int  n,
long  hA,
long  hB,
long  hY,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 
)

Multiplies each element of A with each element of B and places the result in Y.

Y = A * B (element by element)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6594 of file CudaDnn.cs.

◆ mul_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
double  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type
double
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6634 of file CudaDnn.cs.

◆ mul_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
float  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type
float
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6648 of file CudaDnn.cs.

◆ mul_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
T  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6662 of file CudaDnn.cs.

◆ mulbsx()

void MyCaffe.common.CudaDnn< T >.mulbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 
)

Multiply a matrix with a vector.

Parameters
nSpecifies the number of items.
hASpecifies the matrix to multiply.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the vector to multiply.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6074 of file CudaDnn.cs.

◆ NcclAllReduce()

void MyCaffe.common.CudaDnn< T >.NcclAllReduce ( long  hNccl,
long  hStream,
long  hX,
int  nCount,
NCCL_REDUCTION_OP  op,
double  dfScale = 1.0 
)

Performs a reduction on all NCCL instances as specified by the reduction operation.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to reduce with the other instances of NCCL.
nCountSpecifies the number of items (not bytes) in the data.
opSpecifies the reduction operation to perform.
dfScaleOptionally, specifies a scaling to be applied to the final reduction.

Definition at line 3186 of file CudaDnn.cs.

◆ NcclBroadcast()

void MyCaffe.common.CudaDnn< T >.NcclBroadcast ( long  hNccl,
long  hStream,
long  hX,
int  nCount 
)

Broadcasts a block of GPU data to all NCCL instances.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to be broadcast (or received).
nCountSpecifies the number of items (not bytes) in the data.

Definition at line 3165 of file CudaDnn.cs.

◆ NcclInitializeMultiProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeMultiProcess ( long  hNccl)

Initializes a set of NCCL instances for use in different processes.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies the handle of NCCL to initialize.

Definition at line 3147 of file CudaDnn.cs.

◆ NcclInitializeSingleProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeSingleProcess ( params long[]  rghNccl)

Initializes a set of NCCL instances for use in a single process.

See Fast Multi-GPU collectives with NCCL.

Parameters
rghNcclSpecifies the array of NCCL handles that will be working together.

Definition at line 3114 of file CudaDnn.cs.

◆ nesterov_update()

void MyCaffe.common.CudaDnn< T >.nesterov_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fMomentum,
T  fLocalRate
)

Perform the Nesterov update.

See Lecture 6c The momentum method by Hinton, et al., 2012, and Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent by Botev, et al., 2016

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fMomentumSpecifies the momentum value.
fLocalRateSpecifies the local learning rate.

Definition at line 8926 of file CudaDnn.cs.
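Assuming MyCaffe mirrors Caffe's SGD solver formulation of Nesterov momentum (the helper below is a CPU sketch, not the DLL code): the history holds the momentum-accumulated step, and the returned diff is the look-ahead combination of the new and old history.

```python
import numpy as np

def nesterov_update_cpu(diff, history, momentum, local_rate):
    """CPU sketch of the Nesterov update as in Caffe's SGD solver."""
    h_old = history.copy()
    h_new = momentum * history + local_rate * diff
    new_diff = (1.0 + momentum) * h_new - momentum * h_old
    return new_diff, h_new

d, h = nesterov_update_cpu(np.array([1.0]), np.array([0.0]),
                           momentum=0.9, local_rate=0.1)
```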

◆ permute()

void MyCaffe.common.CudaDnn< T >.permute ( int  nCount,
long  hBottom,
bool  bFwd,
long  hPermuteOrder,
long  hOldSteps,
long  hNewSteps,
int  nNumAxes,
long  hTop 
)

Permutes the input data, writing the reordered data to the output.

Parameters
nCountSpecifies the number of items.
hBottomSpecifies the input data.
bFwdSpecifies whether this is a forward (true) or backward (false) operation.
hPermuteOrderSpecifies the permutation order values in GPU memory.
hOldStepsSpecifies the old step values in GPU memory.
hNewStepsSpecifies the new step values in GPU memory.
nNumAxesSpecifies the number of axes.
hTopSpecifies the output data.

Definition at line 8785 of file CudaDnn.cs.
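The hOldSteps/hNewSteps parameters suggest the step-table permutation used by Caffe's Permute layer: each flat output index is decomposed with the new steps and re-composed with the old steps through the permutation order. A CPU sketch under that assumption (illustrative, not the GPU kernel):

```python
import numpy as np

def permute_cpu(bottom, permute_order, old_steps, new_steps, count):
    """CPU sketch of step-table permutation: decompose each flat output
    index with new_steps, re-compose with old_steps via the order."""
    top = np.empty(count, dtype=bottom.dtype)
    for i in range(count):
        old_idx, temp = 0, i
        for k, order in enumerate(permute_order):
            old_idx += (temp // new_steps[k]) * old_steps[order]
            temp %= new_steps[k]
        top[i] = bottom[old_idx]
    return top

# transpose a 2x3 array to 3x2 via order (1, 0)
bottom = np.arange(6)   # shape (2, 3) flattened
old_steps = [3, 1]      # steps of the (2, 3) input
new_steps = [2, 1]      # steps of the (3, 2) output
top = permute_cpu(bottom, (1, 0), old_steps, new_steps, 6)
```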

◆ pooling_bwd()

void MyCaffe.common.CudaDnn< T >.pooling_bwd ( POOLING_METHOD  method,
int  nCount,
long  hTopDiff,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hBottomDiff,
long  hMask,
long  hTopMask 
)

Performs the backward pass for pooling using Cuda.

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hTopDiffSpecifies a handle to the top diff in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 7800 of file CudaDnn.cs.

◆ pooling_fwd()

void MyCaffe.common.CudaDnn< T >.pooling_fwd ( POOLING_METHOD  method,
int  nCount,
long  hBottomData,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hTopData,
long  hMask,
long  hTopMask 
)

Performs the forward pass for pooling using Cuda.

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hTopDataSpecifies a handle to the top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 7771 of file CudaDnn.cs.
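For MAX pooling, the height/width, kernel, stride, and pad parameters combine as in this single-channel CPU reference sketch (illustrative NumPy, not the GPU kernel; square kernel assumed for brevity):

```python
import numpy as np

def max_pool_fwd(bottom, kernel, stride, pad):
    """CPU sketch of MAX pooling for one 2-D channel, mirroring the
    height/width, kernel, stride and pad parameters above."""
    h, w = bottom.shape
    ph = (h + 2 * pad - kernel) // stride + 1
    pw = (w + 2 * pad - kernel) // stride + 1
    top = np.full((ph, pw), -np.inf)
    for y in range(ph):
        for x in range(pw):
            hs, ws = y * stride - pad, x * stride - pad
            for ky in range(kernel):
                for kx in range(kernel):
                    yy, xx = hs + ky, ws + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        top[y, x] = max(top[y, x], bottom[yy, xx])
    return top

bottom = np.arange(16, dtype=float).reshape(4, 4)
top = max_pool_fwd(bottom, kernel=2, stride=2, pad=0)
```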

◆ PoolingBackward()

void MyCaffe.common.CudaDnn< T >.PoolingBackward ( long  hCuDnn,
long  hPoolingDesc,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a pooling backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 3857 of file CudaDnn.cs.

◆ PoolingForward()

void MyCaffe.common.CudaDnn< T >.PoolingForward ( long  hCuDnn,
long  hPoolingDesc,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
T  fBeta,
long  hTopDesc,
long  hTopData 
)

Perform a pooling forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 3834 of file CudaDnn.cs.
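The fAlpha/fBeta parameters follow the standard cuDNN scaling convention, y = alpha * op(x) + beta * y_prior, so beta = 0 overwrites the destination and beta = 1 accumulates into it. A one-line sketch of that blend (plain NumPy, for illustration only):

```python
import numpy as np

def blend(alpha, op_result, beta, prior_dst):
    """CPU sketch of the cuDNN alpha/beta convention: scale the new
    result by alpha and blend with the prior destination scaled by beta."""
    return alpha * op_result + beta * prior_dst

# beta = 0.0: the prior destination contents are ignored
y = blend(1.0, np.array([2.0, 4.0]), 0.0, np.array([9.0, 9.0]))
```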

◆ powx() [1/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
double  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar of type double.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 6784 of file CudaDnn.cs.
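The elementwise operation itself is simply x ** alpha, as this quick NumPy sketch shows (offsets omitted for brevity; illustrative only):

```python
import numpy as np

def powx_cpu(a, alpha):
    """CPU sketch of powx: elementwise f(x) = x ** alpha."""
    return np.power(a, alpha)

y = powx_cpu(np.array([1.0, 2.0, 3.0]), 2.0)
```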

◆ powx() [2/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
float  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar of type float.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 6801 of file CudaDnn.cs.

◆ powx() [3/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
T  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Calculates A raised to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar of type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 6818 of file CudaDnn.cs.

◆ prelu_bwd()

void MyCaffe.common.CudaDnn< T >.prelu_bwd ( int  nCount,
int  nChannels,
int  nDim,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,
long  hSlopeData,
int  nDivFactor 
)

Performs the Parameterized Rectified Linear Unit (PReLU) backward pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 8356 of file CudaDnn.cs.

◆ prelu_bwd_param()

void MyCaffe.common.CudaDnn< T >.prelu_bwd_param ( int  nCDim,
int  nNum,
int  nTopOffset,
long  hTopDiff,
long  hBottomData,
long  hBackBuffDiff 
)

Performs the Parameterized Rectified Linear Unit (PReLU) backward param pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCDimSpecifies the channel dimension (channels * dimension of each input).
nNumSpecifies the number of inputs.
nTopOffsetSpecifies the offset to apply to the top diff in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBackBuffDiffSpecifies a handle to the back buffer diff in GPU memory.

Definition at line 8333 of file CudaDnn.cs.

◆ prelu_fwd()

void MyCaffe.common.CudaDnn< T >.prelu_fwd ( int  nCount,
int  nChannels,
int  nDim,
long  hBottomData,
long  hTopData,
long  hSlopeData,
int  nDivFactor 
)

Performs the Parameterized Rectified Linear Unit (PReLU) forward pass in Cuda.

Calculation $ f(x) = (x > 0) ? x : x * slopeData $

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 8311 of file CudaDnn.cs.
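The stated calculation, with the slope selected per channel (and the channel index divided by nDivFactor when slopes are shared), can be sketched on the CPU as follows; the channel-indexing rule here is an assumption based on the parameter descriptions, not taken from the DLL:

```python
import numpy as np

def prelu_fwd_cpu(bottom, slope, channels, dim, div_factor=1):
    """CPU sketch of the PReLU forward pass:
    f(x) = x if x > 0 else x * slope[c], c derived from the flat index."""
    out = np.empty_like(bottom)
    for i, x in enumerate(bottom):
        c = (i // dim) % channels // div_factor
        out[i] = x if x > 0 else x * slope[c]
    return out

bottom = np.array([-1.0, 2.0, -3.0, 4.0])  # 2 channels, dim 2
top = prelu_fwd_cpu(bottom, slope=[0.1, 0.5], channels=2, dim=2)
```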

◆ relu_bwd()

void MyCaffe.common.CudaDnn< T >.relu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
T  fNegativeSlope
)

Performs a Rectified Linear Unit (ReLU) backward pass in Cuda.

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 8175 of file CudaDnn.cs.

◆ relu_fwd()

void MyCaffe.common.CudaDnn< T >.relu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
T  fNegativeSlope
)

Performs a Rectified Linear Unit (ReLU) forward pass in Cuda.

Calculation $ f(x) = (x > 0) ? x : x * negativeSlope $

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 8154 of file CudaDnn.cs.

◆ ReLUBackward()

void MyCaffe.common.CudaDnn< T >.ReLUBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a ReLU backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4339 of file CudaDnn.cs.

◆ ReLUForward()

void MyCaffe.common.CudaDnn< T >.ReLUForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Perform a ReLU forward pass.

See Rectifier Nonlinearities Improve Neural Network Acoustic Models by Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013), In ICML Workshop on Deep Learning for Audio, Speech, and Language Processing.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4317 of file CudaDnn.cs.

◆ ResetDevice()

void MyCaffe.common.CudaDnn< T >.ResetDevice ( )

Reset the current device.

IMPORTANT: This function will delete all memory and state information on the current device, which may cause other CudaDnn instances using the same device to fail. For that reason, it is recommended to call this function only when testing.

Definition at line 1857 of file CudaDnn.cs.

◆ ResetGhostMemory()

void MyCaffe.common.CudaDnn< T >.ResetGhostMemory ( )

Resets the ghost memory by enabling it if this instance was configured to use ghost memory.

Definition at line 1561 of file CudaDnn.cs.

◆ rmsprop_update()

void MyCaffe.common.CudaDnn< T >.rmsprop_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fRmsDecay,
T  fDelta,
T  fLocalRate
)

Perform the RMSProp update.

See Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude by Tieleman and Hinton, 2012, and RMSProp and equilibrated adaptive learning rates for non-convex optimization by Dauphin, et al., 2015

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fRmsDecaySpecifies the decay value used by the Solver. MeanSquare(t) = 'rms_decay' * MeanSquare(t-1) + (1 - 'rms_decay') * SquareGradient(t).
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 9010 of file CudaDnn.cs.
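Combining the fRmsDecay formula above with the usual RMSProp step gives this CPU sketch; the division-by-root step is assumed from the cited references, not taken from the DLL:

```python
import numpy as np

def rmsprop_update_cpu(diff, history, rms_decay, delta, local_rate):
    """CPU sketch of RMSProp: update the running mean-square of the
    gradient, then scale the step by its root (delta for stability)."""
    history = rms_decay * history + (1.0 - rms_decay) * diff * diff
    new_diff = local_rate * diff / (np.sqrt(history) + delta)
    return new_diff, history

d, h = rmsprop_update_cpu(np.array([2.0]), np.array([0.0]),
                          rms_decay=0.9, delta=1e-8, local_rate=0.1)
```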

◆ rng_bernoulli() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
double  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7590 of file CudaDnn.cs.

◆ rng_bernoulli() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
float  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7604 of file CudaDnn.cs.

◆ rng_bernoulli() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
T  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7618 of file CudaDnn.cs.

◆ rng_gaussian() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
double  fMu,
double  fSigma,
long  hY 
)

Fill Y with random numbers using a Gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of double.
fSigmaSpecifies the standard deviation of the distribution with a type of double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7537 of file CudaDnn.cs.

◆ rng_gaussian() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
float  fMu,
float  fSigma,
long  hY 
)

Fill Y with random numbers using a Gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of float.
fSigmaSpecifies the standard deviation of the distribution with a type of float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7552 of file CudaDnn.cs.

◆ rng_gaussian() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
T  fMu,
T  fSigma,
long  hY 
)

Fill Y with random numbers using a Gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of 'T'.
fSigmaSpecifies the standard deviation of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7567 of file CudaDnn.cs.

◆ rng_setseed()

void MyCaffe.common.CudaDnn< T >.rng_setseed ( long  lSeed)

Sets the random number generator seed used by random number operations.

This function uses NVIDIA's cuRand.

Parameters
lSeedSpecifies the random number generator seed.

Definition at line 7465 of file CudaDnn.cs.

◆ rng_uniform() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
double  fMin,
double  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of double.
fMaxSpecifies the maximum value of the distribution with a type of double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7483 of file CudaDnn.cs.

◆ rng_uniform() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
float  fMin,
float  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of float.
fMaxSpecifies the maximum value of the distribution with a type of float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7498 of file CudaDnn.cs.

◆ rng_uniform() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
T  fMin,
T  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of 'T'.
fMaxSpecifies the maximum value of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7513 of file CudaDnn.cs.

◆ RnnBackwardData()

void MyCaffe.common.CudaDnn< T >.RnnBackwardData ( long  hCuDnn,
long  hRnnDesc,
long  hYDesc,
long  hYData,
long  hYDiff,
long  hHyDesc,
long  hHyDiff,
long  hCyDesc,
long  hCyDiff,
long  hWtDesc,
long  hWtData,
long  hHxDesc,
long  hHxData,
long  hCxDesc,
long  hCxData,
long  hXDesc,
long  hXDiff,
long  hdHxDesc,
long  hHxDiff,
long  hdCxDesc,
long  hCxDiff,
long  hWo