MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.common.CudaDnn< T > Class Template Reference

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL. More...

Inheritance diagram for MyCaffe.common.CudaDnn< T >:

Public Member Functions

 CudaDnn (int nDeviceID, DEVINIT flags=(DEVINIT.CUBLAS|DEVINIT.CURAND), long? lSeed=null, string strPath="", bool bResetFirst=false, bool bEnableMemoryTrace=false)
 The CudaDnn constructor. More...
 
 CudaDnn (CudaDnn< T > cuda, bool bEnableGhostMemory)
 Alternate CudaDnn constructor. More...
 
void Dispose ()
 Disposes this instance freeing up all of its host and GPU memory. More...
 
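The lifecycle above can be sketched as follows. This is a minimal usage sketch, assuming the MyCaffe assembly is referenced and a CUDA-capable GPU is installed at device ID 0; it is illustrative, not a definitive pattern.

```csharp
using System;
using MyCaffe.common;

// Create a CudaDnn instance on GPU 0 with the default CUBLAS/CURAND init flags.
CudaDnn<float> cuda = new CudaDnn<float>(0);
try
{
    // Query the active device by ID and name.
    Console.WriteLine("Device: " + cuda.GetDeviceName(cuda.GetDeviceID()));
}
finally
{
    // Dispose releases all host and GPU memory held by this instance.
    cuda.Dispose();
}
```

Wrapping the instance in try/finally (or a using pattern) ensures GPU memory is released even if an operation throws.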
void DisableGhostMemory ()
 Disables the ghost memory, if enabled. More...
 
void ResetGhostMemory ()
 Resets the ghost memory by enabling it if this instance was configured to use ghost memory. More...
 
void KernelCopy (int nCount, long hSrc, int nSrcOffset, long hDstKernel, long hDst, int nDstOffset, long hHostBuffer, long hHostKernel=-1, long hStream=-1, long hSrcKernel=-1)
 Copy memory from the look-up tables in one kernel to another. More...
 
void KernelAdd (int nCount, long hA, long hDstKernel, long hB, long hC)
 Add memory from one kernel to memory residing on another kernel. More...
 
long KernelCopyNccl (long hSrcKernel, long hSrcNccl)
 Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance. More...
 
void SetDeviceID (int nDeviceID=-1, DEVINIT flags=DEVINIT.NONE, long? lSeed=null)
 Set the device ID used by the current instance of CudaDnn. More...
 
void SetRandomSeed (long lSeed)
 Set the random number generator seed. More...
 
int GetDeviceID ()
 Returns the current device ID set within CUDA. More...
 
string GetDeviceName (int nDeviceID)
 Query the name of a device. More...
 
string GetDeviceP2PInfo (int nDeviceID)
 Query the peer-to-peer information of a device. More...
 
string GetDeviceInfo (int nDeviceID, bool bVerbose=false)
 Query the device information of a device. More...
 
void ResetDevice ()
 Reset the current device. More...
 
void SynchronizeDevice ()
 Synchronize the operations on the current device. More...
 
int GetMultiGpuBoardGroupID (int nDeviceID)
 Query the multi-GPU board group ID for a device. More...
 
int GetDeviceCount ()
 Query the number of devices (GPUs) installed. More...
 
bool CheckMemoryAttributes (long hSrc, int nSrcDeviceID, long hDst, int nDstDeviceID)
 Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers. More...
 
double GetDeviceMemory (out double dfFree, out double dfUsed, out bool bCudaCallUsed, int nDeviceID=-1)
 Queries the amount of total, free and used memory on a given GPU. More...
 
string GetRequiredCompute (out int nMinMajor, out int nMinMinor)
 The GetRequiredCompute function returns the Major and Minor compute values required by the current CudaDNN DLL used. More...
 
bool DeviceCanAccessPeer (int nSrcDeviceID, int nPeerDeviceID)
 Query whether or not two devices can access each other via peer-to-peer memory copies. More...
 
void DeviceEnablePeerAccess (int nPeerDeviceID)
 Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
 
void DeviceDisablePeerAccess (int nPeerDeviceID)
 Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device. More...
 
long AllocMemory (List< double > rg)
 Allocate a block of GPU memory and copy a list of doubles to it. More...
 
long AllocMemory (List< float > rg)
 Allocate a block of GPU memory and copy a list of floats to it. More...
 
long AllocMemory (double[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy. More...
 
long AllocMemory (float[] rgSrc, long hStream=0)
 Allocate a block of GPU memory and copy an array of floats to it, optionally using a stream for the copy. More...
 
long AllocMemory (T[] rgSrc, long hStream=0, bool bHalfSize=false)
 Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy. More...
 
long AllocMemory (long lCapacity, bool bHalfSize=false)
 Allocate a block of GPU memory with a specified capacity. More...
 
void FreeMemory (long hMem)
 Free previously allocated GPU memory. More...
 
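A minimal allocate/read-back/free round trip, assuming an initialized CudaDnn&lt;float&gt; instance; the values shown are illustrative.

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);
float[] rgData = new float[] { 1f, 2f, 3f, 4f };

// AllocMemory copies the host array into a new GPU block and returns a handle.
long hMem = cuda.AllocMemory(rgData);
try
{
    // Copy the GPU block back to the host for inspection.
    float[] rgBack = cuda.GetMemoryFloat(hMem);
}
finally
{
    cuda.FreeMemory(hMem);  // handles are not garbage collected; free explicitly
    cuda.Dispose();
}
```

Note that the `long` handles returned by the Alloc* methods are opaque look-up keys into the low-level DLL, not raw device pointers.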
void CopyDeviceToHost (long lCount, long hGpuSrc, long hHostDst)
 Copy from GPU memory to Host memory. More...
 
void CopyHostToDevice (long lCount, long hHostSrc, long hGpuDst)
 Copy from Host memory to GPU memory. More...
 
long AllocHostBuffer (long lCapacity)
 Allocate a block of host memory with a specified capacity. More...
 
void FreeHostBuffer (long hMem)
 Free previously allocated host memory. More...
 
long GetHostBufferCapacity (long hMem)
 Returns the host memory capacity. More...
 
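The host-buffer methods pair with the device/host copy methods above. A sketch, assuming a CudaDnn&lt;float&gt; instance on GPU 0 (sizes are illustrative):

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);

// Stage four floats on the GPU, then pull them down through a host buffer.
long hGpu = cuda.AllocMemory(new float[] { 1f, 2f, 3f, 4f });
long hHost = cuda.AllocHostBuffer(4);
try
{
    cuda.CopyDeviceToHost(4, hGpu, hHost);          // GPU -> host buffer
    float[] rg = cuda.GetHostMemoryFloat(hHost);    // host buffer -> managed array
}
finally
{
    cuda.FreeHostBuffer(hHost);
    cuda.FreeMemory(hGpu);
    cuda.Dispose();
}
```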
double[] GetHostMemoryDouble (long hMem)
 Retrieves the host memory as an array of doubles. More...
 
float[] GetHostMemoryFloat (long hMem)
 Retrieves the host memory as an array of floats. More...
 
T[] GetHostMemory (long hMem)
 Retrieves the host memory as an array of type 'T'. More...
 
double[] GetMemoryDouble (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of doubles. More...
 
float[] GetMemoryFloat (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of floats. More...
 
T[] GetMemory (long hMem, long lCount=-1)
 Retrieves the GPU memory as an array of type 'T'. More...
 
void SetMemory (long hMem, List< double > rg)
 Copies a list of doubles into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, List< float > rg)
 Copies a list of floats into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, double[] rgSrc, long hStream=0)
 Copies an array of double into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, float[] rgSrc, long hStream=0)
 Copies an array of floats into a block of already allocated GPU memory. More...
 
void SetMemory (long hMem, T[] rgSrc, long hStream=0, int nCount=-1)
 Copies an array of type 'T' into a block of already allocated GPU memory. More...
 
void SetMemoryAt (long hMem, double[] rgSrc, int nOffset)
 Copies an array of double into a block of already allocated GPU memory starting at a specific offset. More...
 
void SetMemoryAt (long hMem, float[] rgSrc, int nOffset)
 Copies an array of float into a block of already allocated GPU memory starting at a specific offset. More...
 
void SetMemoryAt (long hMem, T[] rgSrc, int nOffset)
 Copies an array of type 'T' into a block of already allocated GPU memory starting at a specific offset. More...
 
T[] SetPixel (long hMem, int nCount, bool bReturnOriginal, int nOffset, params Tuple< int, T >[] rgPixel)
 Set pixel values where each pixel is defined as an index/value tuple. More...
 
void SetHostMemory (long hMem, T[] rgSrc)
 Copies an array of type 'T' into a block of already allocated host memory. More...
 
long CreateMemoryPointer (long hData, long lOffset, long lCount)
 Creates a memory pointer into an already existing block of GPU memory. More...
 
void FreeMemoryPointer (long hData)
 Frees a memory pointer. More...
 
long CreateMemoryTest (out ulong ulTotalNumBlocks, out double dfMemAllocatedInGB, out ulong ulMemStartAddr, out ulong ulBlockSize, double dfPctToAllocate=1.0)
 Creates a new memory test on the current GPU. More...
 
void FreeMemoryTest (long h)
 Free a memory test, freeing up all GPU memory used. More...
 
T[] RunMemoryTest (long h, MEMTEST_TYPE type, ulong ulBlockStartOffset, ulong ulBlockCount, bool bVerbose, bool bWrite, bool bReadWrite, bool bRead)
 The RunMemoryTest method runs the memory test from the block start offset through the block count on the memory previously allocated using CreateMemoryTest. More...
 
long CreateImageOp (int nNum, double dfBrightnessProb, double dfBrightnessDelta, double dfContrastProb, double dfContrastLower, double dfContrastUpper, double dfSaturationProb, double dfSaturationLower, double dfSaturationUpper, long lRandomSeed=0)
 Create a new ImageOp used to perform image operations on the GPU. More...
 
void FreeImageOp (long h)
 Free an image op, freeing up all GPU memory used. More...
 
void DistortImage (long h, int nCount, int nNum, int nDim, long hX, long hY)
 Distort an image using the ImageOp handle provided. More...
 
long CreateStream (bool bNonBlocking=false, int nIndex=-1)
 Create a new stream on the current GPU. More...
 
void FreeStream (long h)
 Free a stream. More...
 
void SynchronizeStream (long h=0)
 Synchronize a stream on the current GPU, waiting for its operations to complete. More...
 
void SynchronizeThread ()
 Synchronize all kernel threads on the current GPU. More...
 
long CreateCuDNN (long hStream=0)
 Create a new instance of NVIDIA's cuDnn. More...
 
void FreeCuDNN (long h)
 Free an instance of cuDnn. More...
 
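Streams and cuDnn handles are typically created together; a cuDnn instance bound to a stream queues its work there. A sketch under the same assumptions as above:

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);

long hStream = cuda.CreateStream();
long hCuDnn = cuda.CreateCuDNN(hStream);  // cuDnn operations run on this stream

// ... queue forward/backward operations using hCuDnn ...

cuda.SynchronizeStream(hStream);          // block until queued work completes
cuda.FreeCuDNN(hCuDnn);
cuda.FreeStream(hStream);
cuda.Dispose();
```

Passing `hStream=0` to CreateCuDNN (the default) uses the default stream instead.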
long CreateNCCL (int nDeviceId, int nCount, int nRank, Guid guid)
 Create an instance of NVIDIA's NCCL ('Nickel'). More...
 
void FreeNCCL (long hNccl)
 Free an instance of NCCL. More...
 
void NcclInitializeSingleProcess (params long[] rghNccl)
 Initializes a set of NCCL instances for use in a single process. More...
 
void NcclInitializeMultiProcess (long hNccl)
 Initializes a set of NCCL instances for use in different processes. More...
 
void NcclBroadcast (long hNccl, long hStream, long hX, int nCount)
 Broadcasts a block of GPU data to all NCCL instances. More...
 
void NcclAllReduce (long hNccl, long hStream, long hX, int nCount, NCCL_REDUCTION_OP op, double dfScale=1.0)
 Performs a reduction on all NCCL instances as specified by the reduction operation. More...
 
long CreateExtension (string strExtensionDllPath)
 Create an instance of an Extension DLL. More...
 
void FreeExtension (long hExtension)
 Free an instance of an Extension. More...
 
T[] RunExtension (long hExtension, long lfnIdx, T[] rgParam)
 Run a function on the extension specified. More...
 
long CreateTensorDesc ()
 Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn. More...
 
void FreeTensorDesc (long h)
 Free a tensor descriptor instance. More...
 
void SetTensorNdDesc (long hHandle, int[] rgDim, int[] rgStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void SetTensorDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void SetTensorDesc (long hHandle, int n, int c, int h, int w, int nStride, int cStride, int hStride, int wStride, bool bHalf=false)
 Sets the values of a tensor descriptor. More...
 
void AddTensor (long hCuDnn, long hSrcDesc, long hSrc, int nSrcOffset, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
 
void AddTensor (long hCuDnn, T fAlpha, long hSrcDesc, long hSrc, int nSrcOffset, T fBeta, long hDstDesc, long hDst, int nDstOffset)
 Add two tensors together. More...
 
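Tensor descriptors describe the shape of a GPU memory block to cuDnn; AddTensor then operates on the described data. A sketch with illustrative shapes, assuming a CudaDnn&lt;float&gt; instance:

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);
long hCuDnn = cuda.CreateCuDNN();

int nCount = 1 * 3 * 8 * 8;
long hDesc = cuda.CreateTensorDesc();
cuda.SetTensorDesc(hDesc, 1, 3, 8, 8);   // n, c, h, w

long hSrc = cuda.AllocMemory(nCount);
long hDst = cuda.AllocMemory(nCount);
cuda.set(nCount, hSrc, 1.0f);
cuda.set(nCount, hDst, 2.0f);

// dst = dst + src; the same descriptor is used for both tensors here.
cuda.AddTensor(hCuDnn, hDesc, hSrc, 0, hDesc, hDst, 0);

cuda.FreeMemory(hSrc);
cuda.FreeMemory(hDst);
cuda.FreeTensorDesc(hDesc);
cuda.FreeCuDNN(hCuDnn);
cuda.Dispose();
```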
long CreateFilterDesc ()
 Create a new instance of a filter descriptor for use with NVIDIA's cuDnn. More...
 
void FreeFilterDesc (long h)
 Free a filter descriptor instance. More...
 
void SetFilterNdDesc (long hHandle, int[] rgDim, bool bHalf=false)
 Sets the values of a filter descriptor. More...
 
void SetFilterDesc (long hHandle, int n, int c, int h, int w, bool bHalf=false)
 Sets the values of a filter descriptor. More...
 
long CreateConvolutionDesc ()
 Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn. More...
 
void FreeConvolutionDesc (long h)
 Free a convolution descriptor instance. More...
 
void SetConvolutionDesc (long hHandle, int hPad, int wPad, int hStride, int wStride, int hDilation, int wDilation, bool bUseTensorCores, bool bHalf=false)
 Set the values of a convolution descriptor. More...
 
void GetConvolutionInfo (long hCuDnn, long hBottomDesc, long hFilterDesc, long hConvDesc, long hTopDesc, ulong lWorkspaceSizeLimitInBytes, bool bUseTensorCores, out CONV_FWD_ALGO algoFwd, out ulong lWsSizeFwd, out CONV_BWD_FILTER_ALGO algoBwdFilter, out ulong lWsSizeBwdFilter, out CONV_BWD_DATA_ALGO algoBwdData, out ulong lWsSizeBwdData, CONV_FWD_ALGO preferredFwdAlgo=CONV_FWD_ALGO.NONE)
 Queries the algorithms and workspace sizes used for a given convolution descriptor. More...
 
void ConvolutionForward (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
 
void ConvolutionForward (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hFilterDesc, long hWeight, int nWeightOffset, long hConvDesc, CONV_FWD_ALGO algoFwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hTopDesc, long hTopData, int nTopOffset, bool bSyncStream=true)
 Perform a convolution forward pass. More...
 
void ConvolutionBackwardBias (long hCuDnn, long hTopDesc, long hTopDiff, int nTopOffset, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
 
void ConvolutionBackwardBias (long hCuDnn, T fAlpha, long hTopDesc, long hTopDiff, int nTopOffset, T fBeta, long hBiasDesc, long hBiasDiff, int nBiasOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the bias. More...
 
void ConvolutionBackwardFilter (long hCuDnn, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream)
 Perform a convolution backward pass on the filter. More...
 
void ConvolutionBackwardFilter (long hCuDnn, T fAlpha, long hBottomDesc, long hBottomData, int nBottomOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_FILTER_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hFilterDesc, long hWeightDiff, int nWeightOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the filter. More...
 
void ConvolutionBackwardData (long hCuDnn, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
 
void ConvolutionBackwardData (long hCuDnn, T fAlpha, long hFilterDesc, long hWeight, int nWeightOffset, long hTopDesc, long hTopDiff, int nTopOffset, long hConvDesc, CONV_BWD_DATA_ALGO algoBwd, long hWorkspace, int nWorkspaceOffset, ulong lWorkspaceSize, T fBeta, long hBottomDesc, long hBottomDiff, int nBottomOffset, bool bSyncStream=true)
 Perform a convolution backward pass on the data. More...
 
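The convolution methods follow the usual cuDnn pattern: set up the descriptors, query the algorithms and workspace sizes, then run the passes. A sketch of the setup and algorithm query, with illustrative shapes and padding (the data-allocation and forward call are indicated but elided):

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);
long hCuDnn = cuda.CreateCuDNN();

long hBottomDesc = cuda.CreateTensorDesc();
long hTopDesc = cuda.CreateTensorDesc();
long hFilterDesc = cuda.CreateFilterDesc();
long hConvDesc = cuda.CreateConvolutionDesc();

cuda.SetTensorDesc(hBottomDesc, 1, 3, 32, 32);         // input: 1 x 3 x 32 x 32
cuda.SetFilterDesc(hFilterDesc, 16, 3, 3, 3);          // 16 filters of 3 x 3 x 3
cuda.SetConvolutionDesc(hConvDesc, 1, 1, 1, 1, 1, 1, false);  // pad 1, stride 1
cuda.SetTensorDesc(hTopDesc, 1, 16, 32, 32);           // output shape

// Ask cuDnn which algorithms to use and how much workspace each needs.
CONV_FWD_ALGO algoFwd;
CONV_BWD_FILTER_ALGO algoBwdFilter;
CONV_BWD_DATA_ALGO algoBwdData;
ulong lWsFwd, lWsBwdFilter, lWsBwdData;
cuda.GetConvolutionInfo(hCuDnn, hBottomDesc, hFilterDesc, hConvDesc, hTopDesc,
    0, false, out algoFwd, out lWsFwd, out algoBwdFilter, out lWsBwdFilter,
    out algoBwdData, out lWsBwdData);

// ... allocate hBottomData, hWeight, hWorkspace, hTopData, then call:
// cuda.ConvolutionForward(hCuDnn, hBottomDesc, hBottomData, 0, hFilterDesc,
//     hWeight, 0, hConvDesc, algoFwd, hWorkspace, 0, lWsFwd,
//     hTopDesc, hTopData, 0);
```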
long CreatePoolingDesc ()
 Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn. More...
 
void FreePoolingDesc (long h)
 Free a pooling descriptor instance. More...
 
void SetPoolingDesc (long hHandle, PoolingMethod method, int h, int w, int hPad, int wPad, int hStride, int wStride)
 Set the values of a pooling descriptor. More...
 
void PoolingForward (long hCuDnn, long hPoolingDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform a pooling forward pass. More...
 
void PoolingBackward (long hCuDnn, long hPoolingDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a pooling backward pass. More...
 
void DeriveBatchNormDesc (long hFwdScaleBiasMeanVarDesc, long hFwdBottomDesc, long hBwdScaleBiasMeanVarDesc, long hBwdBottomDesc, BATCHNORM_MODE mode)
 Derive the batch norm descriptors for both the forward and backward passes. More...
 
void BatchNormForward (long hCuDnn, BATCHNORM_MODE mode, T fAlpha, T fBeta, long hFwdBottomDesc, long hBottomData, long hFwdTopDesc, long hTopData, long hFwdScaleBiasMeanVarDesc, long hScaleData, long hBiasData, double dfFactor, long hGlobalMean, long hGlobalVar, double dfEps, long hSaveMean, long hSaveInvVar, bool bTraining)
 Run the batch norm forward pass. More...
 
void BatchNormBackward (long hCuDnn, BATCHNORM_MODE mode, T fAlphaDiff, T fBetaDiff, T fAlphaParamDiff, T fBetaParamDiff, long hBwdBottomDesc, long hBottomData, long hTopDiffDesc, long hTopDiff, long hBottomDiffDesc, long hBottomDiff, long hBwdScaleBiasMeanVarDesc, long hScaleData, long hScaleDiff, long hBiasDiff, double dfEps, long hSaveMean, long hSaveInvVar)
 Run the batch norm backward pass. More...
 
long CreateDropoutDesc ()
 Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn. More...
 
void FreeDropoutDesc (long h)
 Free a dropout descriptor instance. More...
 
void SetDropoutDesc (long hCuDnn, long hDropoutDesc, double dfDropout, long hStates, long lSeed)
 Set the dropout descriptor values. More...
 
void GetDropoutInfo (long hCuDnn, long hBottomDesc, out ulong ulStateCount, out ulong ulReservedCount)
 Query the dropout state and reserved counts. More...
 
void DropoutForward (long hCuDnn, long hDropoutDesc, long hBottomDesc, long hBottomData, long hTopDesc, long hTopData, long hReserved)
 Performs a dropout forward pass. More...
 
void DropoutBackward (long hCuDnn, long hDropoutDesc, long hTopDesc, long hTop, long hBottomDesc, long hBottom, long hReserved)
 Performs a dropout backward pass. More...
 
long CreateLRNDesc ()
 Create a new instance of an LRN descriptor for use with NVIDIA's cuDnn. More...
 
void FreeLRNDesc (long h)
 Free an LRN descriptor instance. More...
 
void SetLRNDesc (long hHandle, uint nSize, double fAlpha, double fBeta, double fK)
 Set the LRN descriptor values. More...
 
void LRNCrossChannelForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDesc, long hBottomData, T fBeta, long hTopDesc, long hTopData)
 Perform LRN cross channel forward pass. More...
 
void LRNCrossChannelBackward (long hCuDnn, long hNormDesc, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform LRN cross channel backward pass. More...
 
void DivisiveNormalizationForward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTemp1, long hTemp2, T fBeta, long hTopDataDesc, long hTopData)
 Performs a Divisive Normalization forward pass. More...
 
void DivisiveNormalizationBackward (long hCuDnn, long hNormDesc, T fAlpha, long hBottomDataDesc, long hBottomData, long hTopDiff, long hTemp1, long hTemp2, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Performs a Divisive Normalization backward pass. More...
 
void TanhForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Tanh forward pass. More...
 
void TanhBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Tanh backward pass. More...
 
void EluForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform an ELU forward pass. More...
 
void EluBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform an ELU backward pass. More...
 
void SigmoidForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Sigmoid forward pass. More...
 
void SigmoidBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Sigmoid backward pass. More...
 
void ReLUForward (long hCuDnn, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a ReLU forward pass. More...
 
void ReLUBackward (long hCuDnn, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, long hBottomDataDesc, long hBottomData, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a ReLU backward pass. More...
 
void SoftmaxForward (long hCuDnn, SOFTMAX_ALGORITHM alg, SOFTMAX_MODE mode, T fAlpha, long hBottomDataDesc, long hBottomData, T fBeta, long hTopDataDesc, long hTopData)
 Perform a Softmax forward pass. More...
 
void SoftmaxBackward (long hCuDnn, SOFTMAX_ALGORITHM alg, SOFTMAX_MODE mode, T fAlpha, long hTopDataDesc, long hTopData, long hTopDiffDesc, long hTopDiff, T fBeta, long hBottomDiffDesc, long hBottomDiff)
 Perform a Softmax backward pass. More...
 
long CreateRnnDataDesc ()
 Create the RNN Data Descriptor. More...
 
void FreeRnnDataDesc (long h)
 Free an existing RNN Data descriptor. More...
 
void SetRnnDataDesc (long hRnnDataDesc, RNN_DATALAYOUT layout, int nMaxSeqLen, int nBatchSize, int nVectorSize, bool bBidirectional=false, int[] rgSeqLen=null)
 Sets the RNN Data Descriptor values. More...
 
long CreateRnnDesc ()
 Create the RNN Descriptor. More...
 
void FreeRnnDesc (long h)
 Free an existing RNN descriptor. More...
 
void SetRnnDesc (long hCuDnn, long hRnnDesc, int nHiddenCount, int nNumLayers, long hDropoutDesc, RNN_MODE mode, bool bUseTensorCores, RNN_DIRECTION direction=RNN_DIRECTION.RNN_UNIDIRECTIONAL)
 Sets the RNN Descriptor values. More...
 
int GetRnnParamCount (long hCuDnn, long hRnnDesc, long hXDesc)
 Returns the RNN parameter count. More...
 
ulong GetRnnWorkspaceCount (long hCuDnn, long hRnnDesc, long hXDesc, out ulong nReservedCount)
 Returns the workspace and reserved counts. More...
 
void GetRnnLinLayerParams (long hCuDnn, long hRnnDesc, int nLayer, long hXDesc, long hWtDesc, long hWtData, int nLinLayer, out int nWtCount, out long hWt, out int nBiasCount, out long hBias)
 Returns the linear layer parameters (weights). More...
 
void RnnForward (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hWtDesc, long hWtData, long hYDesc, long hYData, long hHyDesc, long hHyData, long hCyDesc, long hCyData, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount, bool bTraining)
 Run the RNN through a forward pass. More...
 
void RnnBackwardData (long hCuDnn, long hRnnDesc, long hYDesc, long hYData, long hYDiff, long hHyDesc, long hHyDiff, long hCyDesc, long hCyDiff, long hWtDesc, long hWtData, long hHxDesc, long hHxData, long hCxDesc, long hCxData, long hXDesc, long hXDiff, long hdHxDesc, long hHxDiff, long hdCxDesc, long hCxDiff, long hWorkspace, ulong nWsCount, long hReserved, ulong nResCount)
 Run the RNN backward pass through the data. More...
 
void RnnBackwardWeights (long hCuDnn, long hRnnDesc, long hXDesc, long hXData, long hHxDesc, long hHxData, long hYDesc, long hYData, long hWorkspace, ulong nWsCount, long hWtDesc, long hWtDiff, long hReserved, ulong nResCount)
 Run the RNN backward pass on the weights. More...
 
bool IsRnn8Supported ()
 Returns whether or not RNN8 is supported. More...
 
long CreateRnn8 ()
 Create the RNN8. More...
 
void FreeRnn8 (long h)
 Free an existing RNN8. More...
 
void SetRnn8 (long hCuDnn, long hRnn, bool bTraining, RNN_DATALAYOUT layout, RNN_MODE cellMode, RNN_BIAS_MODE biasMode, int nSequenceLen, int nBatchSize, int nInputs, int nHidden, int nOutputs, int nProjection, int nNumLayers, float fDropout, ulong lSeed, bool bBidirectional=false)
 Set the RNN8 parameters. More...
 
void GetRnn8MemorySizes (long hCuDnn, long hRnn, out ulong szWtCount, out ulong szWorkSize, out ulong szReservedSize)
 Returns the memory sizes required for the RNN8. More...
 
void InitializeRnn8Weights (long hCuDnn, long hRnn, long hWt, RNN_FILLER_TYPE wtFt, double fWtVal, double fWtVal2, RNN_FILLER_TYPE biasFt, double fBiasVal, double fBiasVal2)
 Initialize the RNN8 weights. More...
 
void Rnn8Forward (long hCuDnn, long hRnn, long hX, long hY, long hhX, long hhY, long hcX, long hcY, long hWts, long hWork, long hReserved)
 Calculate the forward pass through the RNN8. More...
 
void Rnn8Backward (long hCuDnn, long hRnn, long hY, long hdY, long hX, long hdX, long hhX, long hdhY, long hdhX, long hcX, long hdcY, long hdcX, long hWt, long hdWt, long hWork, long hReserved)
 Calculate the backward pass through the RNN8 for both data and weights. More...
 
long AllocPCAData (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA Data. More...
 
long AllocPCAScores (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA scores. More...
 
long AllocPCALoads (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA loads. More...
 
long AllocPCAEigenvalues (int nM, int nN, int nK, out int nCount)
 Allocates the GPU memory for the PCA eigenvalues. More...
 
long CreatePCA (int nMaxIterations, int nM, int nN, int nK, long hData, long hScoresResult, long hLoadsResult, long hResiduals=0, long hEigenvalues=0)
 Creates a new PCA instance and returns the handle to it. More...
 
bool RunPCA (long hPCA, int nSteps, out int nCurrentK, out int nCurrentIteration)
 Runs a number of steps of the iterative PCA algorithm. More...
 
void FreePCA (long hPCA)
 Free the PCA instance associated with handle. More...
 
long CreateSSD (int nNumClasses, bool bShareLocation, int nLocClasses, int nBackgroundLabelId, bool bUseDiffcultGt, SSD_MINING_TYPE miningType, SSD_MATCH_TYPE matchType, float fOverlapThreshold, bool bUsePriorForMatching, SSD_CODE_TYPE codeType, bool bEncodeVariantInTgt, bool bBpInside, bool bIgnoreCrossBoundaryBbox, bool bUsePriorForNms, SSD_CONF_LOSS_TYPE confLossType, SSD_LOC_LOSS_TYPE locLossType, float fNegPosRatio, float fNegOverlap, int nSampleSize, bool bMapObjectToAgnostic, bool bNmsParam, float? fNmsThreshold=null, int? nNmsTopK=null, float? fNmsEta=null)
 Create an instance of the SSD GPU support. More...
 
void SetupSSD (long hSSD, int nNum, int nNumPriors, int nNumGt)
 Setup the SSD GPU support. More...
 
void FreeSSD (long hSSD)
 Free the instance of SSD GPU support. More...
 
int SsdMultiBoxLossForward (long hSSD, int nLocDataCount, long hLocGpuData, int nConfDataCount, long hConfGpuData, int nPriorDataCount, long hPriorGpuData, int nGtDataCount, long hGtGpuData, out List< DictionaryMap< List< int > > > rgAllMatchIndices, out List< List< int > > rgrgAllNegIndices, out int nNumNegs)
 Performs the SSD MultiBoxLoss forward operation. More...
 
void SsdEncodeLocPrediction (long hSSD, int nLocPredCount, long hLocPred, int nLocGtCount, long hLocGt)
 Encodes the SSD data into the location prediction and location ground truths. More...
 
void SsdEncodeConfPrediction (long hSSD, int nConfPredCount, long hConfPred, int nConfGtCount, long hConfGt)
 Encodes the SSD data into the confidence prediction and confidence ground truths. More...
 
long CreateLayerNorm (int nGpuID, int nCount, int nOuterNum, int nChannels, int nInnerNum, float fEps=1e-10f)
 Create the Cuda version of LayerNorm More...
 
void FreeLayerNorm (long hLayerNorm)
 Free the instance of LayerNorm GPU support. More...
 
void LayerNormForward (long hLayerNorm, long hXdata, long hYdata)
 Run the LayerNorm forward pass. More...
 
void LayerNormBackward (long hLayerNorm, long hYdata, long hYdiff, long hXdiff)
 Run the LayerNorm backward pass. More...
 
void set (int nCount, long hHandle, double fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type double. More...
 
void set (int nCount, long hHandle, float fVal, int nIdx=-1)
 Set the values of GPU memory to a specified value of type float. More...
 
void set (int nCount, long hHandle, T fVal, int nIdx=-1, int nXOff=0)
 Set the values of GPU memory to a specified value of type 'T'. More...
 
double[] get_double (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of doubles. More...
 
float[] get_float (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of floats. More...
 
T[] get (int nCount, long hHandle, int nIdx=-1)
 Queries the GPU memory by copying it into an array of type 'T'. More...
 
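The set/get family provides simple host-side access to GPU buffers. A sketch, assuming a CudaDnn&lt;float&gt; instance (the constant and element index are illustrative):

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);
long hMem = cuda.AllocMemory(10);

cuda.set(10, hMem, 3.14f);                    // all 10 elements set to 3.14
float[] rgAll = cuda.get_float(10, hMem);     // copy the whole buffer to host
float[] rgOne = cuda.get_float(10, hMem, 5);  // copy only element index 5

cuda.FreeMemory(hMem);
cuda.Dispose();
```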
void copy (int nCount, long hSrc, long hDst, int nSrcOffset=0, int nDstOffset=0, long hStream=-1, bool? bSrcHalfSizeOverride=null, bool? bDstHalfSizeOverride=null)
 Copy data from one block of GPU memory to another. More...
 
void copy (int nCount, int nNum, int nDim, long hSrc1, long hSrc2, long hDst, long hSimilar, bool bInvert=false)
 Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0). More...
 
void copy_batch (int nCount, int nNum, int nDim, long hSrcData, long hSrcLbl, int nDstCount, long hDstCache, long hWorkDevData, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, long hWorkDataHost)
 Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data. More...
 
void copy_sequence (int nK, int nNum, int nDim, long hSrcData, long hSrcLbl, int nSrcCacheCount, long hSrcCache, int nLabelStart, int nLabelCount, int nCacheSize, long hCacheHostCursors, bool bOutputLabels, List< long > rghTop, List< int > rgnTopCount, long hWorkDataHost, bool bCombinePositiveAndNegative=false, int nSeed=0)
 Copy a sequence of cached items, organized by label, into an anchor, positive (if nK > 0), and negative blobs. More...
 
void copy_sequence (int n, long hSrc, int nSrcStep, int nSrcStartIdx, int nCopyCount, int nCopyDim, long hDst, int nDstStep, int nDstStartIdx, int nSrcSpatialDim, int nDstSpatialDim, int nSrcSpatialDimStartIdx=0, int nDstSpatialDimStartIdx=0, int nSpatialDimCount=-1)
 Copy a sequence from a source to a destination and allow for skip steps. More...
 
void copy_expand (int n, int nNum, int nDim, long hX, long hA)
 Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row. More...
 
void fill (int n, int nDim, long hSrc, int nSrcOff, int nCount, long hDst)
 Fill data from the source data 'n' times in the destination. More...
 
void sort (int nCount, long hY)
 Sort the data in the GPU memory specified. More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0, int nGroups=1, int nGroupOffsetA=0, int nGroupOffsetB=0, int nGroupOffsetC=0)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC, uint lda, uint ldb, uint ldc)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
void gemm (bool bTransA, bool bTransB, int m, int n, int k, double fAlpha, long hA, long hB, double fBeta, long hC, uint lda, uint ldb, uint ldc, uint stridea, uint strideb, uint stridec, uint batch_count)
 Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C More...
 
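A gemm sketch under the row-major convention stated above, assuming a CudaDnn&lt;float&gt; instance; the dimensions and constants are illustrative. With A filled with 1s and B with 2s, every element of C is the sum of k products of 1 and 2.

```csharp
using MyCaffe.common;

CudaDnn<float> cuda = new CudaDnn<float>(0);
int m = 2, n = 3, k = 4;

long hA = cuda.AllocMemory(m * k);   // A is m x k
long hB = cuda.AllocMemory(k * n);   // B is k x n
long hC = cuda.AllocMemory(m * n);   // C is m x n

cuda.set(m * k, hA, 1.0f);
cuda.set(k * n, hB, 2.0f);

// C = 1.0 * A * B + 0.0 * C; each element of C becomes k * (1 * 2) = 8.
cuda.gemm(false, false, m, n, k, 1.0f, hA, hB, 0.0f, hC);
float[] rgC = cuda.get_float(m * n, hC);

cuda.FreeMemory(hA);
cuda.FreeMemory(hB);
cuda.FreeMemory(hC);
cuda.Dispose();
```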
void geam (bool bTransA, bool bTransB, int m, int n, double fAlpha, long hA, long hB, double fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void geam (bool bTransA, bool bTransB, int m, int n, float fAlpha, long hA, long hB, float fBeta, long hC)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void geam (bool bTransA, bool bTransB, int m, int n, T fAlpha, long hA, long hB, T fBeta, long hC, int nAOffset=0, int nBOffset=0, int nCOffset=0)
 Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B) More...
 
void gemv (bool bTransA, int m, int n, double fAlpha, long hA, long hX, double fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
 
void gemv (bool bTransA, int m, int n, float fAlpha, long hA, long hX, float fBeta, long hY)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
 
void gemv (bool bTransA, int m, int n, T fAlpha, long hA, long hX, T fBeta, long hY, int nAOffset=0, int nXOffset=0, int nYOffset=0)
 Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors) More...
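
The gemv semantics, y = alpha * op(A) * x + beta * y, can be sketched the same way (host-side illustration, assuming row-major storage of A):

```python
def gemv(bTransA, m, n, fAlpha, A, x, fBeta, y):
    # A is m x n row-major; op(A) is A, or its transpose when bTransA.
    # y has m elements (n when bTransA), x has n (m when bTransA).
    rows, cols = (n, m) if bTransA else (m, n)
    for i in range(rows):
        acc = sum((A[j * n + i] if bTransA else A[i * n + j]) * x[j]
                  for j in range(cols))
        y[i] = fAlpha * acc + fBeta * y[i]
```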
 
void ger (int m, int n, double fAlpha, long hX, long hY, long hA)
 Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix) More...
 
void ger (int m, int n, float fAlpha, long hX, long hY, long hA)
 Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix) More...
 
void ger (int m, int n, T fAlpha, long hX, long hY, long hA)
 Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix) More...
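
ger is a rank-1 (outer product) operation; a sketch of the documented semantics A = x * (fAlpha * y), with A an m x n row-major matrix (illustration, not the API):

```python
def ger(m, n, fAlpha, x, y, A):
    # Outer product: A[i][j] = x[i] * (fAlpha * y[j]).
    for i in range(m):
        for j in range(n):
            A[i * n + j] = x[i] * (fAlpha * y[j])
```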
 
void axpy (int n, double fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpy (int n, float fAlpha, long hX, long hY)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpy (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Multiply the vector X by a scalar and add the result to the vector Y. More...
 
void axpby (int n, double fAlpha, long hX, double fBeta, long hY)
 Scale the vector X by fAlpha, scale the vector Y by fBeta, and add them together: Y = fAlpha * X + fBeta * Y. More...
 
void axpby (int n, float fAlpha, long hX, float fBeta, long hY)
 Scale the vector X by fAlpha, scale the vector Y by fBeta, and add them together: Y = fAlpha * X + fBeta * Y. More...
 
void axpby (int n, T fAlpha, long hX, T fBeta, long hY)
 Scale the vector X by fAlpha, scale the vector Y by fBeta, and add them together: Y = fAlpha * X + fBeta * Y. More...
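
axpy and axpby are the standard BLAS vector updates; a pure-Python sketch of their semantics (illustration only):

```python
def axpy(n, fAlpha, x, y):
    # y = fAlpha * x + y
    for i in range(n):
        y[i] += fAlpha * x[i]

def axpby(n, fAlpha, x, fBeta, y):
    # y = fAlpha * x + fBeta * y
    for i in range(n):
        y[i] = fAlpha * x[i] + fBeta * y[i]
```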
 
void mulbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Multiply a matrix with a vector. More...
 
void divbsx (int n, long hA, int nAOff, long hX, int nXOff, int nC, int nSpatialDim, bool bTranspose, long hB, int nBOff)
 Divide a matrix by a vector. More...
 
void matmul (uint nOuterCount, int m, int n, int k, long hA, long hB, long hC, double dfScale=1.0, bool bTransA=false, bool bTransB=false)
 Perform matmul operation hC = matmul(hA, hB), where hA, hB and hC are all in row-major format. More...
 
void transposeHW (int n, int c, int h, int w, long hSrc, long hDst)
 Transpose n*c matrices along the height and width dimensions. All matrices are in row-major format. More...
 
void set_bounds (int n, double dfMin, double dfMax, long hX)
 Set the bounds of all items within the data to a set range of values. More...
 
void scal (int n, double fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
void scal (int n, float fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
void scal (int n, T fAlpha, long hX, int nXOff=0)
 Scales the data in X by a scaling factor. More...
 
double dot_double (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
 
float dot_float (int n, long hX, long hY)
 Computes the dot product of X and Y. More...
 
T dot (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the dot product of X and Y. More...
 
double asum_double (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
 
float asum_float (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
 
T asum (int n, long hX, int nXOff=0)
 Computes the sum of absolute values in X. More...
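
dot and asum are reductions over GPU buffers; with the optional offsets, their semantics amount to (host-side illustration):

```python
def dot(n, x, y, nXOff=0, nYOff=0):
    # Dot product of n elements starting at the given offsets.
    return sum(x[nXOff + i] * y[nYOff + i] for i in range(n))

def asum(n, x, nXOff=0):
    # Sum of absolute values of n elements starting at nXOff.
    return sum(abs(x[nXOff + i]) for i in range(n))
```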
 
void scale (int n, double fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
 
void scale (int n, float fAlpha, long hX, long hY)
 Scales the values in X and places them in Y. More...
 
void scale (int n, T fAlpha, long hX, long hY, int nXOff=0, int nYOff=0)
 Scales the values in X and places them in Y. More...
 
void scale_to_range (int n, long hX, long hY, double fMin, double fMax)
 Scales the values in X and places the result in Y (can also run in place where X = Y). More...
 
double erf (double dfVal)
 Calculates the erf() function. More...
 
float erf (float fVal)
 Calculates the erf() function. More...
 
T erf (T fVal)
 Calculates the erf() function. More...
 
void mask (int n, int nMaskDim, T fSearch, T fReplace, long hX, long hMask, long hY)
 Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
 
void mask (int n, int nMaskDim, double fSearch, double fReplace, long hX, long hMask, long hY)
 Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
 
void mask (int n, int nMaskDim, float fSearch, float fReplace, long hX, long hMask, long hY)
 Masks the data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
 
void mask_batch (int n, int nBatch, int nMaskDim, T fSearch, T fReplace, long hX, long hMask, long hY)
 Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
 
void mask_batch (int n, int nBatch, int nMaskDim, double fSearch, double fReplace, long hX, long hMask, long hY)
 Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
 
void mask_batch (int n, int nBatch, int nMaskDim, float fSearch, float fReplace, long hX, long hMask, long hY)
 Masks the batch of data in the source with the mask by replacing all values 'fSearch' found in the mask with 'fReplace' in the destination. More...
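
The mask semantics can be sketched as follows; treating the nMaskDim-length mask as repeating across the n elements is an assumption of this illustration, not documented API behavior:

```python
def mask(n, nMaskDim, fSearch, fReplace, x, m, y):
    # Where the (repeating) mask equals fSearch, write fReplace to the
    # destination; otherwise pass the source value through.
    for i in range(n):
        y[i] = fReplace if m[i % nMaskDim] == fSearch else x[i]
```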
 
void interp2 (int nChannels, long hData1, int nX1, int nY1, int nHeight1, int nWidth1, int nHeight1A, int nWidth1A, long hData2, int nX2, int nY2, int nHeight2, int nWidth2, int nHeight2A, int nWidth2A, bool bBwd=false)
 Interpolates between two sizes within the spatial dimensions. More...
 
void add_scalar (int n, double fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
 
void add_scalar (int n, float fAlpha, long hY)
 Adds a scalar value to each element of Y. More...
 
void add_scalar (int n, T fAlpha, long hY, int nYOff=0)
 Adds a scalar value to each element of Y. More...
 
void add (int n, long hA, long hB, long hC, long hY)
 Adds A, B and C and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY)
 Adds A to B and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, double dfAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, float fAlpha)
 Adds A to (B times scalar) and places the result in Y. More...
 
void add (int n, long hA, long hB, long hY, double dfAlphaA, double dfAlphaB, int nAOff=0, int nBOff=0, int nYOff=0)
 Adds A to (B times scalar) and places the result in Y. More...
 
void sub (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0, int nB=0)
 Subtracts B from A and places the result in Y. More...
 
void mul (int n, long hA, long hB, long hY, int nAOff=0, int nBOff=0, int nYOff=0)
 Multiplies each element of A with each element of B and places the result in Y. More...
 
void sub_and_dot (int n, int nN, int nInnerNum, long hA, long hB, long hY, int nAOff, int nBOff, int nYOff)
 Subtracts every nInnerNum element of B from A and performs a dot product on the result. More...
 
void mul_scalar (int n, double fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void mul_scalar (int n, float fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void mul_scalar (int n, T fAlpha, long hY)
 Multiply each element of Y by a scalar. More...
 
void div (int n, long hA, long hB, long hY)
 Divides each element of A by each element of B and places the result in Y. More...
 
void abs (int n, long hA, long hY)
 Calculates the absolute value of A and places the result in Y. More...
 
void exp (int n, long hA, long hY)
 Calculates the exponent value of A and places the result in Y. More...
 
void exp (int n, long hA, long hY, int nAOff, int nYOff, double dfBeta)
 Calculates the exponent value of A * beta and places the result in Y. More...
 
void log (int n, long hA, long hY)
 Calculates the log value of A and places the result in Y. More...
 
void log (int n, long hA, long hY, double dfBeta, double dfAlpha=0)
 Calculates the log value of (A * beta) + alpha, and places the result in Y. More...
 
void powx (int n, long hA, double fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void powx (int n, long hA, float fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void powx (int n, long hA, T fAlpha, long hY, int nAOff=0, int nYOff=0)
 Calculates the A raised to the power alpha and places the result in Y. More...
 
void sign (int n, long hX, long hY, int nXOff=0, int nYOff=0)
 Computes the sign of each element of X and places the result in Y. More...
 
void sqrt (int n, long hX, long hY)
 Computes the square root of each element of X and places the result in Y. More...
 
void sqrt_scale (int nCount, long hX, long hY)
 Scale the data by the sqrt of the data. y = sqrt(abs(x)) * sign(x) More...
 
void compare_signs (int n, long hA, long hB, long hY)
 Compares the signs of each value in A and B and places the result in Y. More...
 
void max (int n, long hA, long hB, long hY)
 Calculates the max of A and B and places the result in Y. This max is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the max of A(0) and B(0), etc. More...
 
void max_bwd (int n, long hAdata, long hBdata, long hYdiff, long hAdiff, long hBdiff)
 Propagates the Y diff back to whichever of A or B holds the maximum value, placing the result in the corresponding A or B diff. More...
 
void min (int n, long hA, long hB, long hY)
 Calculates the min of A and B and places the result in Y. This min is only computed on a per item basis, so the shape of Y = the shape of A and B and Y(0) contains the min of A(0) and B(0), etc. More...
 
double max (int n, long hA, out long lPos, int nAOff=0, long hWork=0)
 Finds the maximum value of A. More...
 
double min (int n, long hA, out long lPos, int nAOff=0, long hWork=0)
 Finds the minimum value of A. More...
 
Tuple< double, double, double, double > minmax (int n, long hA, long hWork1, long hWork2, bool bDetectNans=false, int nAOff=0)
 Finds the minimum and maximum values within A. More...
 
void minmax (int n, long hA, long hWork1, long hWork2, int nK, long hMin, long hMax, bool bNonZeroOnly)
 Finds up to 'nK' minimum and maximum values within A. More...
 
void transpose (int n, long hX, long hY, long hXCounts, long hYCounts, long hMapping, int nNumAxes, long hBuffer)
 Perform a transpose on X producing Y, similar to the numpy.transpose operation. More...
 
double sumsq (int n, long hW, long hA, int nAOff=0)
 Calculates the sum of squares of A. More...
 
double sumsqdiff (int n, long hW, long hA, long hB, int nAOff=0, int nBOff=0)
 Calculates the sum of squares of differences between A and B More...
 
void width (int n, long hMean, long hMin, long hMax, double dfAlpha, long hWidth)
 Calculates the width values. More...
 
bool contains_point (int n, long hMean, long hWidth, long hX, long hWork, int nXOff=0)
 Returns true if the point is contained within the bounds. More...
 
void denan (int n, long hX, double dfReplacement)
 Replaces all NAN values within X with a replacement value. More...
 
void im2col (long hDataIm, int nDataImOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
 
void im2col_nd (long hDataIm, int nDataImOffset, int nNumSpatialAxes, int nImCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataCol, int nDataColOffset)
 Rearranges image blocks into columns. More...
 
void col2im (long hDataCol, int nDataColOffset, int nChannels, int nHeight, int nWidth, int nKernelH, int nKernelW, int nPadH, int nPadW, int nStrideH, int nStrideW, int nDilationH, int nDilationW, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
 
void col2im_nd (long hDataCol, int nDataColOffset, int nNumSpatialAxes, int nColCount, int nChannelAxis, long hImShape, long hColShape, long hKernelShape, long hPad, long hStride, long hDilation, long hDataIm, int nDataImOffset)
 Rearranges the columns into image blocks. More...
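
im2col underlies convolution-as-GEMM: each kernel-sized patch of the image becomes one column. A compact sketch of the 2-D case with symmetric pad and stride (illustration only; the DLL versions also support asymmetric stride/pad, dilation, and buffer offsets):

```python
def im2col(im, channels, height, width, kh, kw, pad, stride):
    # im is a flat (channels, height, width) row-major list. Returns a
    # (channels*kh*kw) x (out_h*out_w) matrix as a list of rows, with
    # zero padding outside the image bounds.
    out_h = (height + 2 * pad - kh) // stride + 1
    out_w = (width + 2 * pad - kw) // stride + 1
    cols = []
    for c in range(channels):
        for ki in range(kh):
            for kj in range(kw):
                row = []
                for oy in range(out_h):
                    for ox in range(out_w):
                        y = oy * stride - pad + ki
                        x = ox * stride - pad + kj
                        inside = 0 <= y < height and 0 <= x < width
                        row.append(im[(c * height + y) * width + x] if inside else 0)
                cols.append(row)
    return cols
```

col2im is the inverse scatter-add, accumulating overlapping columns back into image positions.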
 
void channel_min (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bReturnIdx=false)
 Calculates the minimum value within each channel of X and places the result in Y. More...
 
void channel_max (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bReturnIdx=false)
 Calculates the maximum value within each channel of X and places the result in Y. More...
 
void channel_mean (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Calculates the mean value of each channel of X and places the result in Y. More...
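
The channel_* reductions treat X as a flat (nOuterNum, nChannels, nInnerNum) tensor and reduce over the channel axis, producing nOuterNum x nInnerNum outputs. A host-side illustration covering channel_min, channel_max, and channel_mean:

```python
def channel_reduce(nOuterNum, nChannels, nInnerNum, x, op):
    # Reduce over the channel axis of a flat (outer, channels, inner)
    # tensor, e.g. op=min, op=max, or a mean lambda.
    out = []
    for o in range(nOuterNum):
        for i in range(nInnerNum):
            vals = [x[(o * nChannels + c) * nInnerNum + i]
                    for c in range(nChannels)]
            out.append(op(vals))
    return out
```

For example, channel_reduce(1, 2, 2, [1, 2, 3, 4], max) reduces the channel pairs {1, 3} and {2, 4}.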
 
void channel_compare (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Compares the values of the channels from X and places the result in Y where 1 is set if the values are equal otherwise 0 is set. More...
 
void channel_fillfrom (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, DIR dir)
 Fills each channel with the values stored in the source data, where the X data contains nOuterNum x nChannels items (e.g. one item per channel) that are then copied to all nInnerNum elements of each channel in Y More...
 
void channel_fill (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, int nLabelDim, long hLabels, long hY)
 Fills each channel item of Y with the data of X matching the label index specified by hLabels. More...
 
void channel_sub (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hA, long hX, long hY)
 Subtracts the values across the channels of X from A and places the result in Y. More...
 
void channel_sub (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Subtracts the values across the channels from X and places the result in Y. More...
 
void channel_sum (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, bool bSumAcrossChannels=true, DIR dir=DIR.FWD, int nChannelsY=-1)
 Calculates the sum of the values either across or within each channel (depending on the bSumAcrossChannels setting) of X and places the result in Y. More...
 
void channel_div (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Divides the values of the channels from X and places the result in Y. More...
 
void channel_mul (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, int nMethod=1)
 Multiplies the values of the channels from X and places the result in Y. More...
 
void channel_mulv (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hA, long hX, long hC)
 Multiplies the values in vector X by each channel in matrix A and places the result in matrix C. More...
 
void channel_scale (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Multiplies the values of the channels from X with the scale values in A and places the result in Y. More...
 
void channel_dot (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hA, long hY)
 Calculates the dot product of the values within each channel of X and places the result in Y. More...
 
void channel_duplicate (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Duplicates each channel 'nInnerNum' times in the destination. More...
 
void channel_percentile (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY, double dfPercentile)
 Calculates the percentile along axis = 0. More...
 
void channel_op_fwd (OP op, int nCount, int nC, int nN1, int nSD1, int nN2, int nSD2, long hA, long hB, long hY)
 Performs a channel operation forward on the data. More...
 
void channel_op_bwd (OP op, int nCount, int nC, int nN1, int nSD1, int nN2, int nSD2, int nCy, int nSDy, long hA, long hB, long hY, long hAd, long hBd, long hYd, long hWork)
 Performs a channel operation backward on the data. More...
 
void channel_add (int nCount, int nOuterNum, int nChannels, int nBlocks, int nInnerNum, int nOffset, long hX, long hY, DIR dir)
 Adds data along channels, similar to the numpy split function, but where the data is added instead of copied. More...
 
void channel_copy (int nCount, int nOuterNum, int nChannels, int nBlocks, int nInnerNum, int nOffset, long hX, long hY, DIR dir)
 Copies data along channels, similar to the numpy split function. More...
 
void channel_copyall (int nCount, int nOuterNum, int nChannels, int nInnerNum, long hX, long hY)
 Copy all data from X (shape 1,c,sd) to each num in Y (shape n,c,sd). More...
 
void sum (int nCount, int nOuterNum, int nInnerNum, long hX, long hY)
 Calculates the sum of inner values of X and places the result in Y. More...
 
void rng_setseed (long lSeed)
 Sets the random number generator seed used by random number operations. More...
 
void rng_uniform (int n, double fMin, double fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_uniform (int n, float fMin, float fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_uniform (int n, T fMin, T fMax, long hY)
 Fill Y with random numbers using a uniform random distribution. More...
 
void rng_gaussian (int n, double fMu, double fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_gaussian (int n, float fMu, float fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_gaussian (int n, T fMu, T fSigma, long hY)
 Fill Y with random numbers using a gaussian random distribution. More...
 
void rng_bernoulli (int n, double fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
 
void rng_bernoulli (int n, float fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
 
void rng_bernoulli (int n, T fNonZeroProb, long hY)
 Fill Y with random numbers using a bernoulli random distribution. More...
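
The rng_* fills draw from cuRAND on the GPU; the distributions they sample correspond to (host-side illustration, not the API):

```python
import random

def rng_uniform(n, fMin, fMax, y):
    # Uniform draws in [fMin, fMax].
    for i in range(n):
        y[i] = random.uniform(fMin, fMax)

def rng_gaussian(n, fMu, fSigma, y):
    # Gaussian draws with mean fMu and standard deviation fSigma.
    for i in range(n):
        y[i] = random.gauss(fMu, fSigma)

def rng_bernoulli(n, fNonZeroProb, y):
    # Each element is 1 with probability fNonZeroProb, otherwise 0.
    for i in range(n):
        y[i] = 1 if random.random() < fNonZeroProb else 0
```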
 
void accuracy_fwd (int nCount, int nOuterNum, int nInnerNum, long hBottomData, long hBottomLabel, long hAccData, long hAccTotals, int? nIgnoreLabel, bool bLastElementOnly, int nBatch)
 Performs the forward pass for the accuracy layer More...
 
void batchreidx_fwd (int nCount, int nInnerDim, long hBottomData, long hPermutData, long hTopData)
 Performs the forward pass for batch re-index More...
 
void batchreidx_bwd (int nCount, int nInnerDim, long hTopDiff, long hTopIdx, long hBegins, long hCounts, long hBottomDiff)
 Performs the backward pass for batch re-index More...
 
void embed_fwd (int nCount, long hBottomData, long hWeight, int nM, int nN, int nK, long hTopData)
 Performs the forward pass for embed More...
 
void embed_bwd (int nCount, long hBottomData, long hTopDiff, int nM, int nN, int nK, long hWeightDiff)
 Performs the backward pass for embed More...
 
void pooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask, long hTopMask)
 Performs the forward pass for pooling using Cuda More...
 
void pooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask, long hTopMask)
 Performs the backward pass for pooling using Cuda More...
 
void unpooling_fwd (POOLING_METHOD method, int nCount, long hBottomData, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hTopData, long hMask)
 Performs the forward pass for unpooling using Cuda More...
 
void unpooling_bwd (POOLING_METHOD method, int nCount, long hTopDiff, int num, int nChannels, int nHeight, int nWidth, int nPooledHeight, int nPooledWidth, int nKernelH, int nKernelW, int nStrideH, int nStrideW, int nPadH, int nPadW, long hBottomDiff, long hMask)
 Performs the backward pass for unpooling using Cuda More...
 
void clip_fwd (int nCount, long hBottomData, long hTopData, T fMin, T fMax)
 Performs a Clip forward pass in Cuda. More...
 
void clip_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff, T fMin, T fMax)
 Performs a Clip backward pass in Cuda. More...
 
void math_fwd (int nCount, long hBottomData, long hTopData, MATH_FUNCTION function)
 Performs a Math function forward pass in Cuda. More...
 
void math_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, MATH_FUNCTION function)
 Performs a Math function backward pass in Cuda. More...
 
void mean_error_loss_bwd (int nCount, long hPredicted, long hTarget, long hBottomDiff, MEAN_ERROR merr)
 Performs a Mean Error Loss backward pass in Cuda. More...
 
void mish_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Mish forward pass in Cuda. More...
 
void mish_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold, int nMethod=0)
 Performs a Mish backward pass in Cuda. More...
 
void gelu_fwd (int nCount, long hBottomData, long hTopData, bool bEnableBertVersion)
 Performs a GELU forward pass in Cuda. More...
 
void gelu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, bool bEnableBertVersion)
 Performs a GELU backward pass in Cuda. More...
 
void silu_fwd (int nCount, long hBottomData, long hTopData)
 Performs the Sigmoid-weighted Linear Unit (SiLU) activation forward pass in Cuda. More...
 
void silu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs the Sigmoid-weighted Linear Unit (SiLU) activation backward pass in Cuda. More...
 
void softplus_fwd (int nCount, long hBottomData, long hTopData)
 Performs the Softplus function forward, a smooth approximation of the ReLU function More...
 
void softplus_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs the Softplus function backward, a smooth approximation of the ReLU function More...
 
void lecun_fwd (int nCount, long hBottomData, long hTopData)
 Performs LeCun's Tanh function forward More...
 
void lecun_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData)
 Performs LeCun's Tanh function backward More...
 
void serf_fwd (int nCount, long hBottomData, long hTopData, double dfThreshold)
 Performs a Serf forward pass in Cuda. More...
 
void serf_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, long hBottomData, double dfThreshold)
 Performs a Serf backward pass in Cuda. More...
 
void tanh_fwd (int nCount, long hBottomData, long hTopData)
 Performs a TanH forward pass in Cuda. More...
 
void tanh_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a TanH backward pass in Cuda. More...
 
void sigmoid_fwd (int nCount, long hBottomData, long hTopData)
 Performs a Sigmoid forward pass in Cuda. More...
 
void sigmoid_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff)
 Performs a Sigmoid backward pass in Cuda. More...
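
The tanh and sigmoid backward passes take the forward output (the top data) rather than the input, since both derivatives are expressible in terms of y: dx = dy * (1 - y^2) for tanh and dx = dy * y * (1 - y) for sigmoid. A host-side illustration:

```python
import math

def sigmoid_fwd(bottom):
    return [1.0 / (1.0 + math.exp(-v)) for v in bottom]

def sigmoid_bwd(top_diff, top_data):
    # dx = dy * y * (1 - y), using the forward output y.
    return [dy * y * (1.0 - y) for dy, y in zip(top_diff, top_data)]

def tanh_bwd(top_diff, top_data):
    # dx = dy * (1 - y^2), using the forward output y.
    return [dy * (1.0 - y * y) for dy, y in zip(top_diff, top_data)]
```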
 
void swish_bwd (int nCount, long hTopDiff, long hTopData, long hSigmoidOutputData, long hBottomDiff, double dfBeta)
 Performs a Swish backward pass in Cuda. More...
 
void relu_fwd (int nCount, long hBottomData, long hTopData, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) forward pass in Cuda. More...
 
void relu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomDiff, T fNegativeSlope)
 Performs a Rectified Linear Unit (ReLU) backward pass in Cuda. More...
 
void elu_fwd (int nCount, long hBottomData, long hTopData, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) forward pass in Cuda. More...
 
void elu_bwd (int nCount, long hTopDiff, long hTopData, long hBottomData, long hBottomDiff, double dfAlpha)
 Performs an Exponential Linear Unit (ELU) backward pass in Cuda. More...
 
void dropout_fwd (int nCount, long hBottomData, long hMask, uint uiThreshold, T fScale, long hTopData)
 Performs a dropout forward pass in Cuda. More...
 
void dropout_bwd (int nCount, long hTopDiff, long hMask, uint uiThreshold, T fScale, long hBottomDiff)
 Performs a dropout backward pass in Cuda. More...
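
Dropout keeps an element when its random mask value exceeds uiThreshold and rescales it by fScale to preserve the expected activation; the backward pass routes gradients through the same mask. Illustration only:

```python
def dropout_fwd(bottom, mask, threshold, scale):
    # Keep (and rescale) elements whose mask value exceeds the threshold.
    return [b * scale if m > threshold else 0.0 for b, m in zip(bottom, mask)]

def dropout_bwd(top_diff, mask, threshold, scale):
    # Gradients flow only through the kept elements.
    return [d * scale if m > threshold else 0.0 for d, m in zip(top_diff, mask)]
```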
 
void bnll_fwd (int nCount, long hBottomData, long hTopData)
 Performs a binomial normal log likelihood (BNLL) forward pass in Cuda. More...
 
void bnll_bwd (int nCount, long hTopDiff, long hBottomData, long hBottomDiff)
 Performs a binomial normal log likelihood (BNLL) backward pass in Cuda. More...
 
void prelu_fwd (int nCount, int nChannels, int nDim, long hBottomData, long hTopData, long hSlopeData, int nDivFactor)
 Performs a Parameterized Rectified Linear Unit (PReLU) forward pass in Cuda. More...
 
void prelu_bwd_param (int nCDim, int nNum, int nTopOffset, long hTopDiff, long hBottomData, long hBackBuffDiff)
 Performs a Parameterized Rectified Linear Unit (PReLU) backward param pass in Cuda. More...
 
void prelu_bwd (int nCount, int nChannels, int nDim, long hTopDiff, long hBottomData, long hBottomDiff, long hSlopeData, int nDivFactor)
 Performs a Parameterized Rectified Linear Unit (PReLU) backward pass in Cuda. More...
 
void softmaxloss_fwd (int nCount, long hProbData, long hLabel, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss forward pass in Cuda. More...
 
void softmaxloss_bwd (int nCount, long hTopData, long hLabel, long hBottomDiff, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs Softmax Loss backward pass in Cuda. More...
 
void nllloss_fwd (int nCount, long hProbData, long hLabel, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs NLL Loss forward pass in Cuda. More...
 
void nllloss_bwd (int nCount, long hTopData, long hLabel, long hBottomDiff, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs NLL Loss backward pass in Cuda. More...
 
void max_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a max forward pass in Cuda. More...
 
void max_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a max backward pass in Cuda. More...
 
void min_fwd (int nCount, long hBottomDataA, long hBottomDataB, int nIdx, long hTopData, long hMask)
 Performs a min forward pass in Cuda. More...
 
void min_bwd (int nCount, long hTopDiff, int nIdx, long hMask, long hBottomDiff)
 Performs a min backward pass in Cuda. More...
 
void crop_fwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomData, long hTopData)
 Performs the crop forward operation. More...
 
void crop_bwd (int nCount, int nNumAxes, long hSrcStrides, long hDstStrides, long hOffsets, long hBottomDiff, long hTopDiff)
 Performs the crop backward operation. More...
 
void concat_fwd (int nCount, long hBottomData, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hTopData)
 Performs a concat forward pass in Cuda. More...
 
void concat_bwd (int nCount, long hTopDiff, int nNumConcats, int nConcatInputSize, int nTopConcatAxis, int nBottomConcatAxis, int nOffsetConcatAxis, long hBottomDiff)
 Performs a concat backward pass in Cuda. More...
 
void slice_fwd (int nCount, long hBottomData, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hTopData)
 Performs a slice forward pass in Cuda. More...
 
void slice_bwd (int nCount, long hTopDiff, int nNumSlices, int nSliceSize, int nBottomSliceAxis, int nTopSliceAxis, int nOffsetSliceAxis, long hBottomDiff)
 Performs a slice backward pass in Cuda. More...
 
void tile_fwd (int nCount, long hBottomData, int nInnerDim, int nTiles, int nBottomTileAxis, long hTopData)
 Performs a tile forward pass in Cuda. More...
 
void tile_bwd (int nCount, long hTopDiff, int nTileSize, int nTiles, int nBottomTileAxis, long hBottomDiff)
 Performs a tile backward pass in Cuda. More...
 
void bias_fwd (int nCount, long hBottomData, long hBiasData, int nBiasDim, int nInnerDim, long hTopData)
 Performs a bias forward pass in Cuda. More...
 
void scale_fwd (int nCount, long hX, long hScaleData, int nScaleDim, int nInnerDim, long hY, long hBiasData=0)
 Performs a scale forward pass in Cuda. More...
 
void threshold_fwd (int nCount, double dfThreshold, long hX, long hY)
 Performs a threshold pass in Cuda. More...
 
void cll_bwd (int nCount, int nChannels, double dfMargin, bool bLegacyVersion, double dfAlpha, long hY, long hDiff, long hDistSq, long hBottomDiff)
 Performs a contrastive loss layer backward pass in Cuda. More...
 
void smoothl1_fwd (int nCount, long hX, long hY)
 Performs the forward operation for the SmoothL1 loss. More...
 
void smoothl1_bwd (int nCount, long hX, long hY)
 Performs the backward operation for the SmoothL1 loss. More...
 
void permute (int nCount, long hBottom, bool bFwd, long hPermuteOrder, long hOldSteps, long hNewSteps, int nNumAxes, long hTop)
 Performs data permutation on the input and reorders the data which is placed in the output. More...
 
void gather_fwd (int nCount, long hBottom, long hTop, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather forward pass where data at specified indexes along a given axis are copied to the output data. More...
 
void gather_bwd (int nCount, long hTop, long hBottom, int nAxis, int nDim, int nDimAtAxis, int nM, int nN, long hIdx)
 Performs a gather backward pass where data at specified indexes along a given axis are copied to the output data. More...
 
void lrn_fillscale (int nCount, long hBottomData, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fAlphaOverSize, T fK, long hScaleData)
 Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda. More...
 
void lrn_computeoutput (int nCount, long hBottomData, long hScaleData, T fNegativeBeta, long hTopData)
 Computes the output used to calculate the LRN cross channel forward pass in Cuda. More...
 
void lrn_computediff (int nCount, long hBottomData, long hTopData, long hScaleData, long hTopDiff, int nNum, int nChannels, int nHeight, int nWidth, int nSize, T fNegativeBeta, T fCacheRatio, long hBottomDiff)
 Computes the diff used to calculate the LRN cross channel backward pass in Cuda. More...
 
void sgd_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Stochastic Gradient Descent (SGD) update More...
 
void nesterov_update (int nCount, long hNetParamsDiff, long hHistoryData, T fMomentum, T fLocalRate)
 Perform the Nesterov update More...
 
void adagrad_update (int nCount, long hNetParamsDiff, long hHistoryData, T fDelta, T fLocalRate)
 Perform the AdaGrad update More...
 
void adadelta_update (int nCount, long hNetParamsDiff, long hHistoryData1, long hHistoryData2, T fMomentum, T fDelta, T fLocalRate)
 Perform the AdaDelta update More...
 
void adam_update (int nCount, long hNetParamsDiff, long hValM, long hValV, T fBeta1, T fBeta2, T fEpsHat, T fLearningRate, T fCorrection)
 Perform the Adam update More...
 
void adamw_update (int nCount, long hNetParamsDiff, long hValM, long hValV, T fBeta1, T fBeta2, T fEpsHat, T fLearningRate, T fDecayRate, long hNetParamsData, int nStep)
 Perform the AdamW update More...
 
void rmsprop_update (int nCount, long hNetParamsDiff, long hHistoryData, T fRmsDecay, T fDelta, T fLocalRate)
 Perform the RMSProp update More...
 
void lstm_fwd (int t, int nN, int nH, int nI, long hWeight_h, long hWeight_i, long hClipData, int nClipOffset, long hTopData, int nTopOffset, long hCellData, int nCellOffset, long hPreGateData, int nPreGateOffset, long hGateData, int nGateOffset, long hHT1Data, int nHT1Offset, long hCT1Data, int nCT1Offset, long hHtoGateData, long hContext=0, long hWeight_c=0, long hCtoGetData=0)
 Performs the simple LSTM forward pass in Cuda. More...
 
void lstm_bwd (int t, int nN, int nH, int nI, double dfClippingThreshold, long hWeight_h, long hClipData, int nClipOffset, long hTopDiff, int nTopOffset, long hCellData, long hCellDiff, int nCellOffset, long hPreGateDiff, int nPreGateOffset, long hGateData, long hGateDiff, int nGateOffset, long hCT1Data, int nCT1Offset, long hDHT1Diff, int nDHT1Offset, long hDCT1Diff, int nDCT1Offset, long hHtoHData, long hContextDiff=0, long hWeight_c=0)
 Performs the simple LSTM backward pass in Cuda. More...
 
void lstm_unit_fwd (int nCount, int nHiddenDim, int nXCount, long hX, long hX_acts, long hC_prev, long hCont, long hC, long hH)
 Performs the simple LSTM forward pass in Cuda for a given LSTM unit. More...
 
void lstm_unit_bwd (int nCount, int nHiddenDim, int nXCount, long hC_prev, long hX_acts, long hC, long hH, long hCont, long hC_diff, long hH_diff, long hC_prev_diff, long hX_acts_diff, long hX_diff)
 Performs the simple LSTM backward pass in Cuda for a given LSTM unit. More...
 
void coeff_sum_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sum forward pass in Cuda. More...
 
void coeff_sum_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sum backward pass in Cuda. More...
 
void coeff_sub_fwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hBottom, long hTop)
 Performs a coefficient sub forward pass in Cuda. More...
 
void coeff_sub_bwd (int nCount, int nDim, int nNumOffset, double dfCoeff, long hCoeffData, long hTopDiff, long hBottomDiff)
 Performs a coefficient sub backward pass in Cuda. More...
 
void sigmoid_cross_entropy_fwd (int nCount, long hInput, long hTarget, long hLoss, bool bHasIgnoreLabel, int nIgnoreLabel, long hCountData)
 Performs a sigmoid cross entropy forward pass in Cuda. More...
 
void sigmoid_cross_entropy_bwd (int nCount, int nIgnoreLabel, long hTarget, long hBottomDiff)
 Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified. More...
 
void softmax_cross_entropy_fwd (int nCount, long hProbData, long hLabel, long hLossDiff, long hLossData, int nOuterNum, int nDim, int nInnerNum, long hCounts, int? nIgnoreLabel)
 Performs a softmax cross entropy forward pass in Cuda. More...
 
void softmax_cross_entropy_bwd (int nCount, int nIgnoreLabel, long hTarget, long hBottomDiff)
 Performs a softmax cross entropy backward pass in Cuda when an ignore label is specified. More...
 
void debug ()
 The debug function is used only when debugging the debug version of the low-level DLL. More...
 
void matrix_meancenter_by_column (int nWidth, int nHeight, long hA, long hB, long hY, bool bNormalize=false)
 Mean center the data by columns, where each column is summed and then subtracted from each column value. More...
 
void gaussian_blur (int n, int nChannels, int nHeight, int nWidth, double dfSigma, long hX, long hY)
 The gaussian_blur runs a Gaussian blurring operation over each channel of the data using the sigma. More...
 
double hamming_distance (int n, double dfThreshold, long hA, long hB, long hY, int nOffA=0, int nOffB=0, int nOffY=0)
 The hamming_distance calculates the Hamming Distance between A and B, both of length n. More...
 
void calc_dft_coefficients (int n, long hX, int m, long hY)
 Calculates the discrete Fourier Transform (DFT) coefficients across the frequencies 1...n/2 (Nyquist Limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY. More...
 
double[] calculate_batch_distances (DistanceMethod distMethod, double dfThreshold, int nItemDim, long hSrc, long hTargets, long hWork, int[,] rgOffsets)
 The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified. More...
 
void ReportMemory (Log log, string strLocation)
 Report the memory use on the current GPU managed by the CudaDnn object. More...
 

Static Public Member Functions

static string GetCudaDnnDllPath ()
 Returns the path to the CudaDnnDll module to use for low level CUDA processing. More...
 
static void SetDefaultCudaPath (string strPath)
 Used to optionally set the default path to the Low-Level Cuda Dnn DLL file. More...
 
static ulong basetype_size (bool bUseHalfSize)
 Returns the base type size in bytes. More...
 
static ulong ConvertByteSizeToCount (ulong ulSizeInBytes)
 Converts the byte size into the number of items in the base data type of float or double. More...
 

Protected Member Functions

virtual void Dispose (bool bDisposing)
 Disposes this instance freeing up all of its host and GPU memory. More...
 

Properties

ulong TotalMemoryUsed [get]
 Returns the total amount of GPU memory used by this instance. More...
 
string TotalMemoryUsedAsText [get]
 Returns the total amount of memory used. More...
 
long KernelHandle [get]
 Returns the Low-Level kernel handle used for this instance. Each Low-Level kernel maintains its own set of look-up tables for memory, streams, cuDnn constructs, etc. More...
 
string Path [get]
 Specifies the file path used to load the Low-Level Cuda DNN Dll file. More...
 
static string DefaultPath [get]
 Specifies the default path used to load the Low-Level Cuda DNN Dll file. More...
 
int OriginalDeviceID [get]
 Returns the original device ID used to create the instance of CudaDnn. More...
 
static ulong BaseSize [get]
 Returns the base data type size (e.g. float= 4, double = 8). More...
 

Detailed Description

The CudaDnn object is the main interface to the Low-Level Cuda C++ DLL.

This is the transition location where C# meets C++.

Template Parameters
TSpecifies the base type float or double. Using float is recommended to conserve GPU memory.

Definition at line 968 of file CudaDnn.cs.

Constructor & Destructor Documentation

◆ CudaDnn() [1/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( int  nDeviceID,
DEVINIT  flags = (DEVINIT.CUBLAS | DEVINIT.CURAND),
long?  lSeed = null,
string  strPath = "",
bool  bResetFirst = false,
bool  bEnableMemoryTrace = false 
)

The CudaDnn constructor.

Parameters
nDeviceIDSpecifies the zero-based device (GPU) id. Note, if there are 5 GPUs in the system, the device IDs will be numbered 0, 1, 2, 3, 4.
flagsSpecifies the flags under which to initialize the Low-Level Cuda system.
lSeedOptionally specifies the random number generator seed. Typically this is only used during testing.
strPathSpecifies the file path of the Low-Level Cuda DNN Dll file. When NULL or empty, the Low-Level CudaDNNDll.dll file in the directory of the currently executing process (that is using the CudaDnn object) is used.
bResetFirstSpecifies to reset the device before initializing. IMPORTANT: It is only recommended to set this to true when testing.
bEnableMemoryTraceOptionally, specifies to enable the memory tracing (only supported in debug mode and dramatically slows down processing).

Definition at line 1488 of file CudaDnn.cs.
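
A minimal usage sketch (assuming a CUDA-capable device 0 and the CudaDnnDll present in the executing process directory; this requires the GPU runtime and is shown for illustration only):

```csharp
using System;
using MyCaffe.common;

// Create a float-based CudaDnn instance on GPU 0 with the default
// CUBLAS and CURAND initialization flags, then release it when done.
CudaDnn<float> cuda = new CudaDnn<float>(0);

try
{
    Console.WriteLine(cuda.GetDeviceName(0));       // name of device 0
    Console.WriteLine(cuda.TotalMemoryUsedAsText);  // memory used by this instance
}
finally
{
    cuda.Dispose();  // frees all host and GPU memory held by the instance
}
```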

◆ CudaDnn() [2/2]

MyCaffe.common.CudaDnn< T >.CudaDnn ( CudaDnn< T >  cuda,
bool  bEnableGhostMemory 
)

Alternate CudaDnn constructor.

Parameters
cudaSpecifies an already created CudaDnn instance. The internal Cuda Control of this instance is used by the new instance.
bEnableGhostMemorySpecifies to enable the ghost memory used to estimate GPU memory usage without allocating any GPU memory.

Definition at line 1587 of file CudaDnn.cs.

Member Function Documentation

◆ abs()

void MyCaffe.common.CudaDnn< T >.abs ( int  n,
long  hA,
long  hY 
)

Calculates the absolute value of A and places the result in Y.

Y = abs(A)

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7437 of file CudaDnn.cs.

◆ accuracy_fwd()

void MyCaffe.common.CudaDnn< T >.accuracy_fwd ( int  nCount,
int  nOuterNum,
int  nInnerNum,
long  hBottomData,
long  hBottomLabel,
long  hAccData,
long  hAccTotals,
int?  nIgnoreLabel,
bool  bLastElementOnly,
int  nBatch 
)

Performs the forward pass for the accuracy layer

Parameters
nCountSpecifies the number of items.
nOuterNumSpecifies the outer count.
nInnerNumSpecifies the inner count.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomLabelSpecifies a handle to the bottom labels in GPU memory.
hAccDataSpecifies a handle to temporary accuracy correct items in GPU memory.
hAccTotalsSpecifies a handle to the temporary accuracy totals in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.
bLastElementOnlyOptionally specifies to only test the last element in each set.
nBatchOptionally specifies the batch size.

Definition at line 8700 of file CudaDnn.cs.

◆ adadelta_update()

void MyCaffe.common.CudaDnn< T >.adadelta_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData1,
long  hHistoryData2,
T  fMomentum,
T  fDelta,
T  fLocalRate 
)

Perform the AdaDelta update

See ADADELTA: An Adaptive Learning Rate Method by Zeiler, 2012

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryData1Specifies a handle to history data in GPU memory.
hHistoryData2Specifies a handle to history data in GPU memory.
fMomentumSpecifies the momentum to use.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10264 of file CudaDnn.cs.

◆ adagrad_update()

void MyCaffe.common.CudaDnn< T >.adagrad_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fDelta,
T  fLocalRate 
)

Perform the AdaGrad update

See Adaptive Subgradient Methods for Online Learning and Stochastic Optimization by Duchi, et al., 2011

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10243 of file CudaDnn.cs.

◆ adam_update()

void MyCaffe.common.CudaDnn< T >.adam_update ( int  nCount,
long  hNetParamsDiff,
long  hValM,
long  hValV,
T  fBeta1,
T  fBeta2,
T  fEpsHat,
T  fLearningRate,
T  fCorrection 
)

Perform the Adam update

See Adam: A Method for Stochastic Optimization by Kingma, et al., 2014

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hValMFirst moment.
hValVSecond moment.
fBeta1Momentum for first moment.
fBeta2Momentum for second moment.
fEpsHatSmall value used to avoid Nan.
fLearningRateLearning rate.
fCorrectionCorrection where Local Learning Rate = 'fCorrection' * 'fLearningRate'

Definition at line 10287 of file CudaDnn.cs.
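
For reference, the per-element arithmetic follows the standard Adam formulation from the cited paper. The CPU sketch below (the `AdamSketch` helper is hypothetical, not part of the MyCaffe API) illustrates the assumed semantics, not the DLL implementation:

```csharp
using System;

public static class AdamSketch
{
    // CPU reference sketch of the per-element Adam update. g holds the
    // gradient (net params diff) and is overwritten with the update step;
    // m and v hold the first and second moment estimates.
    public static void Update(float[] g, float[] m, float[] v,
        float beta1, float beta2, float epsHat, float correctedRate)
    {
        for (int i = 0; i < g.Length; i++)
        {
            m[i] = beta1 * m[i] + (1f - beta1) * g[i];         // first moment
            v[i] = beta2 * v[i] + (1f - beta2) * g[i] * g[i];  // second moment
            // correctedRate corresponds to fCorrection * fLearningRate.
            g[i] = correctedRate * m[i] / ((float)Math.Sqrt(v[i]) + epsHat);
        }
    }
}
```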

◆ adamw_update()

void MyCaffe.common.CudaDnn< T >.adamw_update ( int  nCount,
long  hNetParamsDiff,
long  hValM,
long  hValV,
T  fBeta1,
T  fBeta2,
T  fEpsHat,
T  fLearningRate,
T  fDecayRate,
long  hNetParamsData,
int  nStep 
)

Perform the AdamW update

See also
Decoupled Weight Decay Regularization by Loshchilov, I. and Hutter, F., 2019. See Adam: A Method for Stochastic Optimization by Kingma, et al., 2014
Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hValMFirst moment.
hValVSecond moment.
fBeta1Momentum for first moment.
fBeta2Momentum for second moment.
fEpsHatSmall value used to avoid Nan.
fLearningRateLearning rate.
fDecayRateOptionally, enable detached weight decay for AdamW optimization using this decay rate (when 0, Adam update is used).
hNetParamsDataOptionally, specifies the net params weight data (used when fDecayRate != 0)
nStepOptionally, specifies the current step - used with AdamW optimization updates.

Definition at line 10313 of file CudaDnn.cs.

◆ add() [1/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hC,
long  hY 
)

Adds A, B and C and places the result in Y.

Y = A + B + C

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hCSpecifies a handle to the vector C in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7209 of file CudaDnn.cs.

◆ add() [2/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY 
)

Adds A to B and places the result in Y.

Y = A + B

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7227 of file CudaDnn.cs.

◆ add() [3/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlpha 
)

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaSpecifies a scalar of type double.

Definition at line 7246 of file CudaDnn.cs.

◆ add() [4/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
double  dfAlphaA,
double  dfAlphaB,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 
)

Adds (A times a scalar) to (B times a scalar) and places the result in Y.

Y = (A * alphaA) + (B * alphaB)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfAlphaASpecifies a scalar of type double applied to A.
dfAlphaBSpecifies a scalar of type double applied to B.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7288 of file CudaDnn.cs.
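
The per-element behavior of this overload, including the offsets, can be sketched on the CPU as follows (a hypothetical `VectorSketch.Add` helper for illustration; the GPU kernel is assumed to compute the same result):

```csharp
public static class VectorSketch
{
    // CPU reference sketch of add() with per-vector scalars and offsets:
    // Y[i + nYOff] = (A[i + nAOff] * dfAlphaA) + (B[i + nBOff] * dfAlphaB)
    public static void Add(int n, float[] a, float[] b, float[] y,
        float alphaA, float alphaB, int aOff = 0, int bOff = 0, int yOff = 0)
    {
        for (int i = 0; i < n; i++)
            y[i + yOff] = alphaA * a[i + aOff] + alphaB * b[i + bOff];
    }
}
```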

◆ add() [5/5]

void MyCaffe.common.CudaDnn< T >.add ( int  n,
long  hA,
long  hB,
long  hY,
float  fAlpha 
)

Adds A to (B times scalar) and places the result in Y.

Y = A + (B * alpha)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
fAlphaSpecifies a scalar of type float.

Definition at line 7265 of file CudaDnn.cs.

◆ add_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
double  fAlpha,
long  hY 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value of type double.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7161 of file CudaDnn.cs.

◆ add_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
float  fAlpha,
long  hY 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value of type float.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7175 of file CudaDnn.cs.

◆ add_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.add_scalar ( int  n,
T  fAlpha,
long  hY,
int  nYOff = 0 
)

Adds a scalar value to each element of Y.

Y = Y + alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar value in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nYOffOptionally, specifies an offset into Y. The default is 0.

Definition at line 7190 of file CudaDnn.cs.

◆ AddTensor() [1/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
long  hDstDesc,
long  hDst,
int  nDstOffset 
)

Add two tensors together.

Parameters
hCuDnnSpecifies a handle to the cuDnn instance.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3638 of file CudaDnn.cs.

◆ AddTensor() [2/2]

void MyCaffe.common.CudaDnn< T >.AddTensor ( long  hCuDnn,
T  fAlpha,
long  hSrcDesc,
long  hSrc,
int  nSrcOffset,
T  fBeta,
long  hDstDesc,
long  hDst,
int  nDstOffset 
)

Add two tensors together.

Parameters
hCuDnnSpecifies a handle to the cuDnn instance.
fAlphaSpecifies a scaling factor applied to the source GPU memory before the add.
hSrcDescSpecifies a handle to the source tensor descriptor.
hSrcSpecifies a handle to the source GPU memory.
nSrcOffsetSpecifies an offset within the GPU memory.
fBetaSpecifies a scaling factor applied to the destination GPU memory before the add.
hDstDescSpecifies a handle to the destination tensor descriptor.
hDstSpecifies a handle to the destination GPU memory.
nDstOffsetSpecifies an offset within the GPU memory.

Definition at line 3655 of file CudaDnn.cs.

◆ AllocHostBuffer()

long MyCaffe.common.CudaDnn< T >.AllocHostBuffer ( long  lCapacity)

Allocate a block of host memory with a specified capacity.

Parameters
lCapacitySpecifies the capacity to allocate (in items, not bytes).
Returns
The handle to the host memory is returned.

Definition at line 2581 of file CudaDnn.cs.

◆ AllocMemory() [1/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( double[]  rgSrc,
long  hStream = 0 
)

Allocate a block of GPU memory and copy an array of doubles to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSrcSpecifies an array of doubles to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
Returns
The handle to the GPU memory is returned.

Definition at line 2314 of file CudaDnn.cs.

◆ AllocMemory() [2/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( float[]  rgSrc,
long  hStream = 0 
)

Allocate a block of GPU memory and copy an array of float to it, optionally using a stream for the copy.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSrcSpecifies an array of float to copy to the GPU.
hStreamOptionally specifies a stream to use for the copy.
Returns
The handle to the GPU memory is returned.

Definition at line 2326 of file CudaDnn.cs.

◆ AllocMemory() [3/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< double >  rg)

Allocate a block of GPU memory and copy a list of doubles to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSpecifies a list of doubles to copy to the GPU.
Returns
The handle to the GPU memory is returned.

Definition at line 2291 of file CudaDnn.cs.

◆ AllocMemory() [4/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( List< float >  rg)

Allocate a block of GPU memory and copy a list of floats to it.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
rgSpecifies a list of floats to copy to the GPU.
Returns
The handle to the GPU memory is returned.

Definition at line 2302 of file CudaDnn.cs.

◆ AllocMemory() [5/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( long  lCapacity,
bool  bHalfSize = false 
)

Allocate a block of GPU memory with a specified capacity.

Parameters
lCapacitySpecifies the capacity to allocate (in items, not bytes).
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
Returns
The handle to the GPU memory is returned.

Definition at line 2449 of file CudaDnn.cs.

◆ AllocMemory() [6/6]

long MyCaffe.common.CudaDnn< T >.AllocMemory ( T[]  rgSrc,
long  hStream = 0,
bool  bHalfSize = false 
)

Allocate a block of GPU memory and copy an array of type 'T' to it, optionally using a stream for the copy.

Parameters
rgSrcSpecifies an array of 'T' to copy to the GPU.
hStreamOptionally, specifies a stream to use for the copy.
bHalfSizeOptionally, specifies to use half size float memory - only available with the 'float' base type.
Returns
The handle to the GPU memory is returned.

Definition at line 2338 of file CudaDnn.cs.
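
A typical allocate/use/free round trip (requires a CUDA device and the Low-Level DLL; FreeMemory is assumed as the matching release call; shown for illustration only):

```csharp
using MyCaffe.common;

// Copy a float array to the GPU, use the returned handle, then free it.
CudaDnn<float> cuda = new CudaDnn<float>(0);
long hData = cuda.AllocMemory(new float[] { 1f, 2f, 3f, 4f });

try
{
    // ... pass hData as the GPU memory handle to math or layer functions ...
}
finally
{
    cuda.FreeMemory(hData);  // release the GPU block
    cuda.Dispose();          // release the CudaDnn instance itself
}
```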

◆ AllocPCAData()

long MyCaffe.common.CudaDnn< T >.AllocPCAData ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA Data.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 5319 of file CudaDnn.cs.

◆ AllocPCAEigenvalues()

long MyCaffe.common.CudaDnn< T >.AllocPCAEigenvalues ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA eigenvalues.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 5370 of file CudaDnn.cs.

◆ AllocPCALoads()

long MyCaffe.common.CudaDnn< T >.AllocPCALoads ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA loads.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 5353 of file CudaDnn.cs.

◆ AllocPCAScores()

long MyCaffe.common.CudaDnn< T >.AllocPCAScores ( int  nM,
int  nN,
int  nK,
out int  nCount 
)

Allocates the GPU memory for the PCA scores.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
nMSpecifies the data width (number of rows).
nNSpecifies the data height (number of columns).
nKSpecifies the number of components (K <= N).
nCountReturns the total number of items in the allocated data (nM * nN).
Returns
The handle to the GPU memory is returned.

Definition at line 5336 of file CudaDnn.cs.

◆ asum()

T MyCaffe.common.CudaDnn< T >.asum ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
the absolute value sum is returned as a type 'T'.

Definition at line 6901 of file CudaDnn.cs.

◆ asum_double()

double MyCaffe.common.CudaDnn< T >.asum_double ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
the absolute sum is returned as a type double.

Definition at line 6871 of file CudaDnn.cs.

◆ asum_float()

float MyCaffe.common.CudaDnn< T >.asum_float ( int  n,
long  hX,
int  nXOff = 0 
)

Computes the sum of absolute values in X.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
Returns
the absolute sum is returned as a type float.

Definition at line 6886 of file CudaDnn.cs.

◆ axpby() [1/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
double  fAlpha,
long  hX,
double  fBeta,
long  hY 
)

Scale the vector X by alpha and the vector Y by beta, then add the two together, placing the result in Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scaling factor of type double applied to vector X.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor of type double applied to vector Y.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6595 of file CudaDnn.cs.

◆ axpby() [2/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
float  fAlpha,
long  hX,
float  fBeta,
long  hY 
)

Scale the vector X by alpha and the vector Y by beta, then add the two together, placing the result in Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scaling factor of type float applied to vector X.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor of type float applied to vector Y.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6611 of file CudaDnn.cs.

◆ axpby() [3/3]

void MyCaffe.common.CudaDnn< T >.axpby ( int  n,
T  fAlpha,
long  hX,
T  fBeta,
long  hY 
)

Scale the vector x by Alpha and scale vector y by Beta and then add both together.

Y = (X * fAlpha) + (Y * fBeta)

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
fBetaSpecifies the scaling factor to apply to vector Y, where the scaling factor is of type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6629 of file CudaDnn.cs.
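
Element-wise, the operation amounts to the following CPU sketch (a hypothetical `BlasSketch.Axpby` helper, for illustration only):

```csharp
public static class BlasSketch
{
    // CPU reference for axpby: Y = (X * alpha) + (Y * beta), element-wise.
    public static void Axpby(int n, float alpha, float[] x, float beta, float[] y)
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + beta * y[i];
    }
}
```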

◆ axpy() [1/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
double  fAlpha,
long  hX,
long  hY 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type double.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6544 of file CudaDnn.cs.

◆ axpy() [2/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
float  fAlpha,
long  hX,
long  hY 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type float.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6559 of file CudaDnn.cs.

◆ axpy() [3/3]

void MyCaffe.common.CudaDnn< T >.axpy ( int  n,
T  fAlpha,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Multiply the vector X by a scalar and add the result to the vector Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scalar to multiply where the scalar is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6576 of file CudaDnn.cs.

◆ basetype_size()

static ulong MyCaffe.common.CudaDnn< T >.basetype_size ( bool  bUseHalfSize)
static

Returns the base type size in bytes.

Parameters
bUseHalfSizeSpecifies whether or not to use half size or the base size.

Definition at line 1899 of file CudaDnn.cs.

◆ BatchNormBackward()

void MyCaffe.common.CudaDnn< T >.BatchNormBackward ( long  hCuDnn,
BATCHNORM_MODE  mode,
T  fAlphaDiff,
T  fBetaDiff,
T  fAlphaParamDiff,
T  fBetaParamDiff,
long  hBwdBottomDesc,
long  hBottomData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDiffDesc,
long  hBottomDiff,
long  hBwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hScaleDiff,
long  hBiasDiff,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar 
)

Run the batch norm backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaDiffSpecifies the alpha value applied to the diff.
fBetaDiffSpecifies the beta value applied to the diff.
fAlphaParamDiffSpecifies the alpha value applied to the param diff.
fBetaParamDiffSpecifies the beta value applied to the param diff.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff tensor.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff tensor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the backward scale bias mean var descriptor.
hScaleDataSpecifies a handle to the scale data tensor.
hScaleDiffSpecifies a handle to the scale diff tensor.
hBiasDiffSpecifies a handle to the bias diff tensor.
dfEpsSpecifies the epsilon value.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved inverse variance tensor.

Definition at line 4191 of file CudaDnn.cs.

◆ BatchNormForward()

void MyCaffe.common.CudaDnn< T >.BatchNormForward ( long  hCuDnn,
BATCHNORM_MODE  mode,
T  fAlpha,
T  fBeta,
long  hFwdBottomDesc,
long  hBottomData,
long  hFwdTopDesc,
long  hTopData,
long  hFwdScaleBiasMeanVarDesc,
long  hScaleData,
long  hBiasData,
double  dfFactor,
long  hGlobalMean,
long  hGlobalVar,
double  dfEps,
long  hSaveMean,
long  hSaveInvVar,
bool  bTraining 
)

Run the batch norm forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
modeSpecifies the batch normalization mode.
fAlphaSpecifies the alpha value.
fBetaSpecifies the beta value.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data tensor.
hFwdTopDescSpecifies a handle to the forward top tensor descriptor.
hTopDataSpecifies a handle to the top tensor.
hFwdScaleBiasMeanVarDescSpecifies a handle to the forward scale bias mean variance descriptor.
hScaleDataSpecifies a handle to the scale tensor.
hBiasDataSpecifies a handle to the bias tensor.
dfFactorSpecifies a scaling factor.
hGlobalMeanSpecifies a handle to the global mean tensor.
hGlobalVarSpecifies a handle to the global variance tensor.
dfEpsSpecifies the epsilon value to avoid dividing by zero.
hSaveMeanSpecifies a handle to the saved mean tensor.
hSaveInvVarSpecifies a handle to the saved variance tensor.
bTrainingSpecifies that this is a training pass when true, and a testing pass when false.

Definition at line 4161 of file CudaDnn.cs.

◆ batchreidx_bwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_bwd ( int  nCount,
int  nInnerDim,
long  hTopDiff,
long  hTopIdx,
long  hBegins,
long  hCounts,
long  hBottomDiff 
)

Performs the backward pass for batch re-index.

Parameters
nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopIdxSpecifies a handle to the top indexes in GPU memory.
hBeginsSpecifies a handle to the begin data in GPU memory.
hCountsSpecifies a handle to the counts in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 8745 of file CudaDnn.cs.

◆ batchreidx_fwd()

void MyCaffe.common.CudaDnn< T >.batchreidx_fwd ( int  nCount,
int  nInnerDim,
long  hBottomData,
long  hPermutData,
long  hTopData 
)

Performs the forward pass for batch re-index.

Parameters
nCountSpecifies the number of items.
nInnerDimSpecifies the inner dimension.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hPermutDataSpecifies a handle to the permutation data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8727 of file CudaDnn.cs.

◆ bias_fwd()

void MyCaffe.common.CudaDnn< T >.bias_fwd ( int  nCount,
long  hBottomData,
long  hBiasData,
int  nBiasDim,
int  nInnerDim,
long  hTopData 
)

Performs a bias forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hBiasDataSpecifies a handle to the bias data in GPU memory.
nBiasDimSpecifies the bias dimension.
nInnerDimSpecifies the inner dimension.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9958 of file CudaDnn.cs.

◆ bnll_bwd()

void MyCaffe.common.CudaDnn< T >.bnll_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff 
)

Performs a binomial normal log likelihood (BNLL) backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9516 of file CudaDnn.cs.

◆ bnll_fwd()

void MyCaffe.common.CudaDnn< T >.bnll_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs a binomial normal log likelihood (BNLL) forward pass in Cuda.

Computes $ f(x) = \ln(1 + e^x) $

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9501 of file CudaDnn.cs.
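The computation performed by bnll_fwd can be sketched as a CPU reference in Python (illustrative only; the GPU kernel's internals are not shown here). A numerically stable form of $ \ln(1 + e^x) $ is used to avoid overflow for large x:

```python
import math

def bnll_fwd(bottom):
    # f(x) = ln(1 + e^x), computed in a numerically stable form:
    # for x > 0, ln(1 + e^x) = x + ln(1 + e^-x), which never overflows.
    return [x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))
            for x in bottom]
```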

◆ calc_dft_coefficients()

void MyCaffe.common.CudaDnn< T >.calc_dft_coefficients ( int  n,
long  hX,
int  m,
long  hY 
)

Calculates the discrete Fourier transform (DFT) coefficients across the frequencies 1...n/2 (the Nyquist limit) for the array of values in host memory referred to by hX. Return values are placed in the host memory referenced by hY.

Parameters
nSpecifies the number of items.
hXSpecifies a handle to the host memory holding the input values.
mSpecifies the number of items in hY, which must equal n/2 (the Nyquist limit).
hYSpecifies a handle to the host memory holding the n/2 output values (the Nyquist limit).
See also
Implement the Spectrogram from scratch in python by Yumi, Yumi's Blog, 2018

Definition at line 11027 of file CudaDnn.cs.
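A minimal CPU sketch of DFT coefficients over frequency bins 1..n/2 is shown below in Python. Note this is an assumption about the output form (normalized magnitudes): the exact scaling and coefficient convention used by the CudaDnn implementation may differ.

```python
import cmath

def dft_coefficients(x):
    # Plain DFT magnitudes for frequency bins 1..n/2 (the Nyquist limit).
    # Assumed normalization: |X_k| / n; the GPU version's scaling may differ.
    n = len(x)
    out = []
    for k in range(1, n // 2 + 1):
        c = sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        out.append(abs(c) / n)
    return out
```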

◆ calculate_batch_distances()

double[] MyCaffe.common.CudaDnn< T >.calculate_batch_distances ( DistanceMethod  distMethod,
double  dfThreshold,
int  nItemDim,
long  hSrc,
long  hTargets,
long  hWork,
int[,]  rgOffsets 
)

The calculate_batch_distances method calculates a set of distances based on the DistanceMethod specified.

Parameters
distMethodSpecifies the DistanceMethod to use (i.e. HAMMING or EUCLIDEAN).
dfThresholdSpecifies the threshold used when binarizing the values for the HAMMING distance. This parameter is ignored when calculating the EUCLIDEAN distance.
nItemDimSpecifies the dimension of a single item.
hSrcSpecifies the GPU memory containing the source items.
hTargetsSpecifies the GPU memory containing the target items that are compared against the source items.
hWorkSpecifies the GPU memory containing the work memory - this must be the same size as the maximum size of the src or targets.
rgOffsetsSpecifies the array of offset pairs where the first offset is into the source and the second is into the target.
Returns
The array of distances, one per offset pair, is returned.

Definition at line 11046 of file CudaDnn.cs.
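The distance calculation can be sketched as a CPU reference in Python. This is illustrative only: the DistanceMethod string names and flat-array offset convention below are assumptions for the sketch, not the exact CudaDnn types.

```python
def batch_distances(dist_method, threshold, item_dim, src, targets, offsets):
    # CPU sketch: HAMMING binarizes both items with the threshold first;
    # EUCLIDEAN uses the raw values. Each (src_off, tgt_off) pair in
    # offsets indexes into the flat src/targets arrays.
    results = []
    for src_off, tgt_off in offsets:
        a = src[src_off:src_off + item_dim]
        b = targets[tgt_off:tgt_off + item_dim]
        if dist_method == "HAMMING":
            d = sum((x > threshold) != (y > threshold) for x, y in zip(a, b))
        else:  # EUCLIDEAN
            d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        results.append(float(d))
    return results
```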

◆ channel_add()

void MyCaffe.common.CudaDnn< T >.channel_add ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nBlocks,
int  nInnerNum,
int  nOffset,
long  hX,
long  hY,
DIR  dir 
)

Adds data along channels, similar to the numpy split function, but where the data is added instead of copied.

Parameters
nCountSpecifies the total number of elements in Y which = count(X)/nBlocks in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nBlocksSpecifies the number of blocks in each channel.
nInnerNumSpecifies the dimension of each inner dim within the channel.
nOffsetSpecifies the offset of the inner dim.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dirSpecifies the direction of data flow (0 = fwd X->Y, 1 = bwd Y->X).

Definition at line 8437 of file CudaDnn.cs.

◆ channel_compare()

void MyCaffe.common.CudaDnn< T >.channel_compare ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Compares the values of the channels from X and places the result in Y where 1 is set if the values are equal otherwise 0 is set.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory of length nOuterNum.

Definition at line 8133 of file CudaDnn.cs.

◆ channel_copy()

void MyCaffe.common.CudaDnn< T >.channel_copy ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nBlocks,
int  nInnerNum,
int  nOffset,
long  hX,
long  hY,
DIR  dir 
)

Copy data along channels similar to numpy split function.

Parameters
nCountSpecifies the total number of elements in Y which = count(X)/nBlocks in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nBlocksSpecifies the number of blocks in each channel.
nInnerNumSpecifies the dimension of each inner dim within the channel.
nOffsetSpecifies the offset of the inner dim.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dirSpecifies the direction of data flow (0 = fwd X->Y, 1 = bwd Y->X).

Definition at line 8457 of file CudaDnn.cs.

◆ channel_copyall()

void MyCaffe.common.CudaDnn< T >.channel_copyall ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Copy all data from X (shape 1,c,sd) to each num in Y (shape n,c,sd).

Parameters
nCountSpecifies the full count of Y.
nOuterNumSpecifies the outer num of Y.
nChannelsSpecifies the channels in X and Y.
nInnerNumSpecifies the spatial dimension of X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8474 of file CudaDnn.cs.
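The copy performed by channel_copyall amounts to broadcasting X across the num dimension, as this hedged numpy sketch shows (a CPU illustration, not the GPU implementation):

```python
import numpy as np

def channel_copyall(x, n):
    # Broadcast X of shape (1, c, sd) to every num item of Y, shape (n, c, sd).
    return np.tile(x, (n, 1, 1))
```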

◆ channel_div()

void MyCaffe.common.CudaDnn< T >.channel_div ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 
)

Divides the values of the channels from X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 8254 of file CudaDnn.cs.

◆ channel_dot()

void MyCaffe.common.CudaDnn< T >.channel_dot ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 
)

Calculates the dot product of the values within each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements.
nOuterNumSpecifies the number of images.
nChannelsSpecifies the number of channels per image.
nInnerNumSpecifies the dimension of each image.
hXSpecifies a handle to the vector X in GPU memory.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8326 of file CudaDnn.cs.

◆ channel_duplicate()

void MyCaffe.common.CudaDnn< T >.channel_duplicate ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Duplicates each channel 'nInnerNum' times in the destination.

Parameters
nCountSpecifies the total number of elements in Y which = count(X)*nInnerDim in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nInnerNumSpecifies the dimension of each inner dim within the channel.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8343 of file CudaDnn.cs.
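A CPU sketch of the duplication, assuming X holds one value per (num, channel) and each value is repeated nInnerNum times in Y (an interpretation of the description above, not the exact GPU code):

```python
import numpy as np

def channel_duplicate(x, inner_num):
    # Repeat each per-channel value of X (outer, channels) inner_num times,
    # yielding Y of shape (outer, channels, inner_num).
    return np.repeat(x[:, :, np.newaxis], inner_num, axis=2)
```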

◆ channel_fill()

void MyCaffe.common.CudaDnn< T >.channel_fill ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
int  nLabelDim,
long  hLabels,
long  hY 
)

Fills each channel item of Y with the data of X matching the label index specified by hLabels.

Parameters
nCountSpecifies the number of items in Y.
nOuterNumSpecifies the num of Y and Labels.
nChannelsSpecifies the channel size of Y and X.
nInnerNumSpecifies the spatial dimension of X and Y, but is normally 1.
hXSpecifies the GPU memory containing the encodings (usually centroids) of each label 0, ... max label.
nLabelDimSpecifies the dimension of the label channels. A value > 1 indicates that more than one label are stored per channel in which case only the first label is used.
hLabelsSpecifies the label ordering that determines how Y is filled using data from X.
hYSpecifies the GPU memory of the output data.

This function is used to fill a blob with data matching a set of labels. For example in a 3 item encoding based system with 4 labels: X = 4 channels of 3 items each (e.g. an encoding for each label). The values of hLabels show the ordering for which to fill hY with the labeled encodings. So if hLabels = 0, 2, 1, 3, 1, then Y = size { 5, 3, 1, 1 }, 5 items each with encoding sizes of 3 items which are then filled with the encoding at position 0, (for label 0), followed by the encoding for label 2, then 1, 3 and ending with the encoding for 1 as specified by the labels.

Definition at line 8179 of file CudaDnn.cs.
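The worked example above reduces to a label-indexed gather, sketched here in Python as a CPU illustration (using the example's 4 encodings of 3 items and hLabels = 0, 2, 1, 3, 1):

```python
import numpy as np

def channel_fill(x, labels):
    # X holds one encoding per label (rows 0..max_label); Y is filled with
    # the encoding for each entry of labels, in order.
    return x[labels]
```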

◆ channel_fillfrom()

void MyCaffe.common.CudaDnn< T >.channel_fillfrom ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
DIR  dir 
)

Fills each channel of Y with the values stored in the X source data, which contains nOuterNum x nChannels items (e.g., one item per channel); each item is then copied to all nInnerNum elements of its channel in Y.

Parameters
nCountSpecifies the number of items in Y.
nOuterNumSpecifies the num of Y and Labels.
nChannelsSpecifies the channel size of Y and X.
nInnerNumSpecifies the spatial dimension of X and Y, but is normally 1.
hXSpecifies the GPU memory containing the src data of shape (nOuterNum, nChannels, 1).
hYSpecifies the GPU memory of the output data where the X src data is copied where each item per channel is filled across all nInnerNum elements of Y. Y should have shape (nOuterNum, nChannels, nInnerNum).
dirSpecifies the direction of data flow: FWD copies X->Y, BWD copies Y->X.

Definition at line 8152 of file CudaDnn.cs.

◆ channel_max()

void MyCaffe.common.CudaDnn< T >.channel_max ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bReturnIdx = false 
)

Calculates the maximum value within each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
bReturnIdxOptionally, specifies to return the index of the maximum value, otherwise the maximum value is returned.

Definition at line 8099 of file CudaDnn.cs.
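The per-channel reduction can be sketched with numpy as a CPU reference (illustrative; X is viewed as shape (outer, channels, inner)):

```python
import numpy as np

def channel_max(x, return_idx=False):
    # Maximum (or its index when return_idx is true) within each channel
    # of X, shape (outer, channels, inner) -> Y, shape (outer, channels).
    return x.argmax(axis=2) if return_idx else x.max(axis=2)
```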

◆ channel_mean()

void MyCaffe.common.CudaDnn< T >.channel_mean ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Calculates the mean value of each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8116 of file CudaDnn.cs.

◆ channel_min()

void MyCaffe.common.CudaDnn< T >.channel_min ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bReturnIdx = false 
)

Calculates the minimum value within each channel of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
bReturnIdxOptionally, specifies to return the index of the minimum value, otherwise the minimum value is returned.

Definition at line 8081 of file CudaDnn.cs.

◆ channel_mul()

void MyCaffe.common.CudaDnn< T >.channel_mul ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
int  nMethod = 1 
)

Multiplies the values of the channels from X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nMethodSpecifies the method of traversing the channel, nMethod = 1 (the default) is used by the SoftmaxLayer and nMethod = 2 is used by the GRNLayer.

Definition at line 8272 of file CudaDnn.cs.

◆ channel_mulv()

void MyCaffe.common.CudaDnn< T >.channel_mulv ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hA,
long  hX,
long  hC 
)

Multiplies the values in vector X by each channel in matrix A and places the result in matrix C.

Parameters
nCountSpecifies the number of elements in A.
nOuterNumSpecifies the number of items within A.
nChannelsSpecifies the number of channels per item of A.
nInnerNumSpecifies the dimension of each item in A and X.
hASpecifies a handle to the matrix A in GPU memory.
hXSpecifies a handle to the vector X in GPU memory (must be of length nInnerNum).
hCSpecifies a handle to the matrix C in GPU memory where the results are placed (matrix A and C are the same shape).

Definition at line 8290 of file CudaDnn.cs.

◆ channel_op_bwd()

void MyCaffe.common.CudaDnn< T >.channel_op_bwd ( OP  op,
int  nCount,
int  nC,
int  nN1,
int  nSD1,
int  nN2,
int  nSD2,
int  nCy,
int  nSDy,
long  hA,
long  hB,
long  hY,
long  hAd,
long  hBd,
long  hYd,
long  hWork 
)

Performs a channel operation backward on the data.

Parameters
opSpecifies the operation to perform.
nCountSpecifies the number of items in Y which should equal max(nN1, nN2) x nC x max(nSD1, nSD2).
nCSpecifies the channels in both A, B and Y.
nN1Specifies the number of items in A.
nSD1Specifies the spatial dimension of each item of A.
nN2Specifies the number of items in B.
nSD2Specifies the spatial dimension of each item of B.
nCySpecifies the channels of each item of Y.
nSDySpecifies the spatial dimension of each item of Y.
hASpecifies a handle to the memory of A which has the size nN1 x nC x nSD1.
hBSpecifies a handle to the memory of B which has the size nN2 x nC x nSD2.
hYSpecifies a handle to the memory where the result is placed during FWD with size max(nN1, nN2) x nC x max(nSD1, nSD2).
hAdOptionally, specifies a handle to the memory of the diff for A (filled during BWD) with size nN1, nC, nSD1.
hBdOptionally, specifies a handle to the memory of the diff for B (filled during BWD) with size nN2, nC, nSD2.
hYdOptionally, specifies a handle to the memory of the diff for Y (used during BWD).
hWorkOptionally, specifies a handle to work memory with the same size as Y (used during BWD)

Definition at line 8413 of file CudaDnn.cs.

◆ channel_op_fwd()

void MyCaffe.common.CudaDnn< T >.channel_op_fwd ( OP  op,
int  nCount,
int  nC,
int  nN1,
int  nSD1,
int  nN2,
int  nSD2,
long  hA,
long  hB,
long  hY 
)

Performs a channel operation forward on the data.

Parameters
opSpecifies the operation to perform.
nCountSpecifies the number of items in Y which should equal max(nN1, nN2) x nC x max(nSD1, nSD2).
nCSpecifies the channels in both A, B and Y.
nN1Specifies the number of items in A.
nSD1Specifies the spatial dimension of each item of A.
nN2Specifies the number of items in B.
nSD2Specifies the spatial dimension of each item of B.
hASpecifies a handle to the memory of A which has the size nN1 x nC x nSD1.
hBSpecifies a handle to the memory of B which has the size nN2 x nC x nSD2.
hYSpecifies a handle to the memory where the result is placed during FWD with size max(nN1, nN2) x nC x max(nSD1, nSD2).

Definition at line 8382 of file CudaDnn.cs.
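The forward channel operation amounts to a broadcast elementwise op, sketched here with numpy (the OP string names below are hypothetical stand-ins for the OP enum; broadcasting mirrors the max(nN1, nN2) x nC x max(nSD1, nSD2) output size rule):

```python
import numpy as np

def channel_op_fwd(op, a, b):
    # A is (n1, c, sd1) and B is (n2, c, sd2); singleton num/spatial dims
    # broadcast against the larger operand, giving Y of shape
    # (max(n1, n2), c, max(sd1, sd2)).
    ops = {"ADD": np.add, "SUB": np.subtract, "MUL": np.multiply, "DIV": np.divide}
    return ops[op](a, b)
```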

◆ channel_percentile()

void MyCaffe.common.CudaDnn< T >.channel_percentile ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
double  dfPercentile 
)

Calculates the percentile along axis = 0.

Parameters
nCountSpecifies the total number of elements in Y which = count(X)*nInnerDim in length.
nOuterNumSpecifies the number of items.
nChannelsSpecifies the number of channels.
nInnerNumSpecifies the dimension of each inner dim within the channel.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfPercentileSpecifies the percentile to calculate.

Definition at line 8361 of file CudaDnn.cs.

◆ channel_scale()

void MyCaffe.common.CudaDnn< T >.channel_scale ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hA,
long  hY 
)

Multiplies the values of the channels from X with the scalar values in A and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of items within X and B.
nChannelsSpecifies the number of channels per item of X and B.
nInnerNumSpecifies the dimension of each data item in X (B should have data dimension = 1).
hXSpecifies a handle to the vector X in GPU memory.
hASpecifies a handle to the vector A in GPU memory containing the scalar values, one per num * channel.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8308 of file CudaDnn.cs.

◆ channel_sub() [1/2]

void MyCaffe.common.CudaDnn< T >.channel_sub ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hA,
long  hX,
long  hY 
)

Subtracts the values across the channels of X from A and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hASpecifies a handle to the vector A in GPU memory.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8197 of file CudaDnn.cs.

◆ channel_sub() [2/2]

void MyCaffe.common.CudaDnn< T >.channel_sub ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY 
)

Subtracts the values across the channels from X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8214 of file CudaDnn.cs.

◆ channel_sum()

void MyCaffe.common.CudaDnn< T >.channel_sum ( int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
long  hX,
long  hY,
bool  bSumAcrossChannels = true,
DIR  dir = DIR.FWD,
int  nChannelsY = -1 
)

Calculates the sum of the values either across or within each channel of X (depending on the bSumAcrossChannels setting) and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of images within X.
nChannelsSpecifies the number of channels per image of X.
nInnerNumSpecifies the dimension of each image in X.
hXSpecifies a handle to the vector X in GPU memory (with expected size nOuterNum, nChannels, nInnerNum).
hYSpecifies a handle to the vector Y in GPU memory (with expected size nOuterNum, nChannels, 1).
bSumAcrossChannelsSpecifies to sum across channels (true), or within each channel (false), default = true.

dirOptionally, specifies the direction (default = DIR.FWD). When DIR.BWD is used, data flows from Y to X where Y data is copied to X and duplicated across the channels of Y. When using bSumAcrossChannels = true, ordering is based on Y ordering Y(c1,c2,c3,c1,c2,c3,c1,c2,c3), and when using bSumAcrossChannels = false, ordering is based on X ordering Y(c1,c1,c1,c2,c2,c2,c3,c3,c3).
nChannelsYOptionally, specifies the channels of Y (used in the special case where Y channels = 1).

Definition at line 8236 of file CudaDnn.cs.
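Both summation modes can be sketched with numpy as a CPU reference, viewing X as shape (outer, channels, inner). This is an illustration of the FWD direction only; the BWD duplication path described above is not shown:

```python
import numpy as np

def channel_sum(x, sum_across_channels=True):
    # X has shape (outer, channels, inner). Summing across channels reduces
    # axis 1; summing within each channel reduces axis 2.
    return x.sum(axis=1) if sum_across_channels else x.sum(axis=2)
```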

◆ CheckMemoryAttributes()

bool MyCaffe.common.CudaDnn< T >.CheckMemoryAttributes ( long  hSrc,
int  nSrcDeviceID,
long  hDst,
int  nDstDeviceID 
)

Check the memory attributes of two memory blocks on different devices to see if they are compatible for peer-to-peer memory transfers.

Parameters
hSrcSpecifies the handle to the source memory.
nSrcDeviceIDSpecifies the device id where the source memory resides.
hDstSpecifies the handle to the destination memory.
nDstDeviceIDSpecifies the device id where the destination memory resides.
Returns
This function returns true when both devices support peer-to-peer communication, false otherwise.

Definition at line 2160 of file CudaDnn.cs.

◆ clip_bwd()

void MyCaffe.common.CudaDnn< T >.clip_bwd ( int  nCount,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,
T  fMin,
T  fMax 
)

Performs a Clip backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
fMinSpecifies the bottom value to clip to.
fMaxSpecifies the top value to clip to.

Definition at line 8931 of file CudaDnn.cs.

◆ clip_fwd()

void MyCaffe.common.CudaDnn< T >.clip_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
T  fMin,
T  fMax 
)

Performs a Clip forward pass in Cuda.

Calculation $ Y[i] = \max(min, \min(max,X[i])) $

Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
fMinSpecifies the bottom value to clip to.
fMaxSpecifies the top value to clip to.

Definition at line 8914 of file CudaDnn.cs.
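The clip calculation shown above can be expressed directly as a CPU reference in Python (illustrative only):

```python
def clip_fwd(bottom, f_min, f_max):
    # Y[i] = max(f_min, min(f_max, X[i])) for each element of the bottom data.
    return [max(f_min, min(f_max, x)) for x in bottom]
```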

◆ cll_bwd()

void MyCaffe.common.CudaDnn< T >.cll_bwd ( int  nCount,
int  nChannels,
double  dfMargin,
bool  bLegacyVersion,
double  dfAlpha,
long  hY,
long  hDiff,
long  hDistSq,
long  hBottomDiff 
)

Performs a contrastive loss layer backward pass in Cuda.

See Dimensionality Reduction by Learning an Invariant Mapping by Hadsell, et al., 2006

Parameters
nCountSpecifies the number of items.
nChannelsSpecifies the number of channels.
dfMarginSpecifies the margin to use. The default is 1.0.
bLegacyVersionWhen false, the calculation proposed by Hadsell, et al., 2006 is used where $ (margin - d)^2 $; otherwise the legacy version is used where $ (margin - d^2) $. The default is false.
dfAlphaSpecifies the alpha value that scales the resulting gradient.
hYSpecifies the Y data in GPU memory used to determine similar pairs.
hDiffSpecifies the diff in GPU memory.
hDistSqSpecifies the distance squared data in GPU memory.
hBottomDiffSpecifies the bottom diff in GPU memory.

Definition at line 10025 of file CudaDnn.cs.

◆ coeff_sub_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 
)

Performs a coefficient sub backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10537 of file CudaDnn.cs.

◆ coeff_sub_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sub_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 
)

Performs a coefficient sub forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottomSpecifies a handle to the bottom data in GPU memory.
hTopSpecifies a handle to the top data in GPU memory.

Definition at line 10518 of file CudaDnn.cs.

◆ coeff_sum_bwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_bwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hTopDiff,
long  hBottomDiff 
)

Performs a coefficient sum backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10500 of file CudaDnn.cs.

◆ coeff_sum_fwd()

void MyCaffe.common.CudaDnn< T >.coeff_sum_fwd ( int  nCount,
int  nDim,
int  nNumOffset,
double  dfCoeff,
long  hCoeffData,
long  hBottom,
long  hTop 
)

Performs a coefficient sum forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
nDimSpecifies the dimension of the data where the data is sized 'num' x 'dim'.
nNumOffsetSpecifies the offset applied to the coefficient indexing.
dfCoeffSpecifies a primary coefficient value applied to each input before summing.
hCoeffDataOptionally specifies a handle to coefficient data that is applied to the primary coefficient.
hBottomSpecifies a handle to the bottom data in GPU memory.
hTopSpecifies a handle to the top data in GPU memory.

Definition at line 10481 of file CudaDnn.cs.

◆ col2im()

void MyCaffe.common.CudaDnn< T >.col2im ( long  hDataCol,
int  nDataColOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataIm,
int  nDataImOffset 
)

Rearranges the columns into image blocks.

Parameters
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.
nChannelsSpecifies the number of channels in the image.
nHeightSpecifies the height of the image.
nWidthSpecifies the width of the image.
nKernelHSpecifies the kernel height.
nKernelWSpecifies the kernel width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nDilationHSpecifies the dilation along the height.
nDilationWSpecifies the dilation along the width.
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.

Definition at line 8039 of file CudaDnn.cs.

◆ col2im_nd()

void MyCaffe.common.CudaDnn< T >.col2im_nd ( long  hDataCol,
int  nDataColOffset,
int  nNumSpatialAxes,
int  nColCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataIm,
int  nDataImOffset 
)

Rearranges the columns into image blocks.

Parameters
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.
nNumSpatialAxesSpecifies the number of spatial axes.
nColCountSpecifies the number of kernels.
nChannelAxisSpecifies the axis containing the channel.
hImShapeSpecifies a handle to the image shape data in GPU memory.
hColShapeSpecifies a handle to the column shape data in GPU memory.
hKernelShapeSpecifies a handle to the kernel shape data in GPU memory.
hPadSpecifies a handle to the pad data in GPU memory.
hStrideSpecifies a handle to the stride data in GPU memory.
hDilationSpecifies a handle to the dilation data in GPU memory.
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.

Definition at line 8063 of file CudaDnn.cs.

◆ compare_signs()

void MyCaffe.common.CudaDnn< T >.compare_signs ( int  n,
long  hA,
long  hB,
long  hY 
)

Compares the signs of each value in A and B and places the result in Y.

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7653 of file CudaDnn.cs.
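A CPU sketch of the sign comparison in Python (illustrative; the GPU kernel's exact treatment of zero values is an assumption here):

```python
def compare_signs(a, b):
    # Y[i] = 1 when A[i] and B[i] have the same sign, otherwise 0.
    # Assumption: zero is treated as its own sign class.
    sign = lambda v: (v > 0) - (v < 0)
    return [1.0 if sign(x) == sign(y) else 0.0 for x, y in zip(a, b)]
```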

◆ concat_bwd()

void MyCaffe.common.CudaDnn< T >.concat_bwd ( int  nCount,
long  hTopDiff,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hBottomDiff 
)

Performs a concat backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumConcatsSpecifies the number of concatenations.
nConcatInputSizeSpecifies the concatenation input size.
nTopConcatAxisSpecifies the size of the top blob along the concatenation axis.
nBottomConcatAxisSpecifies the size of the bottom blob along the concatenation axis.
nOffsetConcatAxisSpecifies the offset along the concatenation axis where this bottom blob's data begins within the top.

hBottomDiffSpecifies a handle to the Bottom diff in GPU memory.

Definition at line 9869 of file CudaDnn.cs.

◆ concat_fwd()

void MyCaffe.common.CudaDnn< T >.concat_fwd ( int  nCount,
long  hBottomData,
int  nNumConcats,
int  nConcatInputSize,
int  nTopConcatAxis,
int  nBottomConcatAxis,
int  nOffsetConcatAxis,
long  hTopData 
)

Performs a concat forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumConcatsSpecifies the number of concatenations.
nConcatInputSizeSpecifies the concatenation input size.
nTopConcatAxisSpecifies the size of the top blob along the concatenation axis.
nBottomConcatAxisSpecifies the size of the bottom blob along the concatenation axis.
nOffsetConcatAxisSpecifies the offset along the concatenation axis where this bottom blob's data begins within the top.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9849 of file CudaDnn.cs.
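These parameters mirror the flat-index arithmetic of the original Caffe Concat CUDA kernel, from which MyCaffe's layers derive. A hedged NumPy sketch of that indexing (the `concat_copy` helper and its `forward` flag are illustrative, not the MyCaffe API): the forward pass scatters a bottom blob into its slice of the top, and the backward pass gathers the slice of the top diff back into the bottom diff.

```python
import numpy as np

def concat_copy(count, src, num_concats, concat_input_size,
                top_concat_axis, bottom_concat_axis,
                offset_concat_axis, dst, forward=True):
    """Flat-index copy used by concat forward/backward."""
    # Total flat size of one outer item of the bottom blob.
    total = bottom_concat_axis * concat_input_size
    for index in range(count):
        concat_num, concat_index = divmod(index, total)
        top_index = concat_index + (concat_num * top_concat_axis
                                    + offset_concat_axis) * concat_input_size
        if forward:
            dst[top_index] = src[index]   # src = bottom data, dst = top data
        else:
            dst[index] = src[top_index]   # src = top diff, dst = bottom diff
```

For two (2, 2, 3) blobs concatenated on axis 1 into a (2, 4, 3) top: num_concats = 2, concat_input_size = 3, bottom_concat_axis = 2, top_concat_axis = 4, with offsets 0 and 2.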

◆ contains_point()

bool MyCaffe.common.CudaDnn< T >.contains_point ( int  n,
long  hMean,
long  hWidth,
long  hX,
long  hWork,
int  nXOff = 0 
)

Returns true if the point is contained within the bounds.

Parameters
nSpecifies the number of items.
hMeanSpecifies a handle to the mean values in GPU memory.
hWidthSpecifies a handle to the width values in GPU memory.
hXSpecifies a handle to the X values in GPU memory.
hWorkSpecifies a handle to the work data in GPU memory.
nXOffOptionally, specifies an offset into the X vector (default = 0).
Returns
If the X values are within the bounds, true is returned, otherwise false.

Definition at line 7943 of file CudaDnn.cs.

◆ ConvertByteSizeToCount()

static ulong MyCaffe.common.CudaDnn< T >.ConvertByteSizeToCount ( ulong  ulSizeInBytes)
static

Converts the byte size into the number of items in the base data type of float or double.

Parameters
ulSizeInBytesSpecifies the size in bytes to convert.
Returns
The number of items is returned.

Definition at line 2438 of file CudaDnn.cs.
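The conversion is plain integer division by the size of the base type (4 bytes for float, 8 for double); the helper below is an illustrative sketch, with truncating division assumed for non-multiples.

```python
def byte_size_to_count(size_in_bytes, item_size):
    """Convert a byte size to an item count for a base type of the given
    size in bytes (4 for float, 8 for double)."""
    return size_in_bytes // item_size
```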

◆ ConvolutionBackwardBias() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the bias.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hBiasDescSpecifies a handle to the bias tensor descriptor.
hBiasDiffSpecifies a handle to the bias diff in GPU memory.
nBiasOffsetSpecifies an offset into the diff memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3901 of file CudaDnn.cs.

◆ ConvolutionBackwardBias() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardBias ( long  hCuDnn,
fAlpha,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
fBeta,
long  hBiasDesc,
long  hBiasDiff,
int  nBiasOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the bias.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hBiasDescSpecifies a handle to the bias tensor descriptor.
hBiasDiffSpecifies a handle to the bias diff in GPU memory.
nBiasOffsetSpecifies an offset into the diff memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3919 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_DATA_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the data.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3999 of file CudaDnn.cs.

◆ ConvolutionBackwardData() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardData ( long  hCuDnn,
fAlpha,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_DATA_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
fBeta,
long  hBottomDesc,
long  hBottomDiff,
int  nBottomOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the data.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 4025 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_FILTER_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream 
)

Perform a convolution backward pass on the filter.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightDiffSpecifies a handle to the weight diff in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
bSyncStreamSpecifies whether or not to synchronize the stream.

Definition at line 3946 of file CudaDnn.cs.

◆ ConvolutionBackwardFilter() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionBackwardFilter ( long  hCuDnn,
fAlpha,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hTopDesc,
long  hTopDiff,
int  nTopOffset,
long  hConvDesc,
CONV_BWD_FILTER_ALGO  algoBwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
fBeta,
long  hFilterDesc,
long  hWeightDiff,
int  nWeightOffset,
bool  bSyncStream = true 
)

Perform a convolution backward pass on the filter.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoBwdSpecifies the algorithm to use when performing the backward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hFilterDescSpecifies a handle to the filter descriptor.
hWeightDiffSpecifies a handle to the weight diff in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3972 of file CudaDnn.cs.

◆ ConvolutionForward() [1/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
CONV_FWD_ALGO  algoFwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 
)

Perform a convolution forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoFwdSpecifies the algorithm to use for the forward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3856 of file CudaDnn.cs.

◆ ConvolutionForward() [2/2]

void MyCaffe.common.CudaDnn< T >.ConvolutionForward ( long  hCuDnn,
fAlpha,
long  hBottomDesc,
long  hBottomData,
int  nBottomOffset,
long  hFilterDesc,
long  hWeight,
int  nWeightOffset,
long  hConvDesc,
CONV_FWD_ALGO  algoFwd,
long  hWorkspace,
int  nWorkspaceOffset,
ulong  lWorkspaceSize,
fBeta,
long  hTopDesc,
long  hTopData,
int  nTopOffset,
bool  bSyncStream = true 
)

Perform a convolution forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
nBottomOffsetSpecifies an offset into the bottom memory (in items, not bytes).
hFilterDescSpecifies a handle to the filter descriptor.
hWeightSpecifies a handle to the weight data in GPU memory.
nWeightOffsetSpecifies an offset into the weight memory (in items, not bytes).
hConvDescSpecifies a handle to the convolution descriptor.
algoFwdSpecifies the algorithm to use for the forward operation.
hWorkspaceSpecifies a handle to the GPU memory to use for the workspace.
nWorkspaceOffsetSpecifies an offset into the workspace memory.
lWorkspaceSizeSpecifies the size of the workspace memory (in bytes).
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top memory (in items, not bytes).
bSyncStreamOptionally, specifies whether or not to synchronize the stream. The default = true.

Definition at line 3882 of file CudaDnn.cs.
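The fAlpha/fBeta pair follows the standard cuDNN scaling convention: the destination is blended as dst = alpha * op(src) + beta * (prior dst), so alpha = 1, beta = 0 overwrites while beta = 1 accumulates. A minimal sketch of that convention (the helper name is illustrative; the convolution itself is replaced by an already-computed `op_result`):

```python
import numpy as np

def apply_alpha_beta(op_result, prior_dst, alpha, beta):
    """cuDNN-style blending: dst = alpha * op(src) + beta * prior_dst."""
    return alpha * np.asarray(op_result) + beta * np.asarray(prior_dst)
```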

◆ copy() [1/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
int  nNum,
int  nDim,
long  hSrc1,
long  hSrc2,
long  hDst,
long  hSimilar,
bool  bInvert = false 
)

Copy similar items of length 'nDim' from hSrc1 (where hSimilar(i) = 1) and dissimilar items of length 'nDim' from hSrc2 (where hSimilar(i) = 0).

Parameters
nCountSpecifies the total data length of hSrc1, hSrc2 and hDst.
nNumSpecifies the number of outer items in hSrc1, hSrc2, hDst, and the number of elements in hSimilar.
nDimSpecifies the inner dimension of hSrc1, hSrc2 and hDst.
hSrc1Specifies a handle to the GPU memory of source 1.
hSrc2Specifies a handle to the GPU memory of source 2.
hDstSpecifies a handle to the GPU memory of the destination.
hSimilarSpecifies a handle to the GPU memory of the similar data.
bInvertOptionally, specifies whether or not to invert the similar values (e.g. copy when similar = 0 instead of similar = 1)

Definition at line 6035 of file CudaDnn.cs.
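A CPU sketch of the per-item selection described above, in NumPy (the `copy_similar` helper is illustrative, not the MyCaffe API): each dim-length block of the destination is taken from hSrc1 where the similar flag is 1 and from hSrc2 where it is 0, with `invert` swapping the two.

```python
import numpy as np

def copy_similar(num, dim, src1, src2, similar, invert=False):
    """Select the i-th dim-length block from src1 when similar[i] != 0
    (or == 0 when invert=True), otherwise from src2."""
    dst = np.empty(num * dim)
    for i in range(num):
        from_src1 = (similar[i] != 0) != invert
        src = src1 if from_src1 else src2
        dst[i * dim:(i + 1) * dim] = src[i * dim:(i + 1) * dim]
    return dst
```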

◆ copy() [2/2]

void MyCaffe.common.CudaDnn< T >.copy ( int  nCount,
long  hSrc,
long  hDst,
int  nSrcOffset = 0,
int  nDstOffset = 0,
long  hStream = -1,
bool?  bSrcHalfSizeOverride = null,
bool?  bDstHalfSizeOverride = null 
)

Copy data from one block of GPU memory to another.

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
nCountSpecifies the number of items (not bytes) to copy.
hSrcSpecifies a handle to GPU memory containing the source data.
hDstSpecifies a handle to GPU memory containing the destination data.
nSrcOffsetOptionally specifies the offset into the source data where the copying starts.
nDstOffsetOptionally specifies the offset into the destination data where the copying starts.
hStreamOptionally, specifies a handle to a stream to use for the operation.
bSrcHalfSizeOverrideOptionally, specifies an override for the half size state of the source (default = null, which is ignored).
bDstHalfSizeOverrideOptionally, specifies an override for the half size state of the destination (default = null, which is ignored).

Definition at line 6007 of file CudaDnn.cs.

◆ copy_batch()

void MyCaffe.common.CudaDnn< T >.copy_batch ( int  nCount,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nDstCount,
long  hDstCache,
long  hWorkDevData,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
long  hWorkDataHost 
)

Copy a batch of labeled items into a cache organized by label where older data is removed and replaced by newer data.

Parameters
nCountSpecifies the total data length of hSrc.
nNumSpecifies the number of outer items in hSrcData.
nDimSpecifies the inner dimension of each item in hSrcData.
hSrcDataSpecifies a handle to the GPU memory of source data.
hSrcLblSpecifies a handle to the GPU memory of source labels.
nDstCountSpecifies the total data length of the hDstCache.
hDstCacheSpecifies a handle to the GPU memory of the destination cache.
hWorkDevDataSpecifies a handle to the GPU memory of the device work data that is the same size as the hDstCache.
nLabelStartSpecifies the first label of all possible labels.
nLabelCountSpecifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSizeSpecifies the size of each labeled data cache.
hCacheHostCursorsSpecifies a handle to host memory (allocated using AllocateHostBuffer) containing the label cursors - there should be 'nLabelCount' cursors.
hWorkDataHostSpecifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length.

NOTE: The cache size must be large enough to cover the maximum number of items for any given label within a batch; otherwise cached items will be overwritten by items in the current batch.

Definition at line 6062 of file CudaDnn.cs.

◆ copy_expand()

void MyCaffe.common.CudaDnn< T >.copy_expand ( int  n,
int  nNum,
int  nDim,
long  hX,
long  hA 
)

Expand a vector of length 'nNum' into a matrix of size 'nNum' x 'nDim' by copying each value of the vector into all elements of the corresponding matrix row.

Parameters
nSpecifies the total number of items in the matrix 'A'.
nNumSpecifies the total number of rows in the matrix 'A' and the total number of items in the vector 'X'.
nDimSpecifies the total number of columns in the matrix 'A'.
hXSpecifies the 'nNum' length vector to expand.
hASpecifies the 'nNum' x 'nDim' matrix.

Definition at line 6182 of file CudaDnn.cs.
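The expansion is equivalent to broadcasting the vector across the columns of the matrix, i.e. A[i, j] = X[i] for all j. A NumPy sketch (the `copy_expand` helper name is illustrative):

```python
import numpy as np

def copy_expand(x, dim):
    """Expand a length-num vector into a (num, dim) matrix whose row i is
    filled with x[i], via column broadcasting."""
    x = np.asarray(x)
    return np.broadcast_to(x[:, None], (x.size, dim)).copy()
```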

◆ copy_sequence() [1/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  n,
long  hSrc,
int  nSrcStep,
int  nSrcStartIdx,
int  nCopyCount,
int  nCopyDim,
long  hDst,
int  nDstStep,
int  nDstStartIdx,
int  nSrcSpatialDim,
int  nDstSpatialDim,
int  nSrcSpatialDimStartIdx = 0,
int  nDstSpatialDimStartIdx = 0,
int  nSpatialDimCount = -1 
)

Copy a sequence from a source to a destination and allow for skip steps.

Parameters
nSpecifies the total number of items in src.
hSrcSpecifies a handle to the source GPU memory.
nSrcStepSpecifies the stepping used across the source.
nSrcStartIdxSpecifies the starting index into the source.
nCopyCountSpecifies the number of items to copy.
nCopyDimSpecifies the dimension to copy (this value times the spatial dim gives the total copy amount).
hDstSpecifies a handle to the destination GPU memory.
nDstStepSpecifies the stepping used across the destination.
nDstStartIdxSpecifies the starting index where data is to be copied in the destination.
nSrcSpatialDimSpecifies the src spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nDstSpatialDimSpecifies the dst spatial dim of each item copied. Src and Dst spatial dims should be equal when nSpatialDimCount is not used.
nSrcSpatialDimStartIdxOptionally, specifies the start index within the source spatial dim to start the copy (default = 0)
nDstSpatialDimStartIdxOptionally, specifies the start index within the destination spatial dim to start the copy (default = 0)
nSpatialDimCountOptionally, specifies the number of items to copy from within the spatial dim (default = -1, copy all)

Definition at line 6165 of file CudaDnn.cs.

◆ copy_sequence() [2/2]

void MyCaffe.common.CudaDnn< T >.copy_sequence ( int  nK,
int  nNum,
int  nDim,
long  hSrcData,
long  hSrcLbl,
int  nSrcCacheCount,
long  hSrcCache,
int  nLabelStart,
int  nLabelCount,
int  nCacheSize,
long  hCacheHostCursors,
bool  bOutputLabels,
List< long >  rghTop,
List< int >  rgnTopCount,
long  hWorkDataHost,
bool  bCombinePositiveAndNegative = false,
int  nSeed = 0 
)

Copy a sequence of cached items, organized by label, into an anchor, positive (if nK > 0), and negative blobs.

Parameters
nKSpecifies the output type expected where: nK = 0, outputs to 2 tops (anchor and one negative), or nK > 0, outputs to 2 + nK tops (anchor, positive, nK negatives). The rghTop and rgnTopCount must be sized accordingly.
nNumSpecifies the number of outer items in hSrcData.
nDimSpecifies the inner dimension of each item in hSrcData.
hSrcDataSpecifies a handle to the GPU memory of source data.
hSrcLblSpecifies a handle to the GPU memory of source labels.
nSrcCacheCountSpecifies the number of items in hSrcCache (nCacheSize * nLabelCount).
hSrcCacheSpecifies a handle to the cached labeled data.
nLabelStartSpecifies the first label of all possible labels.
nLabelCountSpecifies the total number of labels (expects labels to be sequential from 'nLabelStart').
nCacheSizeSpecifies the size of each labeled data cache.
hCacheHostCursorsSpecifies a handle to host memory containing the label cursors - there should be 'nLabelCount' cursors.
bOutputLabelsSpecifies whether or not to output labels. When true, one additional top is expected for the labels.
rghTopSpecifies a list of the GPU memory for each top item. The number of top items expected depends on the 'nK' value.
rgnTopCountSpecifies a list of the item count for each top item. The number of top items expected depends on the 'nK' value.
hWorkDataHostSpecifies a handle to host memory (allocated using AllocateHostBuffer) used for work - must be nNum in item length and must be the same hWorkDataHost passed to 'copy_batch'.
bCombinePositiveAndNegativeOptionally, specifies to combine the positive and negative items by alternating between each and placing both in Top[1], while also making sure the output labels reflect the alternation.
nSeedOptionally, specifies a seed for the random number generator (default = 0, which ignores this parameter).

Receiving an error ERROR_BATCH_TOO_SMALL indicates that the batch size is too small and does not have enough labels to choose from. Each batch should have at least two instances of each labeled item.

NOTE: When 'nK' = 1 and 'bCombinePositiveAndNegative' = true, the label output has a dimension of 2, and the tops used are as follows: top(0) = anchor; top(1) = alternating negative/positive; top(2) = labels if 'bOutputLabels' = true.

Definition at line 6095 of file CudaDnn.cs.

◆ CopyDeviceToHost()

void MyCaffe.common.CudaDnn< T >.CopyDeviceToHost ( long  lCount,
long  hGpuSrc,
long  hHostDst 
)

Copy from GPU memory to Host memory.

Parameters
lCountSpecifies the number of items (of base type each) to copy.
hGpuSrcSpecifies the GPU memory containing the source data.
hHostDstSpecifies the Host memory containing the host destination.

Definition at line 2554 of file CudaDnn.cs.

◆ CopyHostToDevice()

void MyCaffe.common.CudaDnn< T >.CopyHostToDevice ( long  lCount,
long  hHostSrc,
long  hGpuDst 
)

Copy from Host memory to GPU memory.

Parameters
lCountSpecifies the number of items (of base type each) to copy.
hHostSrcSpecifies the Host memory containing the host source data.
hGpuDstSpecifies the GPU memory containing the destination.

Definition at line 2568 of file CudaDnn.cs.

◆ CreateConvolutionDesc()

long MyCaffe.common.CudaDnn< T >.CreateConvolutionDesc ( )

Create a new instance of a convolution descriptor for use with NVIDIA's cuDnn.

Returns
The convolution descriptor handle is returned.

Definition at line 3747 of file CudaDnn.cs.

◆ CreateCuDNN()

long MyCaffe.common.CudaDnn< T >.CreateCuDNN ( long  hStream = 0)

Create a new instance of NVIDIA's cuDnn.

Parameters
hStreamSpecifies a stream used by cuDnn.
Returns
The handle to cuDnn is returned.

Definition at line 3263 of file CudaDnn.cs.

◆ CreateDropoutDesc()

long MyCaffe.common.CudaDnn< T >.CreateDropoutDesc ( )

Create a new instance of a dropout descriptor for use with NVIDIA's cuDnn.

Returns
The dropout descriptor handle is returned.

Definition at line 4203 of file CudaDnn.cs.

◆ CreateExtension()

long MyCaffe.common.CudaDnn< T >.CreateExtension ( string  strExtensionDllPath)

Create an instance of an Extension DLL.

Parameters
strExtensionDllPathSpecifies the file path to the extension DLL.
Returns
The handle to a new instance of Extension is returned.

Definition at line 3456 of file CudaDnn.cs.

◆ CreateFilterDesc()

long MyCaffe.common.CudaDnn< T >.CreateFilterDesc ( )

Create a new instance of a filter descriptor for use with NVIDIA's cuDnn.

Returns
The filter descriptor handle is returned.

Definition at line 3668 of file CudaDnn.cs.

◆ CreateImageOp()

long MyCaffe.common.CudaDnn< T >.CreateImageOp ( int  nNum,
double  dfBrightnessProb,
double  dfBrightnessDelta,
double  dfContrastProb,
double  dfContrastLower,
double  dfContrastUpper,
double  dfSaturationProb,
double  dfSaturationLower,
double  dfSaturationUpper,
long  lRandomSeed = 0 
)

Create a new ImageOp used to perform image operations on the GPU.

Parameters
nNumSpecifies the number of items (usually the blob.num).
dfBrightnessProbSpecifies the brightness probability [0,1].
dfBrightnessDeltaSpecifies the brightness delta.
dfContrastProbSpecifies the contrast probability [0,1]
dfContrastLowerSpecifies the contrast lower bound value.
dfContrastUpperSpecifies the contrast upper bound value.
dfSaturationProbSpecifies the saturation probability [0,1]
dfSaturationLowerSpecifies the saturation lower bound value.
dfSaturationUpperSpecifies the saturation upper bound value.
lRandomSeedOptionally, specifies the random seed or 0 to ignore (default = 0).
Returns
A handle to the ImageOp is returned.

Definition at line 3153 of file CudaDnn.cs.

◆ CreateLayerNorm()

long MyCaffe.common.CudaDnn< T >.CreateLayerNorm ( int  nGpuID,
int  nCount,
int  nOuterNum,
int  nChannels,
int  nInnerNum,
float  fEps = 1e-10f 
)

Create the Cuda version of LayerNorm.

Parameters
nGpuIDSpecifies the GPUID to use.
nCountSpecifies the total number of items in the input (and output).
nOuterNumSpecifies the outer number of items (e.g., num)
nChannelsSpecifies the number of channels in the data.
nInnerNumSpecifies the spatial dimension of the inner data.
fEpsOptionally, specifies the epsilon value to avoid numeric issues (default = 1e-10).
Returns
The handle to the LayerNorm configuration. This handle is used with all other layer norm functions.

Definition at line 5828 of file CudaDnn.cs.

◆ CreateLRNDesc()

long MyCaffe.common.CudaDnn< T >.CreateLRNDesc ( )

Create a new instance of an LRN descriptor for use with NVIDIA's cuDnn.

Returns
The LRN descriptor handle is returned.

Definition at line 4308 of file CudaDnn.cs.

◆ CreateMemoryPointer()

long MyCaffe.common.CudaDnn< T >.CreateMemoryPointer ( long  hData,
long  lOffset,
long  lCount 
)

Creates a memory pointer into an already existing block of GPU memory.

Parameters
hDataSpecifies a handle to the GPU memory.
lOffsetSpecifies the offset into the GPU memory (in items, not bytes), where the pointer is to start.
lCountSpecifies the number of items (not bytes) in the 'virtual' memory block pointed to by the memory pointer.
Returns
A handle to the memory pointer is returned. Handles to memory pointers can be used like any other handle to GPU memory.

Definition at line 3028 of file CudaDnn.cs.

◆ CreateMemoryTest()

long MyCaffe.common.CudaDnn< T >.CreateMemoryTest ( out ulong  ulTotalNumBlocks,
out double  dfMemAllocatedInGB,
out ulong  ulMemStartAddr,
out ulong  ulBlockSize,
double  dfPctToAllocate = 1.0 
)

Creates a new memory test on the current GPU.

Parameters
ulTotalNumBlocksReturns the total number of blocks available to test.
dfMemAllocatedInGBReturns the total amount of allocated memory, specified in GB.
ulMemStartAddrReturns the start address of the memory test.
ulBlockSizeReturns the block size of the memory to be tested.
dfPctToAllocateSpecifies the percentage of available memory to test, where 1.0 = 100%.
Returns
A handle to the memory test is returned.

Definition at line 3069 of file CudaDnn.cs.

◆ CreateNCCL()

long MyCaffe.common.CudaDnn< T >.CreateNCCL ( int  nDeviceId,
int  nCount,
int  nRank,
Guid  guid 
)

Create an instance of NVIDIA's NCCL ('Nickel').

Parameters
nDeviceIdSpecifies the device where this instance of NCCL is going to run.
nCountSpecifies the total number of NCCL instances used.
nRankSpecifies the zero-based rank of this instance of NCCL.
guidSpecifies the unique Guid for this instance of NCCL.
Returns
The handle to a new instance of NCCL is returned.

Definition at line 3297 of file CudaDnn.cs.

◆ CreatePCA()

long MyCaffe.common.CudaDnn< T >.CreatePCA ( int  nMaxIterations,
int  nM,
int  nN,
int  nK,
long  hData,
long  hScoresResult,
long  hLoadsResult,
long  hResiduals = 0,
long  hEigenvalues = 0 
)

Creates a new PCA instance and returns the handle to it.

See 'Parallel GPU Implementation of Iterative PCA Algorithms' by Mircea Andrecut.

Parameters
nMaxIterationsSpecifies the number of iterations to run.
nMSpecifies the number of rows in the data matrix.
nNSpecifies the number of columns in the data matrix.
nKSpecifies the number of components (K less than or equal to N).
hDataSpecifies a handle to the data allocated using AllocatePCAData.
hScoresResultSpecifies a handle to the data allocated using AllocatePCAScores.
hLoadsResultSpecifies a handle to the data allocated using AllocatePCALoads.
hResidualsSpecifies a handle to the data allocated using AllocatePCAData.
hEigenvaluesSpecifies a handle to the data allocated using AllocatePCAEigenvalues.
Returns
A handle to the new PCA instance is returned.
Definition at line 5392 of file CudaDnn.cs.

◆ CreatePoolingDesc()

long MyCaffe.common.CudaDnn< T >.CreatePoolingDesc ( )

Create a new instance of a pooling descriptor for use with NVIDIA's cuDnn.

Returns
The pooling descriptor handle is returned.

Definition at line 4037 of file CudaDnn.cs.

◆ CreateRnn8()

long MyCaffe.common.CudaDnn< T >.CreateRnn8 ( )

Create the RNN8.

Returns
A handle to the RNN8 is returned.

Definition at line 5160 of file CudaDnn.cs.

◆ CreateRnnDataDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDataDesc ( )

Create the RNN Data Descriptor.

Returns
A handle to the RNN Data descriptor is returned.

Definition at line 4652 of file CudaDnn.cs.

◆ CreateRnnDesc()

long MyCaffe.common.CudaDnn< T >.CreateRnnDesc ( )

Create the RNN Descriptor.

Returns
A handle to the RNN descriptor is returned.

Definition at line 4733 of file CudaDnn.cs.

◆ CreateSSD()

long MyCaffe.common.CudaDnn< T >.CreateSSD ( int  nNumClasses,
bool  bShareLocation,
int  nLocClasses,
int  nBackgroundLabelId,
bool  bUseDiffcultGt,
SSD_MINING_TYPE  miningType,
SSD_MATCH_TYPE  matchType,
float  fOverlapThreshold,
bool  bUsePriorForMatching,
SSD_CODE_TYPE  codeType,
bool  bEncodeVariantInTgt,
bool  bBpInside,
bool  bIgnoreCrossBoundaryBbox,
bool  bUsePriorForNms,
SSD_CONF_LOSS_TYPE  confLossType,
SSD_LOC_LOSS_TYPE  locLossType,
float  fNegPosRatio,
float  fNegOverlap,
int  nSampleSize,
bool  bMapObjectToAgnostic,
bool  bNmsParam,
float?  fNmsThreshold = null,
int?  nNmsTopK = null,
float?  fNmsEta = null 
)

Create an instance of the SSD GPU support.

Parameters
nNumClassesSpecifies the number of classes.
bShareLocationSpecifies whether or not to share the location.
nLocClassesSpecifies the number of location classes.
nBackgroundLabelIdSpecifies the background label ID.
bUseDiffcultGtSpecifies whether or not to use difficult ground truths.
miningTypeSpecifies the mining type to use.
matchTypeSpecifies the matching method to use.
fOverlapThresholdSpecifies the overlap threshold for each box.
bUsePriorForMatchingSpecifies whether or not to use priors for matching.
codeTypeSpecifies the code type to use.
bEncodeVariantInTgtSpecifies whether or not to encode the variant in the target.
bBpInsideSpecifies whether or not the BP is inside.
bIgnoreCrossBoundaryBboxSpecifies whether or not to ignore cross boundary boxes.
bUsePriorForNmsSpecifies whether or not to use priors for NMS.
confLossTypeSpecifies the confidence loss type.
locLossTypeSpecifies the location loss type.
fNegPosRatioSpecifies the negative/positive ratio to use.
fNegOverlapSpecifies the negative overlap to use.
nSampleSizeSpecifies the sample size.
bMapObjectToAgnosticSpecifies whether or not to map objects to agnostic.
bNmsParamSpecifies whether or not the NMS parameters are specified.
fNmsThresholdSpecifies the NMS threshold, which is only used when the 'bNmsParam' = true.
nNmsTopKSpecifies the NMS top-k selection, which is only used when the 'bNmsParam' = true.
fNmsEtaSpecifies the NMS eta, which is only used when the 'bNmsParam' = true.
Returns
A handle to the SSD instance is returned.

Definition at line 5482 of file CudaDnn.cs.

◆ CreateStream()

long MyCaffe.common.CudaDnn< T >.CreateStream ( bool  bNonBlocking = false,
int  nIndex = -1 
)

Create a new stream on the current GPU.

Parameters
bNonBlockingWhen false (the default), the created stream is a 'blocking' stream; otherwise it is an asynchronous, non-blocking stream.
nIndexSpecifies an index for the stream; streams with an index of 0 or greater are shared.
Returns
The handle to the stream is returned.

Definition at line 3209 of file CudaDnn.cs.

◆ CreateTensorDesc()

long MyCaffe.common.CudaDnn< T >.CreateTensorDesc ( )

Create a new instance of a tensor descriptor for use with NVIDIA's cuDnn.

Returns
The tensor descriptor handle is returned.

Definition at line 3518 of file CudaDnn.cs.

◆ crop_bwd()

void MyCaffe.common.CudaDnn< T >.crop_bwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomDiff,
long  hTopDiff 
)

Performs the crop backward operation.

Parameters
nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDiffSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top data in GPU memory.

Definition at line 9830 of file CudaDnn.cs.

◆ crop_fwd()

void MyCaffe.common.CudaDnn< T >.crop_fwd ( int  nCount,
int  nNumAxes,
long  hSrcStrides,
long  hDstStrides,
long  hOffsets,
long  hBottomData,
long  hTopData 
)

Performs the crop forward operation.

Parameters
nCountSpecifies the count.
nNumAxesSpecifies the number of axes in the bottom.
hSrcStridesSpecifies a handle to the GPU memory containing the source strides.
hDstStridesSpecifies a handle to the GPU memory containing the destination strides.
hOffsetsSpecifies a handle to the GPU memory containing the offsets.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9812 of file CudaDnn.cs.

◆ debug()

void MyCaffe.common.CudaDnn< T >.debug ( )

The debug function is used only when debugging with the debug version of the low-level DLL.

Definition at line 10637 of file CudaDnn.cs.

◆ denan()

void MyCaffe.common.CudaDnn< T >.denan ( int  n,
long  hX,
double  dfReplacement 
)

Replaces all NaN values within X with a replacement value.

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
dfReplacementSpecifies the replacement value.

Definition at line 7963 of file CudaDnn.cs.
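As a CPU reference for the semantics above, the following Python sketch replaces NaN values in a list (illustrative only; the actual MyCaffe function operates on a GPU memory handle):

```python
import math

def denan(x, replacement):
    """CPU reference for denan: replace every NaN in x with the replacement value."""
    return [replacement if math.isnan(v) else v for v in x]
```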

◆ DeriveBatchNormDesc()

void MyCaffe.common.CudaDnn< T >.DeriveBatchNormDesc ( long  hFwdScaleBiasMeanVarDesc,
long  hFwdBottomDesc,
long  hBwdScaleBiasMeanVarDesc,
long  hBwdBottomDesc,
BATCHNORM_MODE  mode 
)

Derive the batch norm descriptors for both the forward and backward passes.

Parameters
hFwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the forward pass.
hFwdBottomDescSpecifies a handle to the forward bottom tensor descriptor.
hBwdScaleBiasMeanVarDescSpecifies a handle to the scale bias mean var tensor descriptor for the backward pass.
hBwdBottomDescSpecifies a handle to the backward bottom tensor descriptor.
modeSpecifies the BATCHNORM_MODE to use.

Definition at line 4132 of file CudaDnn.cs.

◆ DeviceCanAccessPeer()

bool MyCaffe.common.CudaDnn< T >.DeviceCanAccessPeer ( int  nSrcDeviceID,
int  nPeerDeviceID 
)

Query whether or not two devices can access each other via peer-to-peer memory copies.

Parameters
nSrcDeviceIDSpecifies the device id of the source.
nPeerDeviceIDSpecifies the device id of the peer to the source device.
Returns
true
is returned if the source device can access the peer device via peer-to-peer communication,
false
otherwise.

Definition at line 2240 of file CudaDnn.cs.

◆ DeviceDisablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceDisablePeerAccess ( int  nPeerDeviceID)

Disables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

Parameters
nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2270 of file CudaDnn.cs.

◆ DeviceEnablePeerAccess()

void MyCaffe.common.CudaDnn< T >.DeviceEnablePeerAccess ( int  nPeerDeviceID)

Enables peer-to-peer access between the current device used by the CudaDnn instance and a peer device.

Parameters
nPeerDeviceIDSpecifies the device id of the peer device.

Definition at line 2258 of file CudaDnn.cs.

◆ DisableGhostMemory()

void MyCaffe.common.CudaDnn< T >.DisableGhostMemory ( )

Disables the ghost memory, if enabled.

Definition at line 1775 of file CudaDnn.cs.

◆ Dispose() [1/2]

void MyCaffe.common.CudaDnn< T >.Dispose ( )

Disposes this instance freeing up all of its host and GPU memory.

Definition at line 1629 of file CudaDnn.cs.

◆ Dispose() [2/2]

virtual void MyCaffe.common.CudaDnn< T >.Dispose ( bool  bDisposing)
protectedvirtual

Disposes this instance freeing up all of its host and GPU memory.

Parameters
bDisposingWhen true, specifies that the call is from a Dispose call.

Definition at line 1612 of file CudaDnn.cs.

◆ DistortImage()

void MyCaffe.common.CudaDnn< T >.DistortImage ( long  h,
int  nCount,
int  nNum,
int  nDim,
long  hX,
long  hY 
)

Distort an image using the ImageOp handle provided.

Parameters
hSpecifies a handle to the ImageOp that defines how the image will be distorted.
nCountSpecifies the total number of data elements within 'hX' and 'hY'.
nNumSpecifies the number of items to be distorted (typically blob.num) in 'hX' and 'hY'.
nDimSpecifies the dimension of each item.
hXSpecifies a handle to the GPU memory containing the source data to be distorted.
hYSpecifies a handle to the GPU memory containing the destination of the distortion.

Definition at line 3188 of file CudaDnn.cs.

◆ div()

void MyCaffe.common.CudaDnn< T >.div ( int  n,
long  hA,
long  hB,
long  hY 
)

Divides each element of A by each element of B and places the result in Y.

Y = A / B (element by element)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7420 of file CudaDnn.cs.
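The element-by-element division above can be sketched on the CPU as follows (illustrative only; the actual MyCaffe function operates on GPU memory handles):

```python
def div(a, b):
    """CPU reference for div: Y = A / B, element by element."""
    return [ai / bi for ai, bi in zip(a, b)]
```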

◆ divbsx()

void MyCaffe.common.CudaDnn< T >.divbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 
)

Divide a matrix by a vector.

Parameters
nSpecifies the number of items.
hASpecifies the matrix to divide.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the divisor vector.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6671 of file CudaDnn.cs.

◆ DivisiveNormalizationBackward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationBackward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
long  hTopDiff,
long  hTemp1,
long  hTemp2,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Performs a Divisive Normalization backward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4433 of file CudaDnn.cs.

◆ DivisiveNormalizationForward()

void MyCaffe.common.CudaDnn< T >.DivisiveNormalizationForward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
long  hTemp1,
long  hTemp2,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Performs a Divisive Normalization forward pass.

See What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures? by Jarrett, et al.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTemp1Temporary data in GPU memory.
hTemp2Temporary data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4408 of file CudaDnn.cs.

◆ dot()

T MyCaffe.common.CudaDnn< T >.dot ( int  n,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.
Returns
The dot product is returned as a type 'T'.

Definition at line 6847 of file CudaDnn.cs.
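The dot product with item offsets, as documented above, can be written as a CPU reference sketch (illustrative only; the actual MyCaffe function dispatches to cuBlas on GPU memory handles):

```python
def dot(n, x, y, x_off=0, y_off=0):
    """CPU reference for dot: sum of element-wise products over n items,
    with offsets given in items (not bytes)."""
    return sum(x[x_off + i] * y[y_off + i] for i in range(n))
```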

◆ dot_double()

double MyCaffe.common.CudaDnn< T >.dot_double ( int  n,
long  hX,
long  hY 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
Returns
The dot product is returned as a type
double

Definition at line 6815 of file CudaDnn.cs.

◆ dot_float()

float MyCaffe.common.CudaDnn< T >.dot_float ( int  n,
long  hX,
long  hY 
)

Computes the dot product of X and Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
Returns
The dot product is returned as a type
float

Definition at line 6830 of file CudaDnn.cs.

◆ dropout_bwd()

void MyCaffe.common.CudaDnn< T >.dropout_bwd ( int  nCount,
long  hTopDiff,
long  hMask,
uint  uiThreshold,
T  fScale,
long  hBottomDiff 
)

Performs a dropout backward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when a mask value is less than the threshold, the data item is 'dropped out' by setting it to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9484 of file CudaDnn.cs.

◆ dropout_fwd()

void MyCaffe.common.CudaDnn< T >.dropout_fwd ( int  nCount,
long  hBottomData,
long  hMask,
uint  uiThreshold,
T  fScale,
long  hTopData 
)

Performs a dropout forward pass in Cuda.

See also
Improving neural networks by preventing co-adaptation of feature detectors by Hinton, et al., 2012
Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
uiThresholdSpecifies the threshold value: when a mask value is less than the threshold, the data item is 'dropped out' by setting it to zero.
fScaleSpecifies a scale value applied to each item that is not dropped out.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9464 of file CudaDnn.cs.
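The threshold-and-scale semantics described above can be sketched on the CPU as follows (illustrative only; mask comparison against the threshold is the assumption stated in the parameter description):

```python
def dropout_fwd(bottom, mask, threshold, scale):
    """CPU reference for dropout_fwd: zero items whose mask value is below
    the threshold; scale the surviving items."""
    return [b * scale if m >= threshold else 0.0 for b, m in zip(bottom, mask)]
```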

◆ DropoutBackward()

void MyCaffe.common.CudaDnn< T >.DropoutBackward ( long  hCuDnn,
long  hDropoutDesc,
long  hTopDesc,
long  hTop,
long  hBottomDesc,
long  hBottom,
long  hReserved 
)

Performs a dropout backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopSpecifies a handle to the top data in GPU memory.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomSpecifies a handle to the bottom data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4296 of file CudaDnn.cs.

◆ DropoutForward()

void MyCaffe.common.CudaDnn< T >.DropoutForward ( long  hCuDnn,
long  hDropoutDesc,
long  hBottomDesc,
long  hBottomData,
long  hTopDesc,
long  hTopData,
long  hReserved 
)

Performs a dropout forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hReservedSpecifies a handle to the reserved data in GPU memory.

Definition at line 4278 of file CudaDnn.cs.

◆ elu_bwd()

void MyCaffe.common.CudaDnn< T >.elu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomData,
long  hBottomDiff,
double  dfAlpha 
)

Performs an Exponential Linear Unit (ELU) backward pass in Cuda.

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 9444 of file CudaDnn.cs.

◆ elu_fwd()

void MyCaffe.common.CudaDnn< T >.elu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfAlpha 
)

Performs an Exponential Linear Unit (ELU) forward pass in Cuda.

Calculates $ f(x) = (x > 0) ? x : \alpha * (e^x - 1) $

See also
Deep Residual Networks with Exponential Linear Unit by Shah, et al., 2016
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfAlphaSpecifies the alpha value.

Definition at line 9424 of file CudaDnn.cs.
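The ELU formula above has a direct CPU reference (illustrative only; not the GPU-handle-based MyCaffe API):

```python
import math

def elu(x, alpha=1.0):
    """CPU reference for ELU: f(x) = x if x > 0, else alpha * (exp(x) - 1)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```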

◆ EluBackward()

void MyCaffe.common.CudaDnn< T >.EluBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Performs an Elu backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4513 of file CudaDnn.cs.

◆ EluForward()

void MyCaffe.common.CudaDnn< T >.EluForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Performs an Elu forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4491 of file CudaDnn.cs.

◆ embed_bwd()

void MyCaffe.common.CudaDnn< T >.embed_bwd ( int  nCount,
long  hBottomData,
long  hTopDiff,
int  nM,
int  nN,
int  nK,
long  hWeightDiff 
)

Performs the backward pass for embed

Parameters
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nMNEEDS REVIEW
nNNEEDS REVIEW
nKNEEDS REVIEW
hWeightDiffSpecifies a handle to the weight diff in GPU memory.

Definition at line 8781 of file CudaDnn.cs.

◆ embed_fwd()

void MyCaffe.common.CudaDnn< T >.embed_fwd ( int  nCount,
long  hBottomData,
long  hWeight,
int  nM,
int  nN,
int  nK,
long  hTopData 
)

Performs the forward pass for embed

Parameters
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hWeightSpecifies a handle to the weight data in GPU memory.
nMNEEDS REVIEW
nNNEEDS REVIEW
nKNEEDS REVIEW
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 8763 of file CudaDnn.cs.

◆ erf() [1/3]

double MyCaffe.common.CudaDnn< T >.erf ( double  dfVal)

Calculates the erf() function.

Parameters
dfValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 6986 of file CudaDnn.cs.

◆ erf() [2/3]

float MyCaffe.common.CudaDnn< T >.erf ( float  fVal)

Calculates the erf() function.

Parameters
fValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 6996 of file CudaDnn.cs.

◆ erf() [3/3]

T MyCaffe.common.CudaDnn< T >.erf ( T  fVal)

Calculates the erf() function.

Parameters
fValSpecifies the input value.
Returns
The erf result is returned.

Definition at line 7006 of file CudaDnn.cs.
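For comparison with the erf() overloads above, the standard error function is available directly in most languages' math libraries; a Python check (illustrative only):

```python
import math

def erf_ref(x):
    """CPU reference for erf: erf(x) = (2/sqrt(pi)) * integral of exp(-t^2), t from 0 to x."""
    return math.erf(x)
```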

◆ exp() [1/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY 
)

Calculates the exponent value of A and places the result in Y.

$ f(x) = exp(x) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7454 of file CudaDnn.cs.

◆ exp() [2/2]

void MyCaffe.common.CudaDnn< T >.exp ( int  n,
long  hA,
long  hY,
int  nAOff,
int  nYOff,
double  dfBeta 
)

Calculates the exponent value of A * beta and places the result in Y.

$ f(x) = exp(x * \beta) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffSpecifies an offset (in items, not bytes) into the memory of A.
nYOffSpecifies an offset (in items, not bytes) into the memory of Y.
dfBetaSpecifies the scalar as type
double

Definition at line 7471 of file CudaDnn.cs.
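The scaled exponent with an item offset, as documented above, sketched on the CPU (illustrative only; the actual MyCaffe function operates on GPU memory handles):

```python
import math

def exp_scaled(n, a, beta=1.0, a_off=0):
    """CPU reference for exp: y[i] = exp(a[a_off + i] * beta) for n items."""
    return [math.exp(a[a_off + i] * beta) for i in range(n)]
```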

◆ fill()

void MyCaffe.common.CudaDnn< T >.fill ( int  n,
int  nDim,
long  hSrc,
int  nSrcOff,
int  nCount,
long  hDst 
)

Fills the destination with 'n' copies of the source data.

Parameters
nSpecifies the number of times to copy the source data.
nDimSpecifies the number of source items to copy.
hSrcSpecifies a handle to the GPU memory of the source data.
nSrcOffSpecifies an offset into the GPU memory where the source data copy starts.
nCountSpecifies the total number of items in the destination. This value must be >= n * nDim.
hDstSpecifies the handle to the GPU memory where the data is to be copied.

Definition at line 6199 of file CudaDnn.cs.
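The repeated-copy semantics above can be sketched on the CPU as follows (illustrative only; the destination here is returned rather than passed as a GPU handle):

```python
def fill(n, dim, src, src_off, count):
    """CPU reference for fill: copy dim items from src (starting at src_off)
    n times into a destination of count items; count must be >= n * dim."""
    assert count >= n * dim
    dst = [0.0] * count
    for i in range(n):
        dst[i * dim:(i + 1) * dim] = src[src_off:src_off + dim]
    return dst
```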

◆ FreeConvolutionDesc()

void MyCaffe.common.CudaDnn< T >.FreeConvolutionDesc ( long  h)

Free a convolution descriptor instance.

Parameters
hSpecifies the handle to the convolution descriptor instance.

Definition at line 3765 of file CudaDnn.cs.

◆ FreeCuDNN()

void MyCaffe.common.CudaDnn< T >.FreeCuDNN ( long  h)

Free an instance of cuDnn.

Parameters
hSpecifies the handle to cuDnn.

Definition at line 3281 of file CudaDnn.cs.

◆ FreeDropoutDesc()

void MyCaffe.common.CudaDnn< T >.FreeDropoutDesc ( long  h)

Free a dropout descriptor instance.

Parameters
hSpecifies the handle to the dropout descriptor instance.

Definition at line 4221 of file CudaDnn.cs.

◆ FreeExtension()

void MyCaffe.common.CudaDnn< T >.FreeExtension ( long  hExtension)

Free an instance of an Extension.

Parameters
hExtensionSpecifies the handle to the Extension.

Definition at line 3474 of file CudaDnn.cs.

◆ FreeFilterDesc()

void MyCaffe.common.CudaDnn< T >.FreeFilterDesc ( long  h)

Free a filter descriptor instance.

Parameters
hSpecifies the handle to the filter descriptor instance.

Definition at line 3686 of file CudaDnn.cs.

◆ FreeHostBuffer()

void MyCaffe.common.CudaDnn< T >.FreeHostBuffer ( long  hMem)

Free previously allocated host memory.

Parameters
hMemSpecifies the handle to the host memory.

Definition at line 2602 of file CudaDnn.cs.

◆ FreeImageOp()

void MyCaffe.common.CudaDnn< T >.FreeImageOp ( long  h)

Free an image op, freeing up all GPU memory used.

Parameters
hSpecifies the handle to the image op.

Definition at line 3171 of file CudaDnn.cs.

◆ FreeLayerNorm()

void MyCaffe.common.CudaDnn< T >.FreeLayerNorm ( long  hLayerNorm)

Free the instance of LayerNorm GPU support.

Parameters
hLayerNormSpecifies the handle to the LayerNorm instance.

Definition at line 5846 of file CudaDnn.cs.

◆ FreeLRNDesc()

void MyCaffe.common.CudaDnn< T >.FreeLRNDesc ( long  h)

Free a LRN descriptor instance.

Parameters
hSpecifies the handle to the LRN descriptor instance.

Definition at line 4326 of file CudaDnn.cs.

◆ FreeMemory()

void MyCaffe.common.CudaDnn< T >.FreeMemory ( long  hMem)

Free previously allocated GPU memory.

Parameters
hMemSpecifies the handle to the GPU memory.

Definition at line 2517 of file CudaDnn.cs.

◆ FreeMemoryPointer()

void MyCaffe.common.CudaDnn< T >.FreeMemoryPointer ( long  hData)

Frees a memory pointer.

Parameters
hDataSpecifies the handle to the memory pointer.

Definition at line 3046 of file CudaDnn.cs.

◆ FreeMemoryTest()

void MyCaffe.common.CudaDnn< T >.FreeMemoryTest ( long  h)

Free a memory test, freeing up all GPU memory used.

Parameters
hSpecifies the handle to the memory test.

Definition at line 3095 of file CudaDnn.cs.

◆ FreeNCCL()

void MyCaffe.common.CudaDnn< T >.FreeNCCL ( long  hNccl)

Free an instance of NCCL.

Parameters
hNcclSpecifies the handle to NCCL.

Definition at line 3355 of file CudaDnn.cs.

◆ FreePCA()

void MyCaffe.common.CudaDnn< T >.FreePCA ( long  hPCA)

Free the PCA instance associated with handle.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
hPCASpecifies a handle to the PCA instance to free.

Definition at line 5446 of file CudaDnn.cs.

◆ FreePoolingDesc()

void MyCaffe.common.CudaDnn< T >.FreePoolingDesc ( long  h)

Free a pooling descriptor instance.

Parameters
hSpecifies the handle to the pooling descriptor instance.

Definition at line 4055 of file CudaDnn.cs.

◆ FreeRnn8()

void MyCaffe.common.CudaDnn< T >.FreeRnn8 ( long  h)

Free an existing RNN8.

Parameters
hSpecifies the handle to the RNN8 created with CreateRnn8

Definition at line 5178 of file CudaDnn.cs.

◆ FreeRnnDataDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDataDesc ( long  h)

Free an existing RNN Data descriptor.

Parameters
hSpecifies the handle to the RNN Data descriptor created with CreateRnnDataDesc

Definition at line 4672 of file CudaDnn.cs.

◆ FreeRnnDesc()

void MyCaffe.common.CudaDnn< T >.FreeRnnDesc ( long  h)

Free an existing RNN descriptor.

Parameters
hSpecifies the handle to the RNN descriptor created with CreateRnnDesc

Definition at line 4751 of file CudaDnn.cs.

◆ FreeSSD()

void MyCaffe.common.CudaDnn< T >.FreeSSD ( long  hSSD)

Free the instance of SSD GPU support.

Parameters
hSSDSpecifies the handle to the SSD instance.

Definition at line 5637 of file CudaDnn.cs.

◆ FreeStream()

void MyCaffe.common.CudaDnn< T >.FreeStream ( long  h)

Free a stream.

Parameters
hSpecifies the handle to the stream.

Definition at line 3227 of file CudaDnn.cs.

◆ FreeTensorDesc()

void MyCaffe.common.CudaDnn< T >.FreeTensorDesc ( long  h)

Free a tensor descriptor instance.

Parameters
hSpecifies the handle to the tensor descriptor instance.

Definition at line 3536 of file CudaDnn.cs.

◆ gather_bwd()

void MyCaffe.common.CudaDnn< T >.gather_bwd ( int  nCount,
long  hTop,
long  hBottom,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 
)

Performs a gather backward pass where data at specified indexes along a given axis is copied to the output data.

Parameters
nCountSpecifies the number of items.
hTopSpecifies the input data.
hBottomSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 10122 of file CudaDnn.cs.

◆ gather_fwd()

void MyCaffe.common.CudaDnn< T >.gather_fwd ( int  nCount,
long  hBottom,
long  hTop,
int  nAxis,
int  nDim,
int  nDimAtAxis,
int  nM,
int  nN,
long  hIdx 
)

Performs a gather forward pass where data at specified indexes along a given axis is copied to the output data.

Parameters
nCountSpecifies the number of items.
hBottomSpecifies the input data.
hTopSpecifies the output data.
nAxisSpecifies the axis along which to copy.
nDimSpecifies the dimension of each item at each index.
nDimAtAxisSpecifies the dimension at the axis.
nMSpecifies the M dimension.
nNSpecifies the N dimension.
hIdxSpecifies the indexes of the data to gather.

Definition at line 10102 of file CudaDnn.cs.
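One plausible CPU sketch of the gather semantics is below. The index layout (top[m, j, :] = bottom[m, idx[j], :]) is an assumption for illustration, not a statement of MyCaffe's exact memory layout:

```python
def gather_fwd(bottom, idx, m, dim_at_axis, dim):
    """Hypothetical CPU sketch of a gather along one axis: for each of the m
    outer items, copy the indexed slices (each of dim inner items) out of the
    dim_at_axis available slices. Layout is assumed, not taken from MyCaffe."""
    top = []
    for mi in range(m):
        base = mi * dim_at_axis * dim
        for j in idx:
            top.extend(bottom[base + j * dim: base + (j + 1) * dim])
    return top
```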

◆ gaussian_blur()

void MyCaffe.common.CudaDnn< T >.gaussian_blur ( int  n,
int  nChannels,
int  nHeight,
int  nWidth,
double  dfSigma,
long  hX,
long  hY 
)

The gaussian_blur function runs a Gaussian blurring operation over each channel of the data using the specified sigma.

The gaussian blur operation runs a 3x3 patch, initialized with the gaussian distribution using the formula $ G(x, y) = \frac{1}{2\pi\sigma^2}\,e^{-\left(x^2 + y^2\right)/\left(2\sigma^2\right)} $

See also
Gaussian Blur on Wikipedia for more information.
Parameters
nSpecifies the number of items in the memory of 'X'.
nChannelsSpecifies the number of channels (i.e. 3 for RGB, 1 for B/W).
nHeightSpecifies the height of each item.
nWidthSpecifies the width of each item.
dfSigmaSpecifies the sigma used in the gaussian blur.
hXSpecifies a handle to GPU memory containing the source data to blur.
hYSpecifies a handle to GPU memory where the blurred information is placed.

Definition at line 10980 of file CudaDnn.cs.
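A CPU sketch of the 3x3 Gaussian patch and blur described above (illustrative only; single channel, zero padding at the borders is an assumption, and the patch is normalized to sum to 1):

```python
import math

def gaussian_kernel3(sigma):
    """Build a normalized 3x3 Gaussian patch from G(x, y) = exp(-(x^2 + y^2) / (2*sigma^2))."""
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in (-1, 0, 1)] for y in (-1, 0, 1)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]

def gaussian_blur(img, sigma):
    """Blur a 2D single-channel image with the 3x3 patch (zero padding)."""
    k = gaussian_kernel3(sigma)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * k[dy + 1][dx + 1]
            out[y][x] = acc
    return out
```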

◆ geam() [1/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
double
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
double
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6366 of file CudaDnn.cs.

◆ geam() [2/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A, B and C.
nSpecifies the height (number of rows) of A, B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type
float
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type
float
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6386 of file CudaDnn.cs.

◆ geam() [3/3]

void MyCaffe.common.CudaDnn< T >.geam ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
T  fAlpha,
long  hA,
long  hB,
T  fBeta,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0 
)

Perform a matrix-matrix addition/transposition operation: C = alpha transA (A) + beta transB (B)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.

Definition at line 6409 of file CudaDnn.cs.
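The geam-style operation above can be sketched on the CPU as follows (illustrative only; offsets are omitted and matrices are plain lists of lists rather than GPU memory handles):

```python
def geam(trans_a, trans_b, rows, cols, alpha, A, B, beta):
    """CPU reference for a geam-style op: C = alpha * op(A) + beta * op(B),
    where op(X) is X or its transpose; op(A) and op(B) must both be rows x cols."""
    def op(M, trans):
        return [list(r) for r in zip(*M)] if trans else M
    Ao, Bo = op(A, trans_a), op(B, trans_b)
    return [[alpha * Ao[i][j] + beta * Bo[i][j] for j in range(cols)]
            for i in range(rows)]
```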

◆ gelu_bwd()

void MyCaffe.common.CudaDnn< T >.gelu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
bool  bEnableBertVersion 
)

Performs a GELU backward pass in Cuda.

Computes the GELU gradient. When bEnableBertVersion=false (the default), the gradient of the GELU non-linearity $ y = x * cdf $ is computed, where $ cdf = 0.5 * (1.0 + erf(x / sqrt(2.0))) $ and $ pdf = 1.0 / sqrt(2.0 * PI) * exp(-0.5 * x^2) $

with $ y' = cdf + x * pdf $

See also
On the GELU Activation Function

When bEnableBertVersion=true, $ y' = 0.5 * tanh(0.797885 * (x + 0.044715 * x^3)) + (0.0535161 * x^3 + 0.398942 * x) * sech^2(0.797885 * (x + 0.044715 * x^3)) + 0.5 $. Note: this is the derivative of $ 0.5 * x * (1.0 + tanh(sqrt(2.0/PI) * (x + 0.044715 * x^3))) $ (verifiable with Wolfram Alpha).

See also
Github - Karpathy: NewGELU, line 21 by Karpathy, 2022.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
bEnableBertVersionSpecifies whether to use the BERT version (true) or the default version (false).

Definition at line 9098 of file CudaDnn.cs.
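The BERT-version gradient quoted above can be validated numerically with a small, self-contained Python sketch (the constants 0.797885, 0.044715, 0.0535161 and 0.398942 are taken from the formula above; the finite difference serves only to check the closed form):

```python
import math

def gelu_bert(x):
    # BERT/"NewGELU" forward: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715*x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def gelu_bert_grad(x):
    # Closed-form derivative as quoted in the docs above (sech^2(u) = 1/cosh(u)^2).
    inner = 0.797885 * (x + 0.044715 * x**3)
    sech2 = 1.0 / math.cosh(inner) ** 2
    return (0.5 * math.tanh(inner)
            + (0.0535161 * x**3 + 0.398942 * x) * sech2
            + 0.5)

# The closed form should agree with a central finite difference of the forward pass.
x, h = 0.7, 1e-5
fd = (gelu_bert(x + h) - gelu_bert(x - h)) / (2 * h)
```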

◆ gelu_fwd()

void MyCaffe.common.CudaDnn< T >.gelu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
bool  bEnableBertVersion 
)

Performs a GELU forward pass in Cuda.

When bEnableBertVersion=false (the default), computes the GELU non-linearity $ y = x * cdf $ where $ cdf = 0.5 * (1.0 + erf(x / sqrt(2.0))) $ and $ pdf = 1.0 / sqrt(2.0 * PI) * exp(-0.5 * x^2) $

with derivative $ y' = cdf + x * pdf $

See also
On the GELU Activation Function

When bEnableBertVersion=true, computes the GELU non-linearity $ y = 0.5 * x * (1.0 + tanh(sqrt(2.0/PI) * (x + 0.044715 * x^3))) $.

See also
Github - Karpathy: NewGELU, line 21 by Karpathy, 2022.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
bEnableBertVersionSpecifies whether to use the BERT version (true) or the default version (false).

Definition at line 9064 of file CudaDnn.cs.
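Both forward variants can be sketched in plain Python (illustrative only; the actual computation runs on the GPU). The default form uses the exact Gaussian CDF, while the BERT version is the tanh approximation; the two agree closely:

```python
import math

def gelu_exact(x):
    # Default version: y = x * cdf, with cdf = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_bert(x):
    # BERT version (bEnableBertVersion=true): tanh approximation of the same curve.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))
```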

◆ gemm() [1/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type double.
hASpecifies a handle to the data for A in GPU memory.
hBSpecifies a handle to the data for B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type double.
hCSpecifies a handle to the data for C in GPU memory.

Definition at line 6236 of file CudaDnn.cs.
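A minimal CPU sketch of the documented semantics C = alpha * op(B) * op(A) + beta * C (illustration only; op(B) and op(A) are assumed to be supplied with any transposes already applied, and the GPU handles are replaced by plain row-major lists):

```python
def gemm(alpha, OB, OA, beta, C):
    """CPU sketch of C = alpha * op(B) * op(A) + beta * C.

    OB is n x k, OA is k x m and C is n x m, matching the documented
    dimension conventions (transposes assumed already applied).
    """
    n, k, m = len(OB), len(OA), len(OA[0])
    return [[alpha * sum(OB[i][p] * OA[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)] for i in range(n)]

# Example: a 1x2 by 2x1 product accumulated into C.
C = gemm(1.0, [[1, 2]], [[3], [4]], 1.0, [[10]])
```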

◆ gemm() [2/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC,
uint  lda,
uint  ldb,
uint  ldc 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type double.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type double.
hCSpecifies a handle to the data for matrix C in GPU memory.
ldaSpecifies the leading dimension of A.
ldbSpecifies the leading dimension of B.
ldcSpecifies the leading dimension of C.

Definition at line 6312 of file CudaDnn.cs.

◆ gemm() [3/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
double  fAlpha,
long  hA,
long  hB,
double  fBeta,
long  hC,
uint  lda,
uint  ldb,
uint  ldc,
uint  stridea,
uint  strideb,
uint  stridec,
uint  batch_count 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type double.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type double.
hCSpecifies a handle to the data for matrix C in GPU memory.
ldaSpecifies the leading dimension of A.
ldbSpecifies the leading dimension of B.
ldcSpecifies the leading dimension of C.
strideaSpecifies the stride of matrix A.
stridebSpecifies the stride of matrix B.
stridecSpecifies the stride of matrix C.
batch_countSpecifies the number of matrices.

Definition at line 6343 of file CudaDnn.cs.

◆ gemm() [4/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
float  fAlpha,
long  hA,
long  hB,
float  fBeta,
long  hC 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type float.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type float.
hCSpecifies a handle to the data for matrix C in GPU memory.

Definition at line 6257 of file CudaDnn.cs.

◆ gemm() [5/5]

void MyCaffe.common.CudaDnn< T >.gemm ( bool  bTransA,
bool  bTransB,
int  m,
int  n,
int  k,
T  fAlpha,
long  hA,
long  hB,
T  fBeta,
long  hC,
int  nAOffset = 0,
int  nBOffset = 0,
int  nCOffset = 0,
int  nGroups = 1,
int  nGroupOffsetA = 0,
int  nGroupOffsetB = 0,
int  nGroupOffsetC = 0 
)

Perform a matrix-matrix multiplication operation: C = alpha transB (B) transA (A) + beta C

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
bTransBSpecifies whether or not to transpose B.
mSpecifies the width (number of columns) of A and C.
nSpecifies the height (number of rows) of B and C.
kSpecifies the width (number of columns) of A and B.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hBSpecifies a handle to the data for matrix B in GPU memory.
fBetaSpecifies a scalar multiplied by C where the scalar is of type 'T'.
hCSpecifies a handle to the data for matrix C in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nBOffsetSpecifies an offset (in items, not bytes) into the memory of B.
nCOffsetSpecifies an offset (in items, not bytes) into the memory of C.
nGroupsOptionally, specifies the number of groups (default = 1).
nGroupOffsetAOptionally, specifies an offset multiplied by the current group 'g' and added to the AOffset (default = 0).
nGroupOffsetBOptionally, specifies an offset multiplied by the current group 'g' and added to the BOffset (default = 0).
nGroupOffsetCOptionally, specifies an offset multiplied by the current group 'g' and added to the COffset (default = 0).

Definition at line 6285 of file CudaDnn.cs.

◆ gemv() [1/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
double  fAlpha,
long  hA,
long  hX,
double  fBeta,
long  hY 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type double.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type double.
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 6431 of file CudaDnn.cs.
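The documented matrix-vector semantics y = alpha * op(A) * x + beta * y can be sketched on the CPU (illustration only; A, x and y are plain lists standing in for the GPU handles):

```python
def gemv(trans_a, alpha, A, x, beta, y):
    """CPU sketch of y = alpha * op(A) * x + beta * y.

    A is a list of rows; op(A) transposes A when trans_a is True.
    """
    M = [list(col) for col in zip(*A)] if trans_a else A
    return [alpha * sum(M[i][j] * x[j] for j in range(len(x))) + beta * y[i]
            for i in range(len(M))]

# Example: plain row sums of a 2x2 matrix (alpha=1, beta=0).
y_out = gemv(False, 1.0, [[1, 2], [3, 4]], [1, 1], 0.0, [0, 0])
```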

◆ gemv() [2/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
float  fAlpha,
long  hA,
long  hX,
float  fBeta,
long  hY 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type float.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector x in GPU memory.
fBetaSpecifies a scalar multiplied by y where the scalar is of type float.
hYSpecifies a handle to the data for vector y in GPU memory.

Definition at line 6450 of file CudaDnn.cs.

◆ gemv() [3/3]

void MyCaffe.common.CudaDnn< T >.gemv ( bool  bTransA,
int  m,
int  n,
T  fAlpha,
long  hA,
long  hX,
T  fBeta,
long  hY,
int  nAOffset = 0,
int  nXOffset = 0,
int  nYOffset = 0 
)

Perform a matrix-vector multiplication operation: y = alpha transA (A) x + beta y (where x and y are vectors)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
bTransASpecifies whether or not to transpose A.
mSpecifies the width (number of columns) of A.
nSpecifies the height (number of rows) of A.
fAlphaSpecifies a scalar multiplied by the data where the scalar is of type 'T'.
hASpecifies a handle to the data for matrix A in GPU memory.
hXSpecifies a handle to the data for vector X in GPU memory.
fBetaSpecifies a scalar multiplied by Y where the scalar is of type 'T'.
hYSpecifies a handle to the data for vector y in GPU memory.
nAOffsetSpecifies an offset (in items, not bytes) into the memory of A.
nXOffsetSpecifies an offset (in items, not bytes) into the memory of X.
nYOffsetSpecifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6472 of file CudaDnn.cs.

◆ ger() [1/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
double  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type double.
hXSpecifies a handle to the data for vector x (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6492 of file CudaDnn.cs.
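The documented formula A = x * (fAlpha * y) is a rank-1 outer product, which can be sketched on the CPU as follows (note that standard cuBLAS ger additionally accumulates into A; this sketch follows the formula exactly as documented, with plain lists standing in for the GPU handles):

```python
def ger(alpha, x, y):
    """CPU sketch of the rank-1 outer product A[i][j] = x[i] * (alpha * y[j]),
    where x is m in length, y is n in length, and A is m x n."""
    return [[xi * (alpha * yj) for yj in y] for xi in x]

# Example: 2x2 outer product with alpha = 2.
A = ger(2.0, [1, 2], [3, 4])
```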

◆ ger() [2/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
float  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type float.
hXSpecifies a handle to the data for vector x (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6509 of file CudaDnn.cs.

◆ ger() [3/3]

void MyCaffe.common.CudaDnn< T >.ger ( int  m,
int  n,
T  fAlpha,
long  hX,
long  hY,
long  hA 
)

Perform a vector-vector outer product operation: A = x * (fAlpha * y) (where x and y are vectors and A is an m x n matrix)

This function uses NVIDIA's cuBlas but with a different parameter ordering.

Parameters
mSpecifies the length of X and rows in A (m x n).
nSpecifies the length of Y and cols in A (m x n).
fAlphaSpecifies a scalar multiplied by y where the scalar is of type 'T'.
hXSpecifies a handle to the data for vector x (m in length) in GPU memory.
hYSpecifies a handle to the data for vector Y (n in length) in GPU memory.
hASpecifies a handle to the data for matrix A (m x n) in GPU memory.

Definition at line 6526 of file CudaDnn.cs.

◆ get()

T[] MyCaffe.common.CudaDnn< T >.get ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of type 'T'.

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of type T is returned.

Definition at line 5985 of file CudaDnn.cs.

◆ get_double()

double[] MyCaffe.common.CudaDnn< T >.get_double ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of double.

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of double is returned.

Definition at line 5961 of file CudaDnn.cs.

◆ get_float()

float[] MyCaffe.common.CudaDnn< T >.get_float ( int  nCount,
long  hHandle,
int  nIdx = -1 
)

Queries the GPU memory by copying it into an array of float.

Parameters
nCountSpecifies the number of items.
hHandleSpecifies a handle to GPU memory.
nIdxWhen -1, all values in the GPU memory are queried, otherwise, only the value at the index nIdx is returned.
Returns
An array of float is returned.

Definition at line 5973 of file CudaDnn.cs.

◆ GetConvolutionInfo()

void MyCaffe.common.CudaDnn< T >.GetConvolutionInfo ( long  hCuDnn,
long  hBottomDesc,
long  hFilterDesc,
long  hConvDesc,
long  hTopDesc,
ulong  lWorkspaceSizeLimitInBytes,
bool  bUseTensorCores,
out CONV_FWD_ALGO  algoFwd,
out ulong  lWsSizeFwd,
out CONV_BWD_FILTER_ALGO  algoBwdFilter,
out ulong  lWsSizeBwdFilter,
out CONV_BWD_DATA_ALGO  algoBwdData,
out ulong  lWsSizeBwdData,
CONV_FWD_ALGO  preferredFwdAlgo = CONV_FWD_ALGO.NONE 
)

Queries the algorithms and workspace sizes used for a given convolution descriptor.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hFilterDescSpecifies a handle to the filter descriptor.
hConvDescSpecifies a handle to the convolution descriptor.
hTopDescSpecifies a handle to the top tensor descriptor.
lWorkspaceSizeLimitInBytesSpecifies the workspace limits (in bytes).
bUseTensorCoresSpecifies whether or not to use tensor cores (this parameter must match the 'bUseTensorCores' setting specified in the 'SetConvolutionDesc' method).
algoFwdReturns the algorithm used for the convolution forward pass.
lWsSizeFwdReturns the workspace size (in bytes) for the convolution forward pass.
algoBwdFilterReturns the algorithm used for the backward filter pass.
lWsSizeBwdFilterReturns the workspace size (in bytes) for the backward filter pass.
algoBwdDataReturns the algorithm for the backward data pass.
lWsSizeBwdDataReturns the workspace size (in bytes) for the backward data pass.
preferredFwdAlgoOptionally, specifies a preferred forward algorithm to attempt to use for the forward convolution. The algorithm is only used if the current device supports it.

Definition at line 3810 of file CudaDnn.cs.

◆ GetCudaDnnDllPath()

static string MyCaffe.common.CudaDnn< T >.GetCudaDnnDllPath ( )
static

Returns the path to the CudaDnnDll module to use for low level CUDA processing.

Returns
The CudaDnnDll path is returned.

Definition at line 1638 of file CudaDnn.cs.

◆ GetDeviceCount()

int MyCaffe.common.CudaDnn< T >.GetDeviceCount ( )

Query the number of devices (GPUs) installed.

Returns
The number of GPUs is returned.

Definition at line 2127 of file CudaDnn.cs.

◆ GetDeviceID()

int MyCaffe.common.CudaDnn< T >.GetDeviceID ( )

Returns the current device id set within Cuda.

Returns
The device id.

Definition at line 2013 of file CudaDnn.cs.

◆ GetDeviceInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceInfo ( int  nDeviceID,
bool  bVerbose = false 
)

Query the device information of a device.

Parameters
nDeviceIDSpecifies the device id.
bVerboseWhen true, more detailed information is returned.
Returns
The device information string is returned.
Definition at line 2064 of file CudaDnn.cs.

◆ GetDeviceMemory()

double MyCaffe.common.CudaDnn< T >.GetDeviceMemory ( out double  dfFree,
out double  dfUsed,
out bool  bCudaCallUsed,
int  nDeviceID = -1 
)

Queries the amount of total, free and used memory on a given GPU.

Parameters
dfFreeSpecifies the amount of free memory in GB.
dfUsedSpecifies the amount of used memory in GB.
bCudaCallUsedSpecifies whether or not the used memory is an estimate calculated using the Low-Level Cuda DNN Dll handle table.
nDeviceIDSpecifies the specific device id to query, or if -1, calculates an estimate of the memory used from the current low-level CUDA DNN DLL handle table.
Returns
The device's total amount of memory in GB is returned.

Definition at line 2182 of file CudaDnn.cs.

◆ GetDeviceName()

string MyCaffe.common.CudaDnn< T >.GetDeviceName ( int  nDeviceID)

Query the name of a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The name of the GPU at the device id is returned.

Definition at line 2035 of file CudaDnn.cs.

◆ GetDeviceP2PInfo()

string MyCaffe.common.CudaDnn< T >.GetDeviceP2PInfo ( int  nDeviceID)

Query the peer-to-peer information of a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The peer-to-peer information of the GPU at the device id is returned.

Definition at line 2049 of file CudaDnn.cs.

◆ GetDropoutInfo()

void MyCaffe.common.CudaDnn< T >.GetDropoutInfo ( long  hCuDnn,
long  hBottomDesc,
out ulong  ulStateCount,
out ulong  ulReservedCount 
)

Query the dropout state and reserved counts.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
ulStateCountReturns the state count.
ulReservedCountReturns the reserved count.

Definition at line 4252 of file CudaDnn.cs.

◆ GetHostBufferCapacity()

long MyCaffe.common.CudaDnn< T >.GetHostBufferCapacity ( long  hMem)

Returns the host memory capacity.

Parameters
hMemSpecifies the host memory.
Returns
The current host memory capacity is returned.

Definition at line 2621 of file CudaDnn.cs.

◆ GetHostMemory()

T[] MyCaffe.common.CudaDnn< T >.GetHostMemory ( long  hMem)

Retrieves the host memory as an array of type 'T'

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of type 'T' is returned.

Definition at line 2662 of file CudaDnn.cs.

◆ GetHostMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetHostMemoryDouble ( long  hMem)

Retrieves the host memory as an array of doubles.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of doubles is returned.

Definition at line 2641 of file CudaDnn.cs.

◆ GetHostMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetHostMemoryFloat ( long  hMem)

Retrieves the host memory as an array of floats.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the host memory.
Returns
An array of floats is returned.

Definition at line 2652 of file CudaDnn.cs.

◆ GetMemory()

T[] MyCaffe.common.CudaDnn< T >.GetMemory ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of type 'T'

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of type 'T' is returned.

Definition at line 2700 of file CudaDnn.cs.

◆ GetMemoryDouble()

double[] MyCaffe.common.CudaDnn< T >.GetMemoryDouble ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of doubles.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of double is returned.

Definition at line 2677 of file CudaDnn.cs.

◆ GetMemoryFloat()

float[] MyCaffe.common.CudaDnn< T >.GetMemoryFloat ( long  hMem,
long  lCount = -1 
)

Retrieves the GPU memory as an array of float.

This function converts the output array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
lCountOptionally, specifies a count of items to retrieve.
Returns
An array of float is returned.

Definition at line 2689 of file CudaDnn.cs.

◆ GetMultiGpuBoardGroupID()

int MyCaffe.common.CudaDnn< T >.GetMultiGpuBoardGroupID ( int  nDeviceID)

Query the multi-GPU board group id for a device.

Parameters
nDeviceIDSpecifies the device id.
Returns
The multi-GPU board group id is returned.

Definition at line 2109 of file CudaDnn.cs.

◆ GetRequiredCompute()

string MyCaffe.common.CudaDnn< T >.GetRequiredCompute ( out int  nMinMajor,
out int  nMinMinor 
)

The GetRequiredCompute function returns the Major and Minor compute values required by the CudaDNN DLL currently in use.

Parameters
nMinMajorSpecifies the minimum required major compute value.
nMinMinorSpecifies the minimum required minor compute value.

Together the Major.Minor compute values define the minimum required compute for the CudaDNN DLL used.

Returns
The path to the CudaDNN dll in use is returned.

Definition at line 2216 of file CudaDnn.cs.

◆ GetRnn8MemorySizes()

void MyCaffe.common.CudaDnn< T >.GetRnn8MemorySizes ( long  hCuDnn,
long  hRnn,
out ulong  szWtCount,
out ulong  szWorkSize,
out ulong  szReservedSize 
)

Returns the memory sizes required for the RNN8.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
szWtCountReturns the required weight count (in items).
szWorkSizeReturns the required work size (in bytes).
szReservedSizeReturns the required reserved size (in bytes).

Definition at line 5221 of file CudaDnn.cs.

◆ GetRnnLinLayerParams()

void MyCaffe.common.CudaDnn< T >.GetRnnLinLayerParams ( long  hCuDnn,
long  hRnnDesc,
int  nLayer,
long  hXDesc,
long  hWtDesc,
long  hWtData,
int  nLinLayer,
out int  nWtCount,
out long  hWt,
out int  nBiasCount,
out long  hBias 
)

Returns the linear layer parameters (weights).

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
nLayerSpecifies the current layer index.
hXDescSpecifies the input data element descriptor.
hWtDescSpecifies the weight descriptor.
hWtDataSpecifies the weight memory containing all weights.
nLinLayerSpecifies the linear layer index (e.g. LSTM has 8 linear layers, RNN has 2)
nWtCountReturns the number of weight items.
hWtReturns a handle to the weight GPU memory.
nBiasCountReturns the number of bias items.
hBiasReturns a handle to the bias GPU memory.

Definition at line 4837 of file CudaDnn.cs.

◆ GetRnnParamCount()

int MyCaffe.common.CudaDnn< T >.GetRnnParamCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc 
)

Returns the RNN parameter count.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies the handle to the first X descriptor.
Returns
The number of parameters (weights) is returned.

Definition at line 4785 of file CudaDnn.cs.

◆ GetRnnWorkspaceCount()

ulong MyCaffe.common.CudaDnn< T >.GetRnnWorkspaceCount ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
out ulong  nReservedCount 
)

Returns the workspace and reserved counts.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies a handle to the data descriptor created with CreateRnnDataDesc.
nReservedCountReturns the reserved count needed.
Returns
Returns the workspace count needed.

Definition at line 4807 of file CudaDnn.cs.

◆ hamming_distance()

double MyCaffe.common.CudaDnn< T >.hamming_distance ( int  n,
double  dfThreshold,
long  hA,
long  hB,
long  hY,
int  nOffA = 0,
int  nOffB = 0,
int  nOffY = 0 
)

The hamming_distance calculates the Hamming Distance between A and B, both of length n.

To calculate the hamming distance, A and B are first 'bitified', converting each element to 1 when it exceeds the threshold, or 0 otherwise. The bitified version of B is then subtracted from that of A, and the asum of the result is returned; this count of differing bits is the Hamming distance.

Parameters
nSpecifies the number of elements to compare in both X and Y.
dfThresholdSpecifies the threshold used to 'bitify' both X and Y
hASpecifies the handle to the GPU memory containing the first vector to compare.
hBSpecifies the handle to the GPU memory containing the second vector to compare.
hYSpecifies the handle to the GPU memory where the hamming difference (bitified A - bitified B) is placed.
nOffAOptionally, specifies an offset into the GPU memory of A, the default is 0.
nOffBOptionally, specifies an offset into the GPU memory of B, the default is 0.
nOffYOptionally, specifies an offset into the GPU memory of Y, the default is 0.
Returns
The hamming distance is returned.

Definition at line 11005 of file CudaDnn.cs.
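The bitify-and-count procedure described above can be sketched in plain Python (plain lists stand in for the GPU handles, and the offset parameters are omitted for brevity):

```python
def hamming_distance(threshold, a, b):
    """Bitify both vectors against the threshold, then count differing bits."""
    bits_a = [1 if v > threshold else 0 for v in a]
    bits_b = [1 if v > threshold else 0 for v in b]
    diff = [x - y for x, y in zip(bits_a, bits_b)]  # the 'Y' output vector
    return sum(abs(d) for d in diff)                # asum of the difference

# Example: only the last two positions bitify differently.
d = hamming_distance(0.5, [0.9, 0.1, 0.7], [0.9, 0.8, 0.2])
```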

◆ im2col()

void MyCaffe.common.CudaDnn< T >.im2col ( long  hDataIm,
int  nDataImOffset,
int  nChannels,
int  nHeight,
int  nWidth,
int  nKernelH,
int  nKernelW,
int  nPadH,
int  nPadW,
int  nStrideH,
int  nStrideW,
int  nDilationH,
int  nDilationW,
long  hDataCol,
int  nDataColOffset 
)

Rearranges image blocks into columns.

Parameters
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nChannelsSpecifies the number of channels in the image.
nHeightSpecifies the height of the image.
nWidthSpecifies the width of the image.
nKernelHSpecifies the kernel height.
nKernelWSpecifies the kernel width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nDilationHSpecifies the dilation along the height.
nDilationWSpecifies the dilation along the width.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 7989 of file CudaDnn.cs.
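For a single channel, the rearrangement can be sketched in plain Python (illustration only; the multi-channel GPU kernel additionally handles the channel dimension, dilation, and memory offsets):

```python
def im2col(image, k, stride=1, pad=0):
    """Single-channel im2col sketch: each output row holds one kernel position
    across all output locations, giving a (k*k) x (out_h*out_w) matrix.

    image is a list of rows; out-of-range pixels are treated as zero padding.
    """
    h, w = len(image), len(image[0])
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0
    cols = []
    for kr in range(k):
        for kc in range(k):
            row = []
            for orow in range(out_h):
                for ocol in range(out_w):
                    row.append(px(orow * stride - pad + kr, ocol * stride - pad + kc))
            cols.append(row)
    return cols

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
cols = im2col(img, k=2)  # four overlapping 2x2 patches, one per output location
```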

◆ im2col_nd()

void MyCaffe.common.CudaDnn< T >.im2col_nd ( long  hDataIm,
int  nDataImOffset,
int  nNumSpatialAxes,
int  nImCount,
int  nChannelAxis,
long  hImShape,
long  hColShape,
long  hKernelShape,
long  hPad,
long  hStride,
long  hDilation,
long  hDataCol,
int  nDataColOffset 
)

Rearranges image blocks into columns.

Parameters
hDataImSpecifies a handle to the image block in GPU memory.
nDataImOffsetSpecifies an offset into the image block memory.
nNumSpatialAxesSpecifies the number of spatial axes.
nImCountSpecifies the number of kernels.
nChannelAxisSpecifies the axis containing the channel.
hImShapeSpecifies a handle to the image shape data in GPU memory.
hColShapeSpecifies a handle to the column shape data in GPU memory.
hKernelShapeSpecifies a handle to the kernel shape data in GPU memory.
hPadSpecifies a handle to the pad data in GPU memory.
hStrideSpecifies a handle to the stride data in GPU memory.
hDilationSpecifies a handle to the dilation data in GPU memory.
hDataColSpecifies a handle to the column data in GPU memory.
nDataColOffsetSpecifies an offset into the column memory.

Definition at line 8013 of file CudaDnn.cs.

◆ InitializeRnn8Weights()

void MyCaffe.common.CudaDnn< T >.InitializeRnn8Weights ( long  hCuDnn,
long  hRnn,
long  hWt,
RNN_FILLER_TYPE  wtFt,
double  fWtVal,
double  fWtVal2,
RNN_FILLER_TYPE  biasFt,
double  fBiasVal,
double  fBiasVal2 
)

Initialize the RNN8 weights

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hWtSpecifies the handle to the GPU data containing the weights to be initialized.
wtFtSpecifies the weight filler type.
fWtValSpecifies the weight filler value.
fWtVal2Specifies a secondary weight filler value.
biasFtSpecifies the bias filler type.
fBiasValSpecifies the bias filler value.
fBiasVal2Specifies a secondary bias filler value.

Definition at line 5251 of file CudaDnn.cs.

◆ interp2()

void MyCaffe.common.CudaDnn< T >.interp2 ( int  nChannels,
long  hData1,
int  nX1,
int  nY1,
int  nHeight1,
int  nWidth1,
int  nHeight1A,
int  nWidth1A,
long  hData2,
int  nX2,
int  nY2,
int  nHeight2,
int  nWidth2,
int  nHeight2A,
int  nWidth2A,
bool  bBwd = false 
)

Interpolates between two sizes within the spatial dimensions.

Parameters
nChannelsSpecifies the channels (usually num * channels)
hData1Specifies the input data when bBwd=false and the output data when bBwd=true.
nX1Specifies the offset along the x axis for data1.
nY1Specifies the offset along the y axis for data1.
nHeight1Specifies the effective height for data1.
nWidth1Specifies the effective width for data1.
nHeight1ASpecifies the input height for data1.
nWidth1ASpecifies the input width for data1.
hData2Specifies the output data when bBwd=false and the input data when bBwd=true.
nX2Specifies the offset along the x axis for data2.
nY2Specifies the offset along the y axis for data2.
nHeight2Specifies the effective height for data2.
nWidth2Specifies the effective width for data2.
nHeight2ASpecifies the output height for data2.
nWidth2ASpecifies the output width for data2.
bBwdOptionally, specifies to perform the backward operation from data2 to data1; otherwise the operation is performed from data1 to data2 (default = false).

Definition at line 7138 of file CudaDnn.cs.

◆ IsRnn8Supported()

bool MyCaffe.common.CudaDnn< T >.IsRnn8Supported ( )

Returns whether or not RNN8 is supported.

Definition at line 5142 of file CudaDnn.cs.

◆ KernelAdd()

void MyCaffe.common.CudaDnn< T >.KernelAdd ( int  nCount,
long  hA,
long  hDstKernel,
long  hB,
long  hC 
)

Add memory from one kernel to memory residing on another kernel.

Parameters
nCountSpecifies the number of items within both A and B.
hASpecifies the handle to the memory A.
hDstKernelSpecifies the kernel where the memory B and the destination memory C reside.
hBSpecifies the handle to the memory B (to which A will be added).
hCSpecifies the destination data where A+B will be placed.

Definition at line 1848 of file CudaDnn.cs.

◆ KernelCopy()

void MyCaffe.common.CudaDnn< T >.KernelCopy ( int  nCount,
long  hSrc,
int  nSrcOffset,
long  hDstKernel,
long  hDst,
int  nDstOffset,
long  hHostBuffer,
long  hHostKernel = -1,
long  hStream = -1,
long  hSrcKernel = -1 
)

Copy memory from the look-up tables in one kernel to another.

Parameters
nCountSpecifies the number of items to copy.
hSrcSpecifies the handle to the source memory.
nSrcOffsetSpecifies the offset (in items, not bytes) from which to start the copy in the source memory.
hDstKernelSpecifies the destination kernel holding the look-up table and memory where the data is to be copied.
hDstSpecifies the handle to the destination memory where the data is to be copied.
nDstOffsetSpecifies the offset (in items, not bytes) at which the copy is to be placed within the destination data.
hHostBufferSpecifies the handle to the host buffer to be used when transferring the data from one kernel to another.
hHostKernelOptionally, specifies the handle to the kernel holding the look-up table for the host buffer.
hStreamOptionally, specifies the handle to the CUDA stream to use for the transfer.
hSrcKernelOptionally, specifies the handle to the source kernel.

Definition at line 1829 of file CudaDnn.cs.

◆ KernelCopyNccl()

long MyCaffe.common.CudaDnn< T >.KernelCopyNccl ( long  hSrcKernel,
long  hSrcNccl 
)

Copies an Nccl handle from one kernel to the current kernel of the current CudaDnn instance.

Nccl handles are created on the main kernel, but before use they must be transferred to the destination kernel (running on a different thread) where the secondary Nccl handle is used.

Parameters
hSrcKernelSpecifies the source kernel (typically where the Nccl handle was created).
hSrcNcclSpecifies the source Nccl handle to be copied.
Returns

Definition at line 1866 of file CudaDnn.cs.

◆ LayerNormBackward()

void MyCaffe.common.CudaDnn< T >.LayerNormBackward ( long  hLayerNorm,
long  hYdata,
long  hYdiff,
long  hXdiff 
)

Run the LayerNorm backward pass.

Parameters
hLayerNormSpecifies the handle to the LayerNorm instance.
hYdataSpecifies the normalized output data.
hYdiffSpecifies the input diff to be un-normalized.
hXdiffSpecifies the un-normalized output diff.

Definition at line 5875 of file CudaDnn.cs.

◆ LayerNormForward()

void MyCaffe.common.CudaDnn< T >.LayerNormForward ( long  hLayerNorm,
long  hXdata,
long  hYdata 
)

Run the LayerNorm forward pass.

Parameters
hLayerNormSpecifies the handle to the LayerNorm instance.
hXdataSpecifies the input data to be normalized.
hYdataSpecifies the normalized output data.

Definition at line 5860 of file CudaDnn.cs.

◆ lecun_bwd()

void MyCaffe.common.CudaDnn< T >.lecun_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 
)

Performs the backward pass of LeCun's Tanh function.

Computes the derivative of the LeCun non-linearity $ y = 1.7159 * tanh(2/3 * x) $, namely $ y' = 1.7159 * 2/3 * (1 - tanh(2/3 * x)^2) $

See also
Lecun's Tanh by PapersWithCode.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.

Definition at line 9225 of file CudaDnn.cs.

◆ lecun_fwd()

void MyCaffe.common.CudaDnn< T >.lecun_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs the forward pass of LeCun's Tanh function.

Computes the LeCun non-linearity $ y = 1.7159 * tanh(2/3 * x) $

See also
Lecun's Tanh by PapersWithCode.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9203 of file CudaDnn.cs.
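The forward and backward formulas above can be sketched in plain Python (illustrative only; the actual methods operate on GPU memory handles, not lists, and the function names below are hypothetical):

```python
import math

SCALE, ALPHA = 1.7159, 2.0 / 3.0  # constants from the formulas above

def lecun_fwd(bottom):
    # y = 1.7159 * tanh(2/3 * x), applied element-wise
    return [SCALE * math.tanh(ALPHA * x) for x in bottom]

def lecun_bwd(top_diff, bottom):
    # y' = 1.7159 * 2/3 * (1 - tanh(2/3 * x)^2), chained with the top diff
    return [td * SCALE * ALPHA * (1.0 - math.tanh(ALPHA * x) ** 2)
            for td, x in zip(top_diff, bottom)]
```

At x = 0 the forward output is 0 and the gradient is 1.7159 * 2/3, the function's maximum slope.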

◆ log() [1/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY 
)

Calculates the log value of A and places the result in Y.

$ f(x) = log(x) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7488 of file CudaDnn.cs.

◆ log() [2/2]

void MyCaffe.common.CudaDnn< T >.log ( int  n,
long  hA,
long  hY,
double  dfBeta,
double  dfAlpha = 0 
)

Calculates the log value of (A * beta) + alpha, and places the result in Y.

$ f(x) = \ln((x * \beta) + \alpha) $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
dfBetaSpecifies the scalar of type double that is multiplied with A before taking the log.
dfAlphaOptionally, specifies a scalar added to the value before taking the log.

Definition at line 7504 of file CudaDnn.cs.
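The scaled-log computation above can be sketched in plain Python (illustrative only; the actual method operates on GPU memory handles, and `log_scaled` is a hypothetical name):

```python
import math

def log_scaled(a, beta, alpha=0.0):
    # y = ln(x * beta + alpha), applied element-wise
    return [math.log(x * beta + alpha) for x in a]
```

With beta = 1 and alpha = 0 this reduces to the plain log overload above.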

◆ lrn_computediff()

void MyCaffe.common.CudaDnn< T >.lrn_computediff ( int  nCount,
long  hBottomData,
long  hTopData,
long  hScaleData,
long  hTopDiff,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
T  fNegativeBeta,
T  fCacheRatio,
long  hBottomDiff 
)

Computes the diff used to calculate the LRN cross channel backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
nSizeNEEDS REVIEW
fNegativeBetaSpecifies the negative beta value.
fCacheRatioNEEDS REVIEW
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10184 of file CudaDnn.cs.

◆ lrn_computeoutput()

void MyCaffe.common.CudaDnn< T >.lrn_computeoutput ( int  nCount,
long  hBottomData,
long  hScaleData,
T  fNegativeBeta,
long  hTopData 
)

Computes the output used to calculate the LRN cross channel forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
fNegativeBetaSpecifies the negative beta value.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 10159 of file CudaDnn.cs.

◆ lrn_fillscale()

void MyCaffe.common.CudaDnn< T >.lrn_fillscale ( int  nCount,
long  hBottomData,
int  nNum,
int  nChannels,
int  nHeight,
int  nWidth,
int  nSize,
T  fAlphaOverSize,
T  fK,
long  hScaleData 
)

Performs the fill scale operation used to calculate the LRN cross channel forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumSpecifies the number of input items.
nChannelsSpecifies the number of channels per input item.
nHeightSpecifies the height of each input item.
nWidthSpecifies the width of each input item.
nSizeNEEDS REVIEW
fAlphaOverSizeSpecifies the alpha value over the size.
fKSpecifies the k value.
hScaleDataSpecifies a handle to the scale data in GPU memory.

Definition at line 10143 of file CudaDnn.cs.

◆ LRNCrossChannelBackward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelBackward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform LRN cross channel backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4384 of file CudaDnn.cs.

◆ LRNCrossChannelForward()

void MyCaffe.common.CudaDnn< T >.LRNCrossChannelForward ( long  hCuDnn,
long  hNormDesc,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
T  fBeta,
long  hTopDesc,
long  hTopData 
)

Perform LRN cross channel forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hNormDescSpecifies a handle to an LRN descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4361 of file CudaDnn.cs.

◆ lstm_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_bwd ( int  t,
int  nN,
int  nH,
int  nI,
double  dfClippingThreshold,
long  hWeight_h,
long  hClipData,
int  nClipOffset,
long  hTopDiff,
int  nTopOffset,
long  hCellData,
long  hCellDiff,
int  nCellOffset,
long  hPreGateDiff,
int  nPreGateOffset,
long  hGateData,
long  hGateDiff,
int  nGateOffset,
long  hCT1Data,
int  nCT1Offset,
long  hDHT1Diff,
int  nDHT1Offset,
long  hDCT1Diff,
int  nDCT1Offset,
long  hHtoHData,
long  hContextDiff = 0,
long  hWeight_c = 0 
)

Performs the simple LSTM backward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
dfClippingThresholdSpecifies the clipping threshold.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTopOffsetSpecifies an offset into the top diff memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
hCellDiffSpecifies a handle to the GPU memory holding the 'c_t' gradients.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDiffSpecifies a handle to the GPU memory holding the pre-gate gradients.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
hGateDiffSpecifies a handle to the GPU memory holding the gate gradients.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hDHT1DiffSpecifies a handle to the GPU DHT1 gradients.
nDHT1OffsetSpecifies the DHT1 offset for this step within the sequence.
hDCT1DiffSpecifies a handle to the DCT1 gradients.
nDCT1OffsetSpecifies the DCT1 offset for this step within the sequence.
hHtoHDataSpecifies a handle to the GPU memory holding the H to H data.
hContextDiffOptionally, specifies the handle to the GPU memory holding the context diff, or 0 when not used.
hWeight_cOptionally, specifies the handle to the GPU memory holding the 'c' weights, or 0 when not used.

Definition at line 10413 of file CudaDnn.cs.

◆ lstm_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_fwd ( int  t,
int  nN,
int  nH,
int  nI,
long  hWeight_h,
long  hWeight_i,
long  hClipData,
int  nClipOffset,
long  hTopData,
int  nTopOffset,
long  hCellData,
int  nCellOffset,
long  hPreGateData,
int  nPreGateOffset,
long  hGateData,
int  nGateOffset,
long  hHT1Data,
int  nHT1Offset,
long  hCT1Data,
int  nCT1Offset,
long  hHtoGateData,
long  hContext = 0,
long  hWeight_c = 0,
long  hCtoGetData = 0 
)

Performs the simple LSTM forward pass in Cuda.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
tSpecifies the step within the sequence.
nNSpecifies the batch size.
nHSpecifies the number of hidden units.
nISpecifies the input size.
hWeight_hSpecifies a handle to the GPU memory holding the 'h' weights.
hWeight_iSpecifies a handle to the GPU memory holding the 'i' weights.
hClipDataSpecifies a handle to the GPU memory holding the clip data.
nClipOffsetSpecifies the clip offset for this step within the sequence.
hTopDataSpecifies a handle to the top data in GPU memory.
nTopOffsetSpecifies an offset into the top data memory.
hCellDataSpecifies a handle to the GPU memory holding the 'c_t' data.
nCellOffsetSpecifies the c_t offset for this step within the sequence.
hPreGateDataSpecifies a handle to the GPU memory holding the pre-gate data.
nPreGateOffsetSpecifies the pre-gate offset for this step within the sequence.
hGateDataSpecifies a handle to the GPU memory holding the gate data.
nGateOffsetSpecifies the gate data offset for this step within the sequence.
hHT1DataSpecifies a handle to the GPU memory holding the HT1 data.
nHT1OffsetSpecifies the HT1 offset for this step within the sequence.
hCT1DataSpecifies a handle to the GPU memory holding the CT1 data.
nCT1OffsetSpecifies the CT1 offset for this step within the sequence.
hHtoGateDataSpecifies a handle to the GPU memory holding the H to Gate data.
hContextOptionally, specifies the attention context, or 0 when not used.
hWeight_cOptionally, specifies the attention context weights, or 0 when not used.
hCtoGetDataOptionally, specifies the attention context to gate data, or 0 when not used.

Definition at line 10372 of file CudaDnn.cs.

◆ lstm_unit_bwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_bwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hC_prev,
long  hX_acts,
long  hC,
long  hH,
long  hCont,
long  hC_diff,
long  hH_diff,
long  hC_prev_diff,
long  hX_acts_diff,
long  hX_diff 
)

Performs the simple LSTM backward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
nCountNEEDS REVIEW
nHiddenDimNEEDS REVIEW
nXCountNEEDS REVIEW
hC_prevNEEDS REVIEW
hX_actsNEEDS REVIEW
hCNEEDS REVIEW
hHNEEDS REVIEW
hContNEEDS REVIEW
hC_diffNEEDS REVIEW
hH_diffNEEDS REVIEW
hC_prev_diffNEEDS REVIEW
hX_acts_diffNEEDS REVIEW
hX_diffNEEDS REVIEW

Definition at line 10463 of file CudaDnn.cs.

◆ lstm_unit_fwd()

void MyCaffe.common.CudaDnn< T >.lstm_unit_fwd ( int  nCount,
int  nHiddenDim,
int  nXCount,
long  hX,
long  hX_acts,
long  hC_prev,
long  hCont,
long  hC,
long  hH 
)

Performs the simple LSTM forward pass in Cuda for a given LSTM unit.

See LSTM with Working Memory by Pulver, et al., 2016

Parameters
nCountNEEDS REVIEW
nHiddenDimNEEDS REVIEW
nXCountNEEDS REVIEW
hXNEEDS REVIEW
hX_actsNEEDS REVIEW
hC_prevNEEDS REVIEW
hContNEEDS REVIEW
hCNEEDS REVIEW
hHNEEDS REVIEW

Definition at line 10436 of file CudaDnn.cs.

◆ mask() [1/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
double  fSearch,
double  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7048 of file CudaDnn.cs.
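The masking behavior described above can be sketched in plain Python (illustrative only; the actual method operates on GPU memory handles, and the assumption that non-matching values pass through unchanged is inferred from the description):

```python
def mask(x, msk, f_search, f_replace):
    # wherever the mask holds f_search, emit f_replace;
    # otherwise pass the source value through unchanged
    return [f_replace if m == f_search else v for v, m in zip(x, msk)]
```

This pattern is commonly used to zero out (or set to a large negative value) masked positions, e.g. in attention masking.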

◆ mask() [2/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
float  fSearch,
float  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7063 of file CudaDnn.cs.

◆ mask() [3/3]

void MyCaffe.common.CudaDnn< T >.mask ( int  n,
int  nMaskDim,
T  fSearch,
T  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7030 of file CudaDnn.cs.

◆ mask_batch() [1/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
double  fSearch,
double  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the batch of data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7098 of file CudaDnn.cs.

◆ mask_batch() [2/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
float  fSearch,
float  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the batch of data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7114 of file CudaDnn.cs.

◆ mask_batch() [3/3]

void MyCaffe.common.CudaDnn< T >.mask_batch ( int  n,
int  nBatch,
int  nMaskDim,
T  fSearch,
T  fReplace,
long  hX,
long  hMask,
long  hY 
)

Masks the batch of data in the source using the mask: wherever the mask contains the value 'fSearch', the corresponding destination value is set to 'fReplace'.

Parameters
nSpecifies the number of items.
nBatchSpecifies the batch size.
nMaskDimSpecifies the number of items in the mask.
fSearchSpecifies the value within the mask to replace.
fReplaceSpecifies the replacement value.
hXSpecifies a handle to the GPU memory of the source.
hMaskSpecifies a handle to the GPU memory of the mask (containing the 'fSearch' values)
hYSpecifies a handle to the GPU memory of the destination.

Definition at line 7079 of file CudaDnn.cs.

◆ math_bwd()

void MyCaffe.common.CudaDnn< T >.math_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
MATH_FUNCTION  function 
)

Performs a Math function backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 8966 of file CudaDnn.cs.

◆ math_fwd()

void MyCaffe.common.CudaDnn< T >.math_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
MATH_FUNCTION  function 
)

Performs a Math function forward pass in Cuda.

Calculation $ Y[i] = function(X[i]) $

Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
functionSpecifies the mathematical function to use.

Definition at line 8949 of file CudaDnn.cs.

◆ matmul()

void MyCaffe.common.CudaDnn< T >.matmul ( uint  nOuterCount,
int  m,
int  n,
int  k,
long  hA,
long  hB,
long  hC,
double  dfScale = 1.0,
bool  bTransA = false,
bool  bTransB = false 
)

Perform matmul operation hC = matmul(hA, hB), where hA, hB and hC are all in row-major format.

Parameters
nOuterCountSpecifies the outer count (e.g. batch * channels)
mSpecifies the number of rows in matrices A and C.
nSpecifies the number of columns in matrices B and C.
kSpecifies the number of columns in matrix A and the number of rows in matrix B.
hASpecifies the handle to GPU memory holding the mxk matrix A (in row-major format)
hBSpecifies the handle to GPU memory holding the kxn matrix B (in row-major format)
hCSpecifies the handle to GPU memory holding the mxn matrix C (in row-major format) where the result is placed.
dfScaleSpecifies the scale value applied to matrix B in hB (default = 1.0)
bTransASpecifies to transpose matrix A (default = false).
bTransBSpecifies to transpose matrix B (default = false).
See also
How to transpose a matrix in CUDA/cublas

Definition at line 6695 of file CudaDnn.cs.
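The m/k/n dimension convention above can be sketched in plain Python for a single slice (illustrative only; the actual method runs on the GPU, batches over nOuterCount slices, and supports the transpose and scale options, which are omitted here):

```python
def matmul(A, B):
    # C[i][j] = sum_p A[i][p] * B[p][j]
    # A is m x k, B is k x n, C is m x n, all row-major lists of lists
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

With nOuterCount > 1, the same per-slice product is repeated for each batch * channel slice of the flattened buffers.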

◆ matrix_meancenter_by_column()

void MyCaffe.common.CudaDnn< T >.matrix_meancenter_by_column ( int  nWidth,
int  nHeight,
long  hA,
long  hB,
long  hY,
bool  bNormalize = false 
)

Mean-centers the data by column: each column is summed and the sum (or the mean, when bNormalize is true) is subtracted from every value in that column.

Parameters
nWidthNumber of columns in the matrix (dimension D)
nHeightNumber of rows in the matrix (dimension N)
hAInput data matrix - N x D matrix (N rows, D columns)
hBColumn sums vector - D x 1 vector containing the sum of each column.
hYOutput data matrix - N x D matrix (N rows, D columns) containing mean centering of the input data matrix.
bNormalizeWhen true, each data item is divided by N to normalize each row item by column.

Definition at line 10725 of file CudaDnn.cs.
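The column-centering operation above can be sketched in plain Python (illustrative only; the actual method operates on GPU memory handles, and the assumption that bNormalize = false subtracts the raw column sum follows from the parameter descriptions above):

```python
def mean_center_by_column(A, normalize=False):
    # A is an N x D row-major matrix; compute each column's sum and
    # subtract it (or the column mean, when normalize=True) from
    # every value in that column
    n = len(A)
    col = [sum(row[j] for row in A) for j in range(len(A[0]))]
    if normalize:
        col = [c / n for c in col]
    return [[row[j] - col[j] for j in range(len(row))] for row in A]
```

With normalize=True this is the familiar per-column mean centering used before PCA-style decompositions.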

◆ max() [1/2]

void MyCaffe.common.CudaDnn< T >.max ( int  n,
long  hA,
long  hB,
long  hY 
)

Calculates the element-wise max of A and B and places the result in Y. Y has the same shape as A and B, so Y(0) contains the max of A(0) and B(0), etc.

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7669 of file CudaDnn.cs.

◆ max() [2/2]

double MyCaffe.common.CudaDnn< T >.max ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0,
long  hWork = 0 
)

Finds the maximum value of A.

This function uses NVIDIA's Thrust.

Parameters
nSpecifies the number of items (not bytes) in the vectors A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the maximum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
hWorkOptionally, specifies the handle to GPU memory the size of A which, when specified, is used by the extended version of max val. The extended version does not use Thrust and does not calculate 'lPos', which is always returned as -1. (default = 0, use the non-extended version)
Returns
The maximum value is returned as type double.

Definition at line 7724 of file CudaDnn.cs.

◆ max_bwd() [1/2]

void MyCaffe.common.CudaDnn< T >.max_bwd ( int  n,
long  hAdata,
long  hBdata,
long  hYdiff,
long  hAdiff,
long  hBdiff 
)

Propagates the Y diff back to the max of A or B, placing the gradient in the diff of A if A holds the max data value, or in the diff of B if B holds the max.

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hAdataSpecifies a handle to the data vector A in GPU memory.
hBdataSpecifies a handle to the data vector B in GPU memory.
hYdiffSpecifies a handle to the diff vector Y in GPU memory.
hAdiffSpecifies a handle to the mutable diff vector A in GPU memory.
hBdiffSpecifies a handle to the mutable diff vector B in GPU memory.

Definition at line 7686 of file CudaDnn.cs.
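The gradient-routing behavior above can be sketched in plain Python (illustrative only; the actual method operates on GPU memory handles, and the tie-breaking choice of routing ties to A is an assumption):

```python
def max_bwd(a_data, b_data, y_diff):
    # route each top gradient to whichever input held the max;
    # on a tie, the gradient is assumed to go to A
    a_diff = [d if a >= b else 0.0 for a, b, d in zip(a_data, b_data, y_diff)]
    b_diff = [d if b > a else 0.0 for a, b, d in zip(a_data, b_data, y_diff)]
    return a_diff, b_diff
```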

◆ max_bwd() [2/2]

void MyCaffe.common.CudaDnn< T >.max_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 
)

Performs a max backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9758 of file CudaDnn.cs.

◆ max_fwd()

void MyCaffe.common.CudaDnn< T >.max_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 
)

Performs a max forward pass in Cuda.

Calculation: $ Y[i] = max(A[i], B[i]) $

Parameters
nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 9742 of file CudaDnn.cs.

◆ mean_error_loss_bwd()

void MyCaffe.common.CudaDnn< T >.mean_error_loss_bwd ( int  nCount,
long  hPredicted,
long  hTarget,
long  hBottomDiff,
MEAN_ERROR  merr 
)

Performs a Mean Error Loss backward pass in Cuda.

The gradient is set to: +1 when predicted is greater than the target, -1 when predicted is less than the target, and 0 when predicted equals the target; applied when propagate_down[1] == true.

See also
Mean Absolute Error (MAE) derivative
Parameters
nCountSpecifies the number of items.
hPredictedSpecifies a handle to the predicted data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
merrSpecifies the type of mean error to run.

Definition at line 8991 of file CudaDnn.cs.
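The sign-based MAE gradient described above can be sketched in plain Python (illustrative only; the actual method operates on GPU memory handles and supports other MEAN_ERROR types via the merr parameter):

```python
def mae_grad(predicted, target):
    # +1 where predicted > target, -1 where predicted < target, 0 where equal
    # (bool - bool yields the integer sign in Python)
    return [(p > t) - (p < t) for p, t in zip(predicted, target)]
```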

◆ min() [1/2]

void MyCaffe.common.CudaDnn< T >.min ( int  n,
long  hA,
long  hB,
long  hY 
)

Calculates the element-wise min of A and B and places the result in Y. Y has the same shape as A and B, so Y(0) contains the min of A(0) and B(0), etc.

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7702 of file CudaDnn.cs.

◆ min() [2/2]

double MyCaffe.common.CudaDnn< T >.min ( int  n,
long  hA,
out long  lPos,
int  nAOff = 0,
long  hWork = 0 
)

Finds the minimum value of A.

This function uses NVIDIA's Thrust.

Parameters
nSpecifies the number of items (not bytes) in the vectors A.
hASpecifies a handle to the vector A in GPU memory.
lPosReturns the position of the minimum value.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A (default = 0).
hWorkOptionally, specifies the handle to GPU memory the size of A which, when specified, is used by the extended version of min val. The extended version does not use Thrust and does not calculate 'lPos', which is always returned as -1. (default = 0, use the non-extended version)
Returns
The minimum value is returned as type double.

Definition at line 7772 of file CudaDnn.cs.

◆ min_bwd()

void MyCaffe.common.CudaDnn< T >.min_bwd ( int  nCount,
long  hTopDiff,
int  nIdx,
long  hMask,
long  hBottomDiff 
)

Performs a min backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nIdxSpecifies the blob index used to test the mask.
hMaskSpecifies a handle to the mask data in GPU.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9794 of file CudaDnn.cs.

◆ min_fwd()

void MyCaffe.common.CudaDnn< T >.min_fwd ( int  nCount,
long  hBottomDataA,
long  hBottomDataB,
int  nIdx,
long  hTopData,
long  hMask 
)

Performs a min forward pass in Cuda.

Calculation: $ Y[i] = min(A[i], B[i]) $

Parameters
nCountSpecifies the number of items.
hBottomDataASpecifies a handle to the Bottom A data in GPU memory.
hBottomDataBSpecifies a handle to the Bottom B data in GPU memory.
nIdxSpecifies the blob index used to set the mask.
hTopDataSpecifies a handle to the Top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU.

Definition at line 9778 of file CudaDnn.cs.

◆ minmax() [1/2]

Tuple< double, double, double, double > MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
bool  bDetectNans = false,
int  nAOff = 0 
)

Finds the minimum and maximum values within A.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
bDetectNansOptionally, specifies whether or not to detect Nans.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
Returns
A four element tuple is returned where the first item contains the minimum, the second item contains the maximum, the third contains the number of NaN values and the fourth contains the number of Infinity values.
When calling this function with hA = 0, the function instead returns the required size of hWork1, hWork2, 0, 0 (in items, not bytes).

Definition at line 7818 of file CudaDnn.cs.
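The four-element result described above (min, max, NaN count, Infinity count) can be sketched in plain Python (illustrative only; the actual method operates on GPU workspace memory, and the assumption that NaN/Inf values are excluded from the min/max themselves is inferred from the bDetectNans description):

```python
import math

def minmax(a):
    # tally NaN and Inf values, and take min/max over the finite values
    nan = sum(1 for v in a if math.isnan(v))
    inf = sum(1 for v in a if math.isinf(v))
    finite = [v for v in a if math.isfinite(v)]
    return min(finite), max(finite), nan, inf
```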

◆ minmax() [2/2]

void MyCaffe.common.CudaDnn< T >.minmax ( int  n,
long  hA,
long  hWork1,
long  hWork2,
int  nK,
long  hMin,
long  hMax,
bool  bNonZeroOnly 
)

Finds up to 'nK' minimum and maximum values within A.

Parameters
nSpecifies the number of items (not bytes) in the vector A.
hASpecifies a handle to the vector A in GPU memory.
hWork1Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
hWork2Specifies a handle to workspace data in GPU memory. To get the size of the workspace memory, call this function with hA = 0.
nKSpecifies the number of min and max values to find.
hMinSpecifies a handle to host memory allocated with AllocHostBuffer in the length 'nK' where the min values are placed.
hMaxSpecifies a handle to host memory allocated with AllocHostBuffer in the length 'nK' where the max values are placed.
bNonZeroOnlySpecifies whether or not to exclude zero from the min and max calculations.

Definition at line 7843 of file CudaDnn.cs.

◆ mish_bwd()

void MyCaffe.common.CudaDnn< T >.mish_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
double  dfThreshold,
int  nMethod = 0 
)

Performs a Mish backward pass in Cuda.

Computes the mish gradient $ f(x)' = \frac{ e^x * (4*e^x * x + 4*x + 6*e^x + 4*e^{2x} + e^{3x} + 4) }{ (2*e^x + e^{2x} + 2)^2 } $ Note, see Wolfram Alpha with 'derivative of x * tanh(ln(1 + e^x))'.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
dfThresholdSpecifies the threshold value.
nMethodOptionally, specifies to run the new implementation when > 0.

Definition at line 9035 of file CudaDnn.cs.
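The gradient formula above can be checked numerically on the CPU against a central finite difference of $ f(x) = x * tanh(ln(1 + e^x)) $ (a verification sketch, independent of the GPU kernel):

```python
import math

def mish(x):
    # f(x) = x * tanh(ln(1 + e^x))
    return x * math.tanh(math.log1p(math.exp(x)))

def mish_grad(x):
    # Closed-form gradient from the documentation above.
    ex = math.exp(x)
    num = ex * (4 * ex * x + 4 * x + 6 * ex + 4 * ex**2 + ex**3 + 4)
    den = (2 * ex + ex**2 + 2) ** 2
    return num / den

# Compare against a central finite difference at an arbitrary point.
x, h = 0.7, 1e-6
fd = (mish(x + h) - mish(x - h)) / (2 * h)
assert abs(mish_grad(x) - fd) < 1e-5
```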

◆ mish_fwd()

void MyCaffe.common.CudaDnn< T >.mish_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfThreshold 
)

Performs a Mish forward pass in Cuda.

Computes the mish non-linearity $ f(x) = x * tanh(ln( 1 + e^x )) $.

See also
Mish: A Self Regularized Non-Monotonic Neural Activation Function by Diganta Misra, 2019.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9011 of file CudaDnn.cs.

◆ mul()

void MyCaffe.common.CudaDnn< T >.mul ( int  n,
long  hA,
long  hB,
long  hY,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0 
)

Multiplies each element of A with the corresponding element of B and places the result in Y.

Y = A * B (element by element)

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7334 of file CudaDnn.cs.
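The element-wise semantics, including the item offsets, can be sketched on the CPU as follows (illustrative only; the real operation works on GPU memory handles, and the helper name is hypothetical):

```python
def mul_cpu(n, a, b, y, a_off=0, b_off=0, y_off=0):
    # Y[y_off + i] = A[a_off + i] * B[b_off + i] for i in 0..n-1
    for i in range(n):
        y[y_off + i] = a[a_off + i] * b[b_off + i]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
y = [0.0] * 4
mul_cpu(2, a, b, y, a_off=2, b_off=0, y_off=1)
# y -> [0.0, 30.0, 80.0, 0.0]
```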

◆ mul_scalar() [1/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
double  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type
double
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7374 of file CudaDnn.cs.

◆ mul_scalar() [2/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
float  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type
float
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7388 of file CudaDnn.cs.

◆ mul_scalar() [3/3]

void MyCaffe.common.CudaDnn< T >.mul_scalar ( int  n,
T  fAlpha,
long  hY 
)

Multiply each element of Y by a scalar.

Y = Y * alpha

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fAlphaSpecifies the scalar in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7402 of file CudaDnn.cs.

◆ mulbsx()

void MyCaffe.common.CudaDnn< T >.mulbsx ( int  n,
long  hA,
int  nAOff,
long  hX,
int  nXOff,
int  nC,
int  nSpatialDim,
bool  bTranspose,
long  hB,
int  nBOff 
)

Multiply a matrix with a vector.

Parameters
nSpecifies the number of items.
hASpecifies the matrix to multiply.
nAOffSpecifies the offset to apply to the GPU memory of hA.
hXSpecifies the vector to multiply.
nXOffSpecifies the offset to apply to the GPU memory of hX.
nCSpecifies the number of channels.
nSpatialDimSpecifies the spatial dimension.
bTransposeSpecifies whether or not to transpose the matrix.
hBSpecifies the output matrix.
nBOffSpecifies the offset to apply to the GPU memory of hB.

Definition at line 6650 of file CudaDnn.cs.

◆ NcclAllReduce()

void MyCaffe.common.CudaDnn< T >.NcclAllReduce ( long  hNccl,
long  hStream,
long  hX,
int  nCount,
NCCL_REDUCTION_OP  op,
double  dfScale = 1.0 
)

Performs a reduction on all NCCL instances as specified by the reduction operation.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to reduce with the other instances of NCCL.
nCountSpecifies the number of items (not bytes) in the data.
opSpecifies the reduction operation to perform.
dfScaleOptionally, specifies a scaling to be applied to the final reduction.

Definition at line 3442 of file CudaDnn.cs.

◆ NcclBroadcast()

void MyCaffe.common.CudaDnn< T >.NcclBroadcast ( long  hNccl,
long  hStream,
long  hX,
int  nCount 
)

Broadcasts a block of GPU data to all NCCL instances.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies a handle to an NCCL instance.
hStreamSpecifies a handle to the stream to use for synchronization.
hXSpecifies a handle to the GPU data to be broadcasted (or received).
nCountSpecifies the number of items (not bytes) in the data.

Definition at line 3421 of file CudaDnn.cs.

◆ NcclInitializeMultiProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeMultiProcess ( long  hNccl)

Initializes a set of NCCL instances for use in different processes.

See Fast Multi-GPU collectives with NCCL.

Parameters
hNcclSpecifies the handle of NCCL to initialize.

Definition at line 3403 of file CudaDnn.cs.

◆ NcclInitializeSingleProcess()

void MyCaffe.common.CudaDnn< T >.NcclInitializeSingleProcess ( params long[]  rghNccl)

Initializes a set of NCCL instances for use in a single process.

See Fast Multi-GPU collectives with NCCL.

Parameters
rghNcclSpecifies the array of NCCL handles that will be working together.

Definition at line 3370 of file CudaDnn.cs.

◆ nesterov_update()

void MyCaffe.common.CudaDnn< T >.nesterov_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fMomentum,
T  fLocalRate 
)

Perform the Nesterov update

See Lecture 6c The momentum method by Hinton, et al., 2012, and Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent by Botev, et al., 2016

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fMomentumSpecifies the momentum value.
fLocalRateSpecifies the local learning rate.

Definition at line 10223 of file CudaDnn.cs.
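The update can be mirrored on the CPU; the form below follows the standard Caffe Nesterov convention (an assumption about this kernel, and the helper name is hypothetical):

```python
def nesterov_update_cpu(diff, history, momentum, local_rate):
    # Nesterov momentum update, Caffe convention:
    #   h_new = momentum * h_old + lr * g
    #   g_out = (1 + momentum) * h_new - momentum * h_old
    # Updates diff and history in place.
    for i in range(len(diff)):
        h_old = history[i]
        h_new = momentum * h_old + local_rate * diff[i]
        history[i] = h_new
        diff[i] = (1 + momentum) * h_new - momentum * h_old

g, h = [1.0], [0.5]
nesterov_update_cpu(g, h, momentum=0.9, local_rate=0.1)
# h[0] -> 0.9*0.5 + 0.1*1.0 = 0.55
# g[0] -> 1.9*0.55 - 0.9*0.5 = 0.595
```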

◆ nllloss_bwd()

void MyCaffe.common.CudaDnn< T >.nllloss_bwd ( int  nCount,
long  hTopData,
long  hLabel,
long  hBottomDiff,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 
)

Performs NLL Loss backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDataSpecifies a handle to the top data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nOuterNumSpecifies the number of outer items (typically the batch size).
nDimSpecifies the dimension of each outer item (the number of classes times the inner dimension).
nInnerNumSpecifies the inner dimension (typically the spatial dimension).
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9707 of file CudaDnn.cs.

◆ nllloss_fwd()

void MyCaffe.common.CudaDnn< T >.nllloss_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 
)

Performs NLL Loss forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hProbDataSpecifies a handle to the probability data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hLossDataSpecifies a handle to the loss data in GPU memory.
nOuterNumSpecifies the number of outer items (typically the batch size).
nDimSpecifies the dimension of each outer item (the number of classes times the inner dimension).
nInnerNumSpecifies the inner dimension (typically the spatial dimension).
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9673 of file CudaDnn.cs.
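The forward computation can be sketched on the CPU using the standard Caffe-style indexing of outer, class, and inner dimensions (an assumption about this kernel's layout; hProbData is assumed to hold log-probabilities, as is conventional for NLL loss, and the helper name is hypothetical):

```python
def nllloss_fwd_cpu(log_prob, label, outer_num, dim, inner_num, ignore_label=None):
    # For each (outer i, inner j) position, the loss is the negative
    # log-probability of the labeled class; ignored labels contribute
    # zero loss and a zero count.
    losses, counts = [], []
    for i in range(outer_num):
        for j in range(inner_num):
            lbl = int(label[i * inner_num + j])
            if ignore_label is not None and lbl == ignore_label:
                losses.append(0.0)
                counts.append(0)
            else:
                losses.append(-log_prob[i * dim + lbl * inner_num + j])
                counts.append(1)
    return losses, counts

# One item, three classes, inner_num = 1.
lp = [-0.1, -2.0, -3.0]
losses, counts = nllloss_fwd_cpu(lp, [0], outer_num=1, dim=3, inner_num=1)
# losses -> [0.1], counts -> [1]
```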

◆ permute()

void MyCaffe.common.CudaDnn< T >.permute ( int  nCount,
long  hBottom,
bool  bFwd,
long  hPermuteOrder,
long  hOldSteps,
long  hNewSteps,
int  nNumAxes,
long  hTop 
)

Performs data permutation on the input and reorders the data which is placed in the output.

Parameters
nCountSpecifies the number of items.
hBottomSpecifies the input data.
bFwdSpecifies whether or not this is a forward (true) or backward (false) operation.
hPermuteOrderSpecifies the permutation order values in GPU memory.
hOldStepsSpecifies the old step values in GPU memory.
hNewStepsSpecifies the new step values in GPU memory.
nNumAxesSpecifies the number of axes.
hTopSpecifies the output data.

Definition at line 10082 of file CudaDnn.cs.

◆ pooling_bwd()

void MyCaffe.common.CudaDnn< T >.pooling_bwd ( POOLING_METHOD  method,
int  nCount,
long  hTopDiff,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hBottomDiff,
long  hMask,
long  hTopMask 
)

Performs the backward pass for pooling using Cuda

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hTopDiffSpecifies a handle to the top diff in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 8839 of file CudaDnn.cs.

◆ pooling_fwd()

void MyCaffe.common.CudaDnn< T >.pooling_fwd ( POOLING_METHOD  method,
int  nCount,
long  hBottomData,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hTopData,
long  hMask,
long  hTopMask 
)

Performs the forward pass for pooling using Cuda

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hTopDataSpecifies a handle to the top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.
hTopMaskSpecifies a handle to the top mask data in GPU memory.

Definition at line 8810 of file CudaDnn.cs.
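The MAX pooling geometry above can be illustrated on the CPU over a single 2-D channel (a simplified sketch with padding omitted for brevity; the helper name is hypothetical):

```python
def max_pool_2d(x, kh, kw, sh, sw):
    # MAX pooling over one 2-D channel given as a list of rows,
    # with kernel (kh, kw) and stride (sh, sw); no padding.
    h, w = len(x), len(x[0])
    ph, pw = (h - kh) // sh + 1, (w - kw) // sw + 1
    return [[max(x[i * sh + ki][j * sw + kj]
                 for ki in range(kh) for kj in range(kw))
             for j in range(pw)] for i in range(ph)]

x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
max_pool_2d(x, kh=2, kw=2, sh=2, sw=2)
# -> [[6, 8], [14, 16]]
```

The GPU version additionally records the winning index of each window in the mask data, which the backward pass uses to route gradients.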

◆ PoolingBackward()

void MyCaffe.common.CudaDnn< T >.PoolingBackward ( long  hCuDnn,
long  hPoolingDesc,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a pooling backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4116 of file CudaDnn.cs.

◆ PoolingForward()

void MyCaffe.common.CudaDnn< T >.PoolingForward ( long  hCuDnn,
long  hPoolingDesc,
T  fAlpha,
long  hBottomDesc,
long  hBottomData,
T  fBeta,
long  hTopDesc,
long  hTopData 
)

Perform a pooling forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hPoolingDescSpecifies a handle to the pooling descriptor.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDescSpecifies a handle to the bottom tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDescSpecifies a handle to the top tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4093 of file CudaDnn.cs.

◆ powx() [1/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
double  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Raises each element of A to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type
double
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7524 of file CudaDnn.cs.

◆ powx() [2/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
float  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Raises each element of A to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type
float
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7541 of file CudaDnn.cs.

◆ powx() [3/3]

void MyCaffe.common.CudaDnn< T >.powx ( int  n,
long  hA,
T  fAlpha,
long  hY,
int  nAOff = 0,
int  nYOff = 0 
)

Raises each element of A to the power alpha and places the result in Y.

$ f(x) = x^\alpha $

Parameters
nSpecifies the number of items (not bytes) in the vectors A and Y.
hASpecifies a handle to the vector A in GPU memory.
fAlphaSpecifies the scalar in type 'T'.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies the offset for hA memory (default = 0).
nYOffOptionally, specifies the offset for hY memory (default = 0).

Definition at line 7558 of file CudaDnn.cs.
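The element-wise power semantics, including the offsets, can be sketched on the CPU as follows (illustrative only; the helper name is hypothetical):

```python
def powx_cpu(n, a, alpha, y, a_off=0, y_off=0):
    # y[y_off + i] = a[a_off + i] ** alpha for i in 0..n-1
    for i in range(n):
        y[y_off + i] = a[a_off + i] ** alpha

a = [1.0, 4.0, 9.0]
y = [0.0] * 3
powx_cpu(3, a, 0.5, y)
# y -> [1.0, 2.0, 3.0]
```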

◆ prelu_bwd()

void MyCaffe.common.CudaDnn< T >.prelu_bwd ( int  nCount,
int  nChannels,
int  nDim,
long  hTopDiff,
long  hBottomData,
long  hBottomDiff,
long  hSlopeData,
int  nDivFactor 
)

Performs the Parameterized Rectified Linear Unit (PReLU) backward pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 9585 of file CudaDnn.cs.

◆ prelu_bwd_param()

void MyCaffe.common.CudaDnn< T >.prelu_bwd_param ( int  nCDim,
int  nNum,
int  nTopOffset,
long  hTopDiff,
long  hBottomData,
long  hBackBuffDiff 
)

Performs the Parameterized Rectified Linear Unit (PReLU) backward param pass in Cuda.

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCDimSpecifies the inner count, equal to the channels times the dimension of each input.
nNumSpecifies the number of inputs.
nTopOffsetSpecifies the offset (in items) applied to the top diff.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hBackBuffDiffSpecifies a handle to the back buffer diff in GPU memory.

Definition at line 9562 of file CudaDnn.cs.

◆ prelu_fwd()

void MyCaffe.common.CudaDnn< T >.prelu_fwd ( int  nCount,
int  nChannels,
int  nDim,
long  hBottomData,
long  hTopData,
long  hSlopeData,
int  nDivFactor 
)

Performs the Parameterized Rectified Linear Unit (PReLU) forward pass in Cuda.

Calculation $ f(x) = (x > 0) ? x : x * slopeData $

See also
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
nChannelsSpecifies the channels per input.
nDimSpecifies the dimension of each input.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hSlopeDataSpecifies a handle to the slope data in GPU memory.
nDivFactorSpecifies the div factor applied to the channels.

Definition at line 9540 of file CudaDnn.cs.
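The per-channel slope selection can be sketched on the CPU; the channel indexing below follows the standard Caffe PReLU convention, where div_factor maps channels onto shared slopes (an assumption about this kernel, and the helper name is hypothetical):

```python
def prelu_fwd_cpu(bottom, slope, channels, dim, div_factor=1):
    # f(x) = x if x > 0 else x * slope[c], where the channel of element
    # idx is (idx // dim) % channels, divided by div_factor so that a
    # single slope can be shared across all channels.
    top = []
    for idx, x in enumerate(bottom):
        c = (idx // dim) % channels // div_factor
        top.append(x if x > 0 else x * slope[c])
    return top

bottom = [-1.0, 2.0, -3.0, 4.0]
prelu_fwd_cpu(bottom, slope=[0.5, 0.25], channels=2, dim=2)
# -> [-0.5, 2.0, -0.75, 4.0]
```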

◆ relu_bwd()

void MyCaffe.common.CudaDnn< T >.relu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
T  fNegativeSlope 
)

Performs a Rectified Linear Unit (ReLU) backward pass in Cuda.

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 9404 of file CudaDnn.cs.

◆ relu_fwd()

void MyCaffe.common.CudaDnn< T >.relu_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
T  fNegativeSlope 
)

Performs a Rectified Linear Unit (ReLU) forward pass in Cuda.

Calculation $ f(x) = (x > 0) ? x : x * negativeSlope $

See also
Rectifier, and
Understanding Deep Neural Networks with Rectified Linear Units by Arora, et al., 2016,
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification by He, et al., 2015
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
fNegativeSlopeSpecifies the negative slope.

Definition at line 9383 of file CudaDnn.cs.
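The stated calculation can be mirrored on the CPU in a few lines (a sketch of the math only; the helper name is hypothetical):

```python
def relu_fwd_cpu(bottom, negative_slope=0.0):
    # f(x) = x if x > 0 else x * negative_slope (leaky ReLU when
    # negative_slope is non-zero).
    return [x if x > 0 else x * negative_slope for x in bottom]

relu_fwd_cpu([-2.0, 0.0, 3.0], negative_slope=0.5)
# -> [-1.0, 0.0, 3.0]
```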

◆ ReLUBackward()

void MyCaffe.common.CudaDnn< T >.ReLUBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a ReLU backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4598 of file CudaDnn.cs.

◆ ReLUForward()

void MyCaffe.common.CudaDnn< T >.ReLUForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Perform a ReLU forward pass.

See Rectifier Nonlinearities Improve Neural Network Acoustic Models by Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013), In ICML Workshop on Deep Learning for Audio, Speech, and Language Processing.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4576 of file CudaDnn.cs.

◆ ReportMemory()

void MyCaffe.common.CudaDnn< T >.ReportMemory ( Log  log,
string  strLocation 
)

Report the memory use on the current GPU managed by the CudaDnn object.

Parameters
logSpecifies the output log.
strLocationSpecifies the location of the memory test.

Definition at line 11236 of file CudaDnn.cs.

◆ ResetDevice()

void MyCaffe.common.CudaDnn< T >.ResetDevice ( )

Reset the current device.

IMPORTANT: This function will delete all memory and state information on the current device, which may cause other CudaDnn instances using the same device to fail. For that reason, it is recommended to only call this function when testing.

Definition at line 2079 of file CudaDnn.cs.

◆ ResetGhostMemory()

void MyCaffe.common.CudaDnn< T >.ResetGhostMemory ( )

Resets the ghost memory by enabling it if this instance was configured to use ghost memory.

Definition at line 1783 of file CudaDnn.cs.

◆ rmsprop_update()

void MyCaffe.common.CudaDnn< T >.rmsprop_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fRmsDecay,
T  fDelta,
T  fLocalRate 
)

Perform the RMSProp update

See Lecture 6e rmsprop: Divide the gradient by a running average of its recent magnitude by Tieleman and Hinton, 2012, and RMSProp and equilibrated adaptive learning rates for non-convex optimization by Dauphin, et al., 2015

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fRmsDecaySpecifies the decay value used by the Solver. MeanSquare(t) = 'rms_decay' * MeanSquare(t-1) + (1 - 'rms_decay') * SquareGradient(t).
fDeltaSpecifies the numerical stability factor.
fLocalRateSpecifies the local learning rate.

Definition at line 10334 of file CudaDnn.cs.
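The MeanSquare recurrence stated above can be mirrored on the CPU; the final scaling below follows the standard Caffe RMSProp form (an assumption about this kernel, and the helper name is hypothetical):

```python
import math

def rmsprop_update_cpu(diff, history, rms_decay, delta, local_rate):
    # MeanSquare(t) = rms_decay * MeanSquare(t-1)
    #              + (1 - rms_decay) * SquareGradient(t)
    # g_out = lr * g / (sqrt(MeanSquare) + delta)
    # Updates diff and history in place.
    for i in range(len(diff)):
        g = diff[i]
        history[i] = rms_decay * history[i] + (1 - rms_decay) * g * g
        diff[i] = local_rate * g / (math.sqrt(history[i]) + delta)

g, h = [2.0], [0.0]
rmsprop_update_cpu(g, h, rms_decay=0.9, delta=1e-8, local_rate=0.1)
# h[0] -> 0.1 * 4.0 = 0.4
# g[0] -> 0.1 * 2.0 / sqrt(0.4) ~= 0.31623
```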

◆ rng_bernoulli() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
double  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8631 of file CudaDnn.cs.

◆ rng_bernoulli() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
float  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8645 of file CudaDnn.cs.

◆ rng_bernoulli() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_bernoulli ( int  n,
T  fNonZeroProb,
long  hY 
)

Fill Y with random numbers using a Bernoulli random distribution.

See Bernoulli Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fNonZeroProbSpecifies the probability that a given value is set to non zero.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8659 of file CudaDnn.cs.
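The fill semantics can be sketched on the CPU with a seeded generator (illustrative only; the GPU version uses cuRand, and the helper name is hypothetical):

```python
import random

def rng_bernoulli_cpu(n, non_zero_prob, seed=None):
    # Each element is 1 with probability non_zero_prob, else 0
    # (mirrors the fNonZeroProb parameter).
    rng = random.Random(seed)
    return [1 if rng.random() < non_zero_prob else 0 for _ in range(n)]

y = rng_bernoulli_cpu(1000, 0.3, seed=0)
# roughly 30% of the values are 1
```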

◆ rng_gaussian() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
double  fMu,
double  fSigma,
long  hY 
)

Fill Y with random numbers using a gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of
double
fSigmaSpecifies the standard deviation of the distribution with a type of
double
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8578 of file CudaDnn.cs.

◆ rng_gaussian() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
float  fMu,
float  fSigma,
long  hY 
)

Fill Y with random numbers using a gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of
float
fSigmaSpecifies the standard deviation of the distribution with a type of
float
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8593 of file CudaDnn.cs.

◆ rng_gaussian() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_gaussian ( int  n,
T  fMu,
T  fSigma,
long  hY 
)

Fill Y with random numbers using a gaussian random distribution.

This function uses NVIDIA's cuRand. See also Gaussian Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMuSpecifies the mean of the distribution with a type of 'T'.
fSigmaSpecifies the standard deviation of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8608 of file CudaDnn.cs.

◆ rng_setseed()

void MyCaffe.common.CudaDnn< T >.rng_setseed ( long  lSeed)

Sets the random number generator seed used by random number operations.

This function uses NVIDIA's cuRand

Parameters
lSeedSpecifies the random number generator seed.

Definition at line 8506 of file CudaDnn.cs.

◆ rng_uniform() [1/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
double  fMin,
double  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of
double
fMaxSpecifies the maximum value of the distribution with a type of
double
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8524 of file CudaDnn.cs.

◆ rng_uniform() [2/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
float  fMin,
float  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of
float
fMaxSpecifies the maximum value of the distribution with a type of
float
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8539 of file CudaDnn.cs.

◆ rng_uniform() [3/3]

void MyCaffe.common.CudaDnn< T >.rng_uniform ( int  n,
T  fMin,
T  fMax,
long  hY 
)

Fill Y with random numbers using a uniform random distribution.

This function uses NVIDIA's cuRand. See also Uniform Distribution.

Parameters
nSpecifies the number of items (not bytes) in the vector Y.
fMinSpecifies the minimum value of the distribution with a type of 'T'.
fMaxSpecifies the maximum value of the distribution with a type of 'T'.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8554 of file CudaDnn.cs.

◆ Rnn8Backward()

void MyCaffe.common.CudaDnn< T >.Rnn8Backward ( long  hCuDnn,
long  hRnn,
long  hY,
long  hdY,
long  hX,
long  hdX,
long  hhX,
long  hdhY,
long  hdhX,
long  hcX,
long  hdcY,
long  hdcX,
long  hWt,
long  hdWt,
long  hWork,
long  hReserved 
)

Calculate the backward pass through the RNN8 for both data and weights.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) containing the outputs from the forward.
hdYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) containing the inbound gradients for Y.
hXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Inputs) containing the inputs.
hdXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Inputs) where the outbound, calculated gradients for X are placed.
hhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden inputs.
hdhYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the inbound gradients for hidden.
hdhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the outbound, calculated gradients for hidden are placed.
hcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden cell inputs.
hdcYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the inbound gradients for the cell hidden.
hdcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the outbound, calculated gradients for cell hidden are placed.
hWtSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, containing the weights.
hdWtSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, where the weight gradients are placed.
hWorkSpecifies a handle to the GPU memory of size szWork calculated with GetRnn8MemorySizes, used as temporary work data.
hReservedSpecifies a handle to the GPU memory of size szReserved calculated with GetRnn8MemorySizes, used as temporary reserve data.

Definition at line 5300 of file CudaDnn.cs.

◆ Rnn8Forward()

void MyCaffe.common.CudaDnn< T >.Rnn8Forward ( long  hCuDnn,
long  hRnn,
long  hX,
long  hY,
long  hhX,
long  hhY,
long  hcX,
long  hcY,
long  hWts,
long  hWork,
long  hReserved 
)

Calculate the forward pass through the RNN8.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
hXSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Inputs) containing the inputs.
hYSpecifies a handle to the GPU memory of shape (SeqLen, BatchSize, Outputs) where the outputs are placed.
hhXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden inputs.
hhYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the hidden outputs are placed.
hcXSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) containing the hidden cell inputs.
hcYSpecifies a handle to the GPU memory of shape (BatchSize, Hidden) where the hidden cell outputs are placed.
hWtsSpecifies a handle to the GPU memory of size szWt calculated with GetRnn8MemorySizes, containing the weights.
hWorkSpecifies a handle to the GPU memory of size szWork calculated with GetRnn8MemorySizes, used as temporary work data.
hReservedSpecifies a handle to the GPU memory of size szReserved calculated with GetRnn8MemorySizes, used as temporary reserve data.

Definition at line 5273 of file CudaDnn.cs.

◆ RnnBackwardData()

void MyCaffe.common.CudaDnn< T >.RnnBackwardData ( long  hCuDnn,
long  hRnnDesc,
long  hYDesc,
long  hYData,
long  hYDiff,
long  hHyDesc,
long  hHyDiff,
long  hCyDesc,
long  hCyDiff,
long  hWtDesc,
long  hWtData,
long  hHxDesc,
long  hHxData,
long  hCxDesc,
long  hCxData,
long  hXDesc,
long  hXDiff,
long  hdHxDesc,
long  hHxDiff,
long  hdCxDesc,
long  hCxDiff,
long  hWorkspace,
ulong  nWsCount,
long  hReserved,
ulong  nResCount 
)

Run the RNN backward pass through the data.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hYDiffSpecifies a handle to the output GPU gradients.
hHyDescSpecifies a handle to the output hidden descriptor.
hHyDiffSpecifies a handle to the output hidden gradients.
hCyDescSpecifies a handle to the output cont descriptor.
hCyDiffSpecifies a handle to the output cont gradients.
hWtDescSpecifies a handle to the weight descriptor.
hWtDataSpecifies a handle to the weight data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hCxDescSpecifies a handle to the cont data descriptor.
hCxDataSpecifies a handle to the cont GPU data.
hXDescSpecifies a handle to the input data descriptor.
hXDiffSpecifies a handle to the input GPU gradients.
hdHxDescSpecifies a handle to the input hidden descriptor for the gradients.
hHxDiffSpecifies a handle to the input hidden GPU gradients.
hdCxDescSpecifies a handle to the input cont descriptor of the gradients.
hCxDiffSpecifies a handle to the input cont GPU gradients.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.

Definition at line 4981 of file CudaDnn.cs.

◆ RnnBackwardWeights()

void MyCaffe.common.CudaDnn< T >.RnnBackwardWeights ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
long  hXData,
long  hHxDesc,
long  hHxData,
long  hYDesc,
long  hYData,
long  hWorkspace,
ulong  nWsCount,
long  hWtDesc,
long  hWtDiff,
long  hReserved,
ulong  nResCount 
)

Run the RNN backward pass on the weights.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies a handle to the input data descriptor.
hXDataSpecifies a handle to the input GPU data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hWtDescSpecifies a handle to the weight descriptor.
hWtDiffSpecifies a handle to the weight gradients.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.

Definition at line 5080 of file CudaDnn.cs.

◆ RnnForward()

void MyCaffe.common.CudaDnn< T >.RnnForward ( long  hCuDnn,
long  hRnnDesc,
long  hXDesc,
long  hXData,
long  hHxDesc,
long  hHxData,
long  hCxDesc,
long  hCxData,
long  hWtDesc,
long  hWtData,
long  hYDesc,
long  hYData,
long  hHyDesc,
long  hHyData,
long  hCyDesc,
long  hCyData,
long  hWorkspace,
ulong  nWsCount,
long  hReserved,
ulong  nResCount,
bool  bTraining 
)

Run the RNN through a forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
hXDescSpecifies a handle to the input data descriptor.
hXDataSpecifies a handle to the input GPU data.
hHxDescSpecifies a handle to the hidden data descriptor.
hHxDataSpecifies a handle to the hidden GPU data.
hCxDescSpecifies a handle to the cont data descriptor.
hCxDataSpecifies a handle to the cont GPU data.
hWtDescSpecifies a handle to the weight descriptor.
hWtDataSpecifies a handle to the weight data.
hYDescSpecifies a handle to the output data descriptor.
hYDataSpecifies a handle to the output GPU data.
hHyDescSpecifies a handle to the output hidden descriptor.
hHyDataSpecifies a handle to the output hidden data.
hCyDescSpecifies a handle to the output cont descriptor.
hCyDataSpecifies a handle to the output cont data.
hWorkspaceSpecifies a handle to the workspace GPU memory.
nWsCountSpecifies the number of items within the workspace.
hReservedSpecifies a handle to the reserved GPU memory.
nResCountSpecifies the number of items within the reserved memory.
bTrainingSpecifies whether or not the forward pass is run during training.

Definition at line 4881 of file CudaDnn.cs.
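
Taken together with RnnBackwardData and RnnBackwardWeights above, a training iteration typically issues these three calls in a fixed order. The following is a minimal ordering sketch only; all handles are assumed to have been created earlier with CreateRnnDesc, CreateRnnDataDesc and related functions, and the variable names are hypothetical.

```csharp
// Forward pass with bTraining = true so the reserve memory is populated
// for the subsequent backward passes.
cuda.RnnForward(hCuDnn, hRnnDesc, hXDesc, hXData, hHxDesc, hHxData,
                hCxDesc, hCxData, hWtDesc, hWtData, hYDesc, hYData,
                hHyDesc, hHyData, hCyDesc, hCyData,
                hWorkspace, nWsCount, hReserved, nResCount, true);

// Backward pass through the data (gradients w.r.t. inputs and hidden states).
cuda.RnnBackwardData(hCuDnn, hRnnDesc, hYDesc, hYData, hYDiff,
                     hHyDesc, hHyDiff, hCyDesc, hCyDiff, hWtDesc, hWtData,
                     hHxDesc, hHxData, hCxDesc, hCxData, hXDesc, hXDiff,
                     hdHxDesc, hHxDiff, hdCxDesc, hCxDiff,
                     hWorkspace, nWsCount, hReserved, nResCount);

// Backward pass on the weights (gradients w.r.t. the weight data).
cuda.RnnBackwardWeights(hCuDnn, hRnnDesc, hXDesc, hXData, hHxDesc, hHxData,
                        hYDesc, hYData, hWorkspace, nWsCount,
                        hWtDesc, hWtDiff, hReserved, nResCount);
```

Note that the same workspace and reserved memory handles are passed to all three calls.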

◆ RunExtension()

T[] MyCaffe.common.CudaDnn< T >.RunExtension ( long  hExtension,
long  lfnIdx,
T[]  rgParam 
)

Run a function on the extension specified.

Parameters
hExtensionSpecifies the handle to the extension created with CreateExtension.
lfnIdxSpecifies the extension function to run.
rgParamSpecifies the parameters to pass to the extension.
Returns
The values returned by the extension function.

Definition at line 3489 of file CudaDnn.cs.

◆ RunMemoryTest()

T[] MyCaffe.common.CudaDnn< T >.RunMemoryTest ( long  h,
MEMTEST_TYPE  type,
ulong  ulBlockStartOffset,
ulong  ulBlockCount,
bool  bVerbose,
bool  bWrite,
bool  bReadWrite,
bool  bRead 
)

The RunMemoryTest method runs the memory test from the block start offset through the block count on the memory previously allocated using CreateMemoryTest.

Parameters
hSpecifies the handle to the memory test data.
typeSpecifies the type of memory test to run.
ulBlockStartOffsetSpecifies the block start offset (offset into the total blocks returned by CreateMemoryTest).
ulBlockCountSpecifies the number of blocks to test.
bVerboseWhen disabled, the memory test is run once and only the number of errors is returned. When enabled, the memory test is run twice and the erroring addresses are returned along with the error count.
bWriteSpecifies to perform a write test.
bReadWriteSpecifies to perform a read/write test.
bReadSpecifies to perform a read test.
Returns
The format of the array returned is as follows: rg[0] - specifies the starting memory address used for this memory test run. rg[1] - specifies the number of addresses over which the test was run (specified in 1 byte increments). rg[2] - specifies the number of errors found. rg[3, ...] - specifies the erroring addresses (specified in 1-bit increments).

Definition at line 3123 of file CudaDnn.cs.

◆ RunPCA()

bool MyCaffe.common.CudaDnn< T >.RunPCA ( long  hPCA,
int  nSteps,
out int  nCurrentK,
out int  nCurrentIteration 
)

Runs a number of steps of the iterative PCA algorithm.

See Parallel GPU Implementation of Iterative PCA Algorithms by Mircea Andrecut

Parameters
hPCASpecifies a handle to the PCA instance to use.
nStepsSpecifies the number of steps to run.
nCurrentKReturns the current component value.
nCurrentIterationReturns the current iteration.
Returns
true is returned when the maximum number of iterations have been run as specified in CreatePCA.

Definition at line 5417 of file CudaDnn.cs.

◆ scal() [1/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
double  fAlpha,
long  hX,
int  nXOff = 0 
)

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type double.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6767 of file CudaDnn.cs.

◆ scal() [2/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
float  fAlpha,
long  hX,
int  nXOff = 0 
)

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type float.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6782 of file CudaDnn.cs.

◆ scal() [3/3]

void MyCaffe.common.CudaDnn< T >.scal ( int  n,
T  fAlpha,
long  hX,
int  nXOff = 0 
)

Scales the data in X by a scaling factor.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scaling factor to apply to vector X, where the scaling factor is of type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.

Definition at line 6797 of file CudaDnn.cs.
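
As an illustrative sketch of the scal overloads, assuming an initialized CudaDnn<float> instance named cuda (the allocation and read-back helpers shown belong to CudaDnn, but their use here is an assumption, not part of this method's documentation):

```csharp
// Allocate a 4-item vector on the GPU, scale it in place by 0.5,
// then read the result back to the host and free the memory.
long hX = cuda.AllocMemory(new float[] { 1f, 2f, 3f, 4f });
cuda.scal(4, 0.5f, hX);                  // X becomes { 0.5, 1.0, 1.5, 2.0 }
float[] rgResult = cuda.GetMemoryFloat(hX, 4);
cuda.FreeMemory(hX);
```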

◆ scale() [1/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
double  fAlpha,
long  hX,
long  hY 
)

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scale value in type double.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6925 of file CudaDnn.cs.

◆ scale() [2/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
float  fAlpha,
long  hX,
long  hY 
)

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scale value in type float.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 6940 of file CudaDnn.cs.

◆ scale() [3/3]

void MyCaffe.common.CudaDnn< T >.scale ( int  n,
T  fAlpha,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Scales the values in X and places them in Y.

This function uses NVIDIA's cuBlas.

Parameters
nSpecifies the number of items (not bytes) in the vector X and Y.
fAlphaSpecifies the scale value in type 'T'.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffOptionally, specifies an offset (in items, not bytes) into the memory of X.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 6957 of file CudaDnn.cs.

◆ scale_fwd()

void MyCaffe.common.CudaDnn< T >.scale_fwd ( int  nCount,
long  hX,
long  hScaleData,
int  nScaleDim,
int  nInnerDim,
long  hY,
long  hBiasData = 0 
)

Performs a scale forward pass in Cuda.

Calculation: $ f(x) = \begin{cases} x * scaleData[(i / nInnerDim) \mod nScaleDim], & \text{if } hBias == 0\\ x * scaleData[(i / nInnerDim) \mod nScaleDim] + biasData[(i / nInnerDim) \mod nScaleDim] & \text{otherwise} \end{cases} $

Parameters
nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hScaleDataSpecifies a handle to the scale data in GPU memory.
nScaleDimSpecifies the dimension of the scale data.
nInnerDimSpecifies the inner dimension over which each scale value is applied.
hYSpecifies the output data Y in GPU memory.
hBiasDataOptionally, specifies the bias data in GPU memory.

Definition at line 9983 of file CudaDnn.cs.

◆ scale_to_range()

void MyCaffe.common.CudaDnn< T >.scale_to_range ( int  n,
long  hX,
long  hY,
double  fMin,
double  fMax 
)

Scales the values in X into the new range [fMin, fMax] and places the result in Y (can also be run in place, where X = Y).

Parameters
nSpecifies the number of items (not bytes) in the vector X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
fMinSpecifies the minimum of the new range.
fMaxSpecifies the maximum of the new range.

Definition at line 6973 of file CudaDnn.cs.

◆ serf_bwd()

void MyCaffe.common.CudaDnn< T >.serf_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData,
double  dfThreshold 
)

Performs a Serf backward pass in Cuda.

Computes the serf gradient $ f(x)' = \text{erf}\left(\log \left(e^x+1\right)\right)+\frac{2 x e^{x-\log^2\left(e^x+1\right)}}{\sqrt{\pi } \left(e^x+1\right)} $

See also
Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function by Sayan Nag and Mayukh Bhattacharyya, 2021.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9267 of file CudaDnn.cs.

◆ serf_fwd()

void MyCaffe.common.CudaDnn< T >.serf_fwd ( int  nCount,
long  hBottomData,
long  hTopData,
double  dfThreshold 
)

Performs a Serf forward pass in Cuda.

Computes the serf non-linearity $ f(x) = x erf(\ln( 1 + \exp(x) )) $.

See also
Serf: Towards better training of deep neural networks using log-Softplus ERror activation Function by Sayan Nag and Mayukh Bhattacharyya, 2021.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
dfThresholdSpecifies the threshold value.

Definition at line 9245 of file CudaDnn.cs.

◆ set() [1/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
double  fVal,
int  nIdx = -1 
)

Set the values of GPU memory to a specified value of type double.

Parameters
nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.

Definition at line 5897 of file CudaDnn.cs.

◆ set() [2/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
float  fVal,
int  nIdx = -1 
)

Set the values of GPU memory to a specified value of type float.

Parameters
nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.

Definition at line 5909 of file CudaDnn.cs.

◆ set() [3/3]

void MyCaffe.common.CudaDnn< T >.set ( int  nCount,
long  hHandle,
T  fVal,
int  nIdx = -1,
int  nXOff = 0 
)

Set the values of GPU memory to a specified value of type 'T'.

Parameters
nCountSpecifies the number of items to set.
hHandleSpecifies a handle to the memory on the GPU.
fValSpecifies the value to set.
nIdxWhen -1, all values in the GPU memory are set to the fVal value, otherwise, only the value at the index nIdx is set to the value.
nXOffOptionally specifies an offset into the GPU memory where the set starts.

Definition at line 5922 of file CudaDnn.cs.
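
A short sketch of the set overloads, assuming an initialized CudaDnn<double> instance named cuda (the allocation helper shown is a CudaDnn member, used here as an assumption):

```csharp
// Allocate 10 items, zero them all, then set only index 5 to 1.0.
long hData = cuda.AllocMemory(10);
cuda.set(10, hData, 0.0);        // nIdx = -1 (default): every item = 0.0
cuda.set(10, hData, 1.0, 5);     // nIdx = 5: only the item at index 5 = 1.0
cuda.FreeMemory(hData);
```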

◆ set_bounds()

void MyCaffe.common.CudaDnn< T >.set_bounds ( int  n,
double  dfMin,
double  dfMax,
long  hX 
)

Set the bounds of all items within the data to a set range of values.

Parameters
nSpecifies the number of items.
dfMinSpecifies the minimum value.
dfMaxSpecifies the maximum value.
hXSpecifies a handle to the GPU data to be bound.

Definition at line 6732 of file CudaDnn.cs.

◆ SetConvolutionDesc()

void MyCaffe.common.CudaDnn< T >.SetConvolutionDesc ( long  hHandle,
int  hPad,
int  wPad,
int  hStride,
int  wStride,
int  hDilation,
int  wDilation,
bool  bUseTensorCores,
bool  bHalf = false 
)

Set the values of a convolution descriptor.

Parameters
hHandleSpecifies the handle to the convolution descriptor.
hPadSpecifies the pad applied to the height.
wPadSpecifies the pad applied to the width.
hStrideSpecifies the stride of the height.
wStrideSpecifies the stride of the width.
hDilationSpecifies the dilation of the height (default = 1).
wDilationSpecifies the dilation of the width (default = 1).
bUseTensorCoresOptionally, specifies whether or not to use the Tensor Cores (if available).
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3785 of file CudaDnn.cs.

◆ SetDefaultCudaPath()

static void MyCaffe.common.CudaDnn< T >.SetDefaultCudaPath ( string  strPath)
static

Used to optionally set the default path to the Low-Level Cuda Dnn DLL file.

Parameters
strPathSpecifies the file path to the Low-Level Cuda Dnn DLL file to use.

Definition at line 1890 of file CudaDnn.cs.

◆ SetDeviceID()

void MyCaffe.common.CudaDnn< T >.SetDeviceID ( int  nDeviceID = -1,
DEVINIT  flags = DEVINIT.NONE,
long?  lSeed = null 
)

Set the device ID used by the current instance of CudaDnn.

Parameters
nDeviceIDSpecifies the zero-based device (GPU) id. When -1, the device ID is set to the device ID used to create the instance of CudaDnn.
flagsOptionally, specifies the initialization flags.
lSeedOptionally, specifies the random number generator seed.

Definition at line 1960 of file CudaDnn.cs.

◆ SetDropoutDesc()

void MyCaffe.common.CudaDnn< T >.SetDropoutDesc ( long  hCuDnn,
long  hDropoutDesc,
double  dfDropout,
long  hStates,
long  lSeed 
)

Set the dropout descriptor values.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hDropoutDescSpecifies a handle to the dropout descriptor.
dfDropoutSpecifies the dropout probability (0.5 = 50%).
hStatesSpecifies a handle to the state data in GPU memory.
lSeedSpecifies the random number-generator seed.

Definition at line 4237 of file CudaDnn.cs.

◆ SetFilterDesc()

void MyCaffe.common.CudaDnn< T >.SetFilterDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
bool  bHalf = false 
)

Sets the values of a filter descriptor.

Parameters
hHandleSpecifies the handle to the filter descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3735 of file CudaDnn.cs.

◆ SetFilterNdDesc()

void MyCaffe.common.CudaDnn< T >.SetFilterNdDesc ( long  hHandle,
int[]  rgDim,
bool  bHalf = false 
)

Sets the values of a filter descriptor.

Parameters
hHandleSpecifies the handle to the filter descriptor.
rgDimSpecifies the dimensions of the data.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3700 of file CudaDnn.cs.

◆ SetHostMemory()

void MyCaffe.common.CudaDnn< T >.SetHostMemory ( long  hMem,
T[]  rgSrc 
)

Copies an array of type 'T' into a block of already allocated host memory.

Parameters
hMemSpecifies the handle to the host memory.
rgSrcSpecifies the array of type 'T' to copy.

Definition at line 2995 of file CudaDnn.cs.

◆ SetLRNDesc()

void MyCaffe.common.CudaDnn< T >.SetLRNDesc ( long  hHandle,
uint  nSize,
double  fAlpha,
double  fBeta,
double  fK 
)

Set the LRN descriptor values.

Parameters
hHandleSpecifies a handle to an LRN descriptor.
nSizeSpecifies the normalization window width. Default = 5.
fAlphaSpecifies the alpha variance. Caffe default = 1.0; cuDnn default = 1e-4.
fBetaSpecifies the beta power parameter. Caffe and cuDnn default = 0.75.
fKSpecifies the normalization 'k' parameter. Caffe default = 1.0; cuDnn default = 2.0.

Definition at line 4342 of file CudaDnn.cs.

◆ SetMemory() [1/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
double[]  rgSrc,
long  hStream = 0 
)

Copies an array of double into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of double to copy.
hStreamOptionally specifies the stream to use for the copy operation.

Definition at line 2757 of file CudaDnn.cs.

◆ SetMemory() [2/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
float[]  rgSrc,
long  hStream = 0 
)

Copies an array of float into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of float to copy.
hStreamOptionally specifies the stream to use for the copy operation.

Definition at line 2769 of file CudaDnn.cs.

◆ SetMemory() [3/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
List< double >  rg 
)

Copies a list of doubles into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSpecifies the list of doubles to copy.

Definition at line 2734 of file CudaDnn.cs.

◆ SetMemory() [4/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
List< float >  rg 
)

Copies a list of float into a block of already allocated GPU memory.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSpecifies the list of float to copy.

Definition at line 2745 of file CudaDnn.cs.

◆ SetMemory() [5/5]

void MyCaffe.common.CudaDnn< T >.SetMemory ( long  hMem,
T[]  rgSrc,
long  hStream = 0,
int  nCount = -1 
)

Copies an array of type 'T' into a block of already allocated GPU memory.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of type 'T' to copy.
hStreamOptionally specifies the stream to use for the copy operation.
nCountOptionally, specifies a count of items to retrieve.

Definition at line 2781 of file CudaDnn.cs.
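
A minimal sketch of copying host data into GPU memory with SetMemory, assuming an initialized CudaDnn<float> instance named cuda (the allocation and read-back helpers are CudaDnn members, used here as an assumption):

```csharp
// Copy a host array into a pre-allocated GPU block, then read it back.
float[] rgSrc = new float[] { 0.1f, 0.2f, 0.3f };
long hMem = cuda.AllocMemory(rgSrc.Length);
cuda.SetMemory(hMem, rgSrc);     // converted to the base type 'T' as needed
float[] rgDst = cuda.GetMemoryFloat(hMem, rgSrc.Length);
cuda.FreeMemory(hMem);
```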

◆ SetMemoryAt() [1/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
double[]  rgSrc,
int  nOffset 
)

Copies an array of double into a block of already allocated GPU memory starting at a specific offset.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of double to copy.
nOffsetSpecifies offset within the GPU memory from where the copy is to start.

Definition at line 2860 of file CudaDnn.cs.

◆ SetMemoryAt() [2/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
float[]  rgSrc,
int  nOffset 
)

Copies an array of float into a block of already allocated GPU memory starting at a specific offset.

This function converts the input array into the base type 'T' for which the instance of CudaDnn was defined.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of float to copy.
nOffsetSpecifies offset within the GPU memory from where the copy is to start.

Definition at line 2872 of file CudaDnn.cs.

◆ SetMemoryAt() [3/3]

void MyCaffe.common.CudaDnn< T >.SetMemoryAt ( long  hMem,
T[]  rgSrc,
int  nOffset 
)

Copies an array of type 'T' into a block of already allocated GPU memory starting at a specific offset.

Parameters
hMemSpecifies the handle to the GPU memory.
rgSrcSpecifies the array of type 'T' to copy.
nOffsetSpecifies offset within the GPU memory from where the copy is to start.

Definition at line 2883 of file CudaDnn.cs.

◆ SetPixel()

T[] MyCaffe.common.CudaDnn< T >.SetPixel ( long  hMem,
int  nCount,
bool  bReturnOriginal,
int  nOffset,
params Tuple< int, T >[]  rgPixel 
)

Set pixel values where each pixel is defined as an (index, value) tuple.

Parameters
hMemSpecifies the memory where the values are set.
nCountSpecifies the number of allocated items in the memory.
bReturnOriginalSpecifies whether or not to return the original values (before setting).
nOffsetSpecifies the offset of where the first pixel data starts.
rgPixelSpecifies the pixel values.
Returns
When 'bReturnOriginal' is True, the original values (before setting) are returned.

Definition at line 2933 of file CudaDnn.cs.

◆ SetPoolingDesc()

void MyCaffe.common.CudaDnn< T >.SetPoolingDesc ( long  hHandle,
PoolingMethod  method,
int  h,
int  w,
int  hPad,
int  wPad,
int  hStride,
int  wStride 
)

Set the values of a pooling descriptor.

Parameters
hHandleSpecifies the handle to the convolution descriptor.
methodSpecifies the pooling method to use.
hSpecifies the pooling area height.
wSpecifies the pooling area width.
hPadSpecifies the height padding.
wPadSpecifies the width padding.
hStrideSpecifies the height stride.
wStrideSpecifies the width stride.

Definition at line 4074 of file CudaDnn.cs.

◆ SetRandomSeed()

void MyCaffe.common.CudaDnn< T >.SetRandomSeed ( long  lSeed)

Set the random number generator seed.

Parameters
lSeedSpecifies the seed to set.

Definition at line 1990 of file CudaDnn.cs.

◆ SetRnn8()

void MyCaffe.common.CudaDnn< T >.SetRnn8 ( long  hCuDnn,
long  hRnn,
bool  bTraining,
RNN_DATALAYOUT  layout,
RNN_MODE  cellMode,
RNN_BIAS_MODE  biasMode,
int  nSequenceLen,
int  nBatchSize,
int  nInputs,
int  nHidden,
int  nOutputs,
int  nProjection,
int  nNumLayers,
float  fDropout,
ulong  lSeed,
bool  bBidirectional = false 
)

Set the RNN8 parameters.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnSpecifies the handle to the RNN8 created with CreateRnn8.
bTrainingSpecifies true for training and false for inference.
layoutSpecifies the data layout ordering.
cellModeSpecifies the cell mode (RELU, TANH, LSTM or GRU).
biasModeSpecifies the bias mode (default = RNN_DOUBLE_BIAS).
nSequenceLenSpecifies the sequence length.
nBatchSizeSpecifies the batch size.
nInputsSpecifies the number of inputs. The X input is of size (SeqLen, BatchSize, Inputs).
nHiddenSpecifies the number of hidden units. H and C are of size (BatchSize, Hidden).
nOutputsSpecifies the number of outputs. The Y output is of size (SeqLen, BatchSize, Outputs).
nProjectionSpecifies the projection size.
nNumLayersSpecifies the number of layers.
fDropoutSpecifies the dropout ratio.
lSeedSpecifies the dropout seed.
bBidirectionalSpecifies unidirectional (false) or bidirectional (true) operation (default = false).

Definition at line 5205 of file CudaDnn.cs.

◆ SetRnnDataDesc()

void MyCaffe.common.CudaDnn< T >.SetRnnDataDesc ( long  hRnnDataDesc,
RNN_DATALAYOUT  layout,
int  nMaxSeqLen,
int  nBatchSize,
int  nVectorSize,
bool  bBidirectional = false,
int[]  rgSeqLen = null 
)

Sets the RNN Data Descriptor values.

Parameters
hRnnDataDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
layoutSpecifies the input data layout (either SEQUENCE major or BATCH major).
nMaxSeqLenSpecifies the maximum sequence length.
nBatchSizeSpecifies the batch count.
nVectorSizeSpecifies the input vector count.
bBidirectionalSpecifies whether the Rnn is bidirectional or not (default = false).
rgSeqLenSpecifies the sequence lengths - currently this should be null which sets all sequence lengths to nMaxSeqLen.

Definition at line 4692 of file CudaDnn.cs.

◆ SetRnnDesc()

void MyCaffe.common.CudaDnn< T >.SetRnnDesc ( long  hCuDnn,
long  hRnnDesc,
int  nHiddenCount,
int  nNumLayers,
long  hDropoutDesc,
RNN_MODE  mode,
bool  bUseTensorCores,
RNN_DIRECTION  direction = RNN_DIRECTION.RNN_UNIDIRECTIONAL 
)

Sets the RNN Descriptor values.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
hRnnDescSpecifies the handle to the RNN descriptor created with CreateRnnDesc
nHiddenCountSpecifies the hidden input (typically the input) count.
nNumLayersSpecifies the number of layers.
hDropoutDescSpecifies the handle to the Dropout descriptor (or 0 to ignore). The dropout descriptor is only used with two or more layers.
modeSpecifies the RNN_MODE (LSTM, RNN_RELU, RNN_TANH) to use.
bUseTensorCoresOptionally, specifies whether or not to use the Tensor Cores (if available).
directionOptionally, specifies the direction of the RNN; Unidirectional or BiDirectional.

Definition at line 4770 of file CudaDnn.cs.

◆ SetTensorDesc() [1/2]

void MyCaffe.common.CudaDnn< T >.SetTensorDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
bool  bHalf = false 
)

Sets the values of a tensor descriptor.

Parameters
hHandleSpecifies the handle to the tensor descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3599 of file CudaDnn.cs.

◆ SetTensorDesc() [2/2]

void MyCaffe.common.CudaDnn< T >.SetTensorDesc ( long  hHandle,
int  n,
int  c,
int  h,
int  w,
int  nStride,
int  cStride,
int  hStride,
int  wStride,
bool  bHalf = false 
)

Sets the values of a tensor descriptor.

Parameters
hHandleSpecifies the handle to the tensor descriptor.
nSpecifies the number of items.
cSpecifies the number of channels in each item.
hSpecifies the height of each item.
wSpecifies the width of each item.
nStrideSpecifies the stride between two images.
cStrideSpecifies the stride between two channels.
hStrideSpecifies the stride between two rows.
wStrideSpecifies the stride between two columns.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3620 of file CudaDnn.cs.
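
As an illustrative sketch, configuring an NCHW tensor descriptor for a batch of 32 three-channel 28x28 images, assuming an initialized CudaDnn<float> instance named cuda (CreateTensorDesc and FreeTensorDesc are CudaDnn members, used here as an assumption):

```csharp
// Create a tensor descriptor, set its dimensions, use it, then free it.
long hDesc = cuda.CreateTensorDesc();
cuda.SetTensorDesc(hDesc, 32, 3, 28, 28);
// ... pass hDesc to cuDnn-based forward/backward functions ...
cuda.FreeTensorDesc(hDesc);
```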

◆ SetTensorNdDesc()

void MyCaffe.common.CudaDnn< T >.SetTensorNdDesc ( long  hHandle,
int[]  rgDim,
int[]  rgStride,
bool  bHalf = false 
)

Sets the values of a tensor descriptor.

Parameters
hHandleSpecifies the handle to the tensor descriptor.
rgDimSpecifies the dimensions of the data.
rgStrideSpecifies the stride of the data.
bHalfOptionally, specifies whether or not to use the FP16 half data type.

Definition at line 3551 of file CudaDnn.cs.

◆ SetupSSD()

void MyCaffe.common.CudaDnn< T >.SetupSSD ( long  hSSD,
int  nNum,
int  nNumPriors,
int  nNumGt 
)

Setup the SSD GPU support.

Parameters
hSSDSpecifies the handle to the SSD instance.
nNumSpecifies the number of items.
nNumPriorsSpecifies the number of priors.
nNumGtSpecifies the number of ground truths.

Definition at line 5625 of file CudaDnn.cs.

◆ sgd_update()

void MyCaffe.common.CudaDnn< T >.sgd_update ( int  nCount,
long  hNetParamsDiff,
long  hHistoryData,
T  fMomentum,
T  fLocalRate 
)

Perform the Stochastic Gradient Descent (SGD) update

See Stochastic Gradient Descent.

Parameters
nCountSpecifies the number of items.
hNetParamsDiffSpecifies a handle to the net params diff in GPU memory.
hHistoryDataSpecifies a handle to the history data in GPU memory.
fMomentumSpecifies the momentum value.
fLocalRateSpecifies the local learning rate.

Definition at line 10203 of file CudaDnn.cs.
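
A brief usage sketch, assuming an initialized CudaDnn<double> instance named cuda; the handles are hypothetical and assumed to reference previously allocated GPU memory holding the parameter gradients and the momentum history:

```csharp
// One SGD step following Caffe's update rule:
//   history = momentum * history + local_rate * diff; diff = history
cuda.sgd_update(nCount, hNetParamsDiff, hHistoryData, 0.9, 0.01);
```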

◆ sigmoid_bwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff 
)

Performs a Sigmoid backward pass in Cuda.

See also
Sigmoid Function.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9341 of file CudaDnn.cs.

◆ sigmoid_cross_entropy_bwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_cross_entropy_bwd ( int  nCount,
int  nIgnoreLabel,
long  hTarget,
long  hBottomDiff 
)

Performs a sigmoid cross entropy backward pass in Cuda when an ignore label is specified.

Parameters
nCountSpecifies the number of items.
nIgnoreLabelSpecifies the label to ignore.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10571 of file CudaDnn.cs.

◆ sigmoid_cross_entropy_fwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_cross_entropy_fwd ( int  nCount,
long  hInput,
long  hTarget,
long  hLoss,
bool  bHasIgnoreLabel,
int  nIgnoreLabel,
long  hCountData 
)

Performs a sigmoid cross entropy forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hInputSpecifies a handle to the input data in GPU memory.
hTargetSpecifies a handle to the target data in GPU memory.
hLossSpecifies a handle to the loss data in GPU memory.
bHasIgnoreLabelSpecifies whether or not an ignore label is used.
nIgnoreLabelSpecifies the ignore label which is used when bHasIgnoreLabel is true.
hCountDataSpecifies a handle to the count data in GPU memory.

Definition at line 10556 of file CudaDnn.cs.

◆ sigmoid_fwd()

void MyCaffe.common.CudaDnn< T >.sigmoid_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs a Sigmoid forward pass in Cuda.

Calculation: $ f(x) = 1.0 / (1.0 + e^{-x}) $

See also
Sigmoid Function.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9323 of file CudaDnn.cs.

◆ SigmoidBackward()

void MyCaffe.common.CudaDnn< T >.SigmoidBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a Sigmoid backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4553 of file CudaDnn.cs.

◆ SigmoidForward()

void MyCaffe.common.CudaDnn< T >.SigmoidForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Perform a Sigmoid forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4531 of file CudaDnn.cs.

◆ sign()

void MyCaffe.common.CudaDnn< T >.sign ( int  n,
long  hX,
long  hY,
int  nXOff = 0,
int  nYOff = 0 
)

Computes the sign of each element of X and places the result in Y.

Parameters
nSpecifies the number of items (not bytes) in the vectors X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nXOffSpecifies an offset (in items, not bytes) into the memory of X.
nYOffSpecifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7574 of file CudaDnn.cs.

◆ silu_bwd()

void MyCaffe.common.CudaDnn< T >.silu_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 
)

Performs the Sigmoid-weighted Linear Unit (SiLU) activation backward pass in Cuda.

Computes the SiLU non-linearity $ y = x * sigmoid(x) $ with derivative $ y' = sigmoid(x) * (1 + x * (1 - sigmoid(x))) $

See also
Brief Review - SiLU: Sigmoid-weighted Linear Unit by Sik-Ho Tsang, 2022, Medium.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.

Definition at line 9140 of file CudaDnn.cs.

◆ silu_fwd()

void MyCaffe.common.CudaDnn< T >.silu_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs the Sigmoid-weighted Linear Unit (SiLU) activation forward pass in Cuda.

Computes the SiLU non-linearity $ y = x * sigmoid(x) $ with derivative $ y' = sigmoid(x) * (1 + x * (1 - sigmoid(x))) $

See also
Brief Review - SiLU: Sigmoid-weighted Linear Unit by Sik-Ho Tsang, 2022, Medium.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9118 of file CudaDnn.cs.
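The forward and backward formulas above can be sketched as a CPU reference (Python, for illustration only; the actual computation runs in the CUDA kernel):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu_fwd(bottom):
    # y = x * sigmoid(x)
    return [x * sigmoid(x) for x in bottom]

def silu_bwd(top_diff, bottom):
    # dx = dy * sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    return [dy * sigmoid(x) * (1.0 + x * (1.0 - sigmoid(x)))
            for dy, x in zip(top_diff, bottom)]
```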

◆ slice_bwd()

void MyCaffe.common.CudaDnn< T >.slice_bwd ( int  nCount,
long  hTopDiff,
int  nNumSlices,
int  nSliceSize,
int  nBottomSliceAxis,
int  nTopSliceAxis,
int  nOffsetSliceAxis,
long  hBottomDiff 
)

Performs a slice backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nNumSlicesSpecifies the number of slices.
nSliceSizeSpecifies the slice size.
nBottomSliceAxisSpecifies the size of the slice axis in the bottom blob.
nTopSliceAxisSpecifies the size of the slice axis in the top blob.
nOffsetSliceAxisSpecifies the offset of the top blob along the bottom blob's slice axis.
hBottomDiffSpecifies a handle to the Bottom diff in GPU memory.

Definition at line 9907 of file CudaDnn.cs.

◆ slice_fwd()

void MyCaffe.common.CudaDnn< T >.slice_fwd ( int  nCount,
long  hBottomData,
int  nNumSlices,
int  nSliceSize,
int  nBottomSliceAxis,
int  nTopSliceAxis,
int  nOffsetSliceAxis,
long  hTopData 
)

Performs a slice forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nNumSlicesSpecifies the number of slices.
nSliceSizeSpecifies the slice size.
nBottomSliceAxisSpecifies the size of the slice axis in the bottom blob.
nTopSliceAxisSpecifies the size of the slice axis in the top blob.
nOffsetSliceAxisSpecifies the offset of the top blob along the bottom blob's slice axis.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9888 of file CudaDnn.cs.
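Assuming the parameter semantics of the original Caffe Slice kernel (nBottomSliceAxis/nTopSliceAxis are the sizes of the slice axis in the bottom/top blobs, and nOffsetSliceAxis is this top blob's starting offset along the bottom's slice axis), the index math can be sketched on the CPU (Python, for illustration only):

```python
def slice_fwd(bottom, num_slices, slice_size,
              bottom_slice_axis, top_slice_axis, offset_slice_axis):
    """CPU sketch of the slice forward indexing (assumed Caffe semantics)."""
    count = num_slices * top_slice_axis * slice_size
    top = [0.0] * count
    total_slice_size = slice_size * top_slice_axis
    for index in range(count):
        slice_num = index // total_slice_size      # which outer item
        slice_index = index % total_slice_size     # position within the slice
        bottom_index = slice_index + \
            (slice_num * bottom_slice_axis + offset_slice_axis) * slice_size
        top[index] = bottom[bottom_index]
    return top
```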

◆ smoothl1_bwd()

void MyCaffe.common.CudaDnn< T >.smoothl1_bwd ( int  nCount,
long  hX,
long  hY 
)

Performs the backward operation for the SmoothL1 loss.

Calculation: $ f'(x) = x $, if $ |x| < 1 $; $ f'(x) = sign(x) $, otherwise.

Parameters
nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.

Definition at line 10063 of file CudaDnn.cs.

◆ smoothl1_fwd()

void MyCaffe.common.CudaDnn< T >.smoothl1_fwd ( int  nCount,
long  hX,
long  hY 
)

Performs the forward operation for the SmoothL1 loss.

Calculation: $ f(x) = 0.5 * x^2 $, if $ |x| < 1 $; $ f(x) = |x| - 0.5 $, otherwise.

Parameters
nCountSpecifies the number of items.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.

Definition at line 10044 of file CudaDnn.cs.
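The piecewise forward and backward calculations can be sketched as a CPU reference (Python, for illustration only; the actual computation runs in the CUDA kernel):

```python
def smoothl1_fwd(x):
    # f(x) = 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise
    return [0.5 * v * v if abs(v) < 1.0 else abs(v) - 0.5 for v in x]

def smoothl1_bwd(x):
    # f'(x) = x if |x| < 1, sign(x) otherwise
    return [v if abs(v) < 1.0 else (1.0 if v > 0 else -1.0) for v in x]
```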

◆ softmax_cross_entropy_bwd()

void MyCaffe.common.CudaDnn< T >.softmax_cross_entropy_bwd ( int  nCount,
int  nIgnoreLabel,
long  hTarget,
long  hBottomDiff 
)

Performs a softmax cross entropy backward pass in Cuda when an ignore label is specified.

Parameters
nCountSpecifies the number of items.
nIgnoreLabelSpecifies the label to ignore.
hTargetSpecifies a handle to the target data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 10624 of file CudaDnn.cs.

◆ softmax_cross_entropy_fwd()

void MyCaffe.common.CudaDnn< T >.softmax_cross_entropy_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossDiff,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 
)

Performs a softmax cross entropy forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hProbDataSpecifies a handle to the probability data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hLossDiffSpecifies a handle to the loss diff in GPU memory that is filled with 1's at each 'active' location where loss data is placed.
hLossDataSpecifies a handle to the loss data in GPU memory.
nOuterNumSpecifies the number of outer items (e.g. the batch size).
nDimSpecifies the size of each outer item (number of channels * nInnerNum).
nInnerNumSpecifies the inner count of each item (e.g. height * width).
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

This forward pass is a helper to perform a part of the NLLLoss portion of the SoftmaxCrossEntropyLoss.

Definition at line 10595 of file CudaDnn.cs.

◆ SoftmaxBackward()

void MyCaffe.common.CudaDnn< T >.SoftmaxBackward ( long  hCuDnn,
SOFTMAX_ALGORITHM  alg,
SOFTMAX_MODE  mode,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a Softmax backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
algSpecifies the SoftmaxAlgorithm to use (FAST, ACCURATE or LOG).
modeSpecifies the SoftmaxMode to use (INSTANCE across NxCHW, or CHANNEL across NCxHW).
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4640 of file CudaDnn.cs.

◆ SoftmaxForward()

void MyCaffe.common.CudaDnn< T >.SoftmaxForward ( long  hCuDnn,
SOFTMAX_ALGORITHM  alg,
SOFTMAX_MODE  mode,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Perform a Softmax forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
algSpecifies the SoftmaxAlgorithm to use (FAST, ACCURATE or LOG).
modeSpecifies the SoftmaxMode to use (INSTANCE across NxCHW, or CHANNEL across NCxHW).
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4618 of file CudaDnn.cs.

◆ softmaxloss_bwd()

void MyCaffe.common.CudaDnn< T >.softmaxloss_bwd ( int  nCount,
long  hTopData,
long  hLabel,
long  hBottomDiff,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 
)

Performs Softmax Loss backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDataSpecifies a handle to the top data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
nOuterNumSpecifies the number of outer items (e.g. the batch size).
nDimSpecifies the size of each outer item (number of channels * nInnerNum).
nInnerNumSpecifies the inner count of each item (e.g. height * width).
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9639 of file CudaDnn.cs.

◆ softmaxloss_fwd()

void MyCaffe.common.CudaDnn< T >.softmaxloss_fwd ( int  nCount,
long  hProbData,
long  hLabel,
long  hLossData,
int  nOuterNum,
int  nDim,
int  nInnerNum,
long  hCounts,
int?  nIgnoreLabel 
)

Performs Softmax Loss forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hProbDataSpecifies a handle to the probability data in GPU memory.
hLabelSpecifies a handle to the label data in GPU memory.
hLossDataSpecifies a handle to the loss data in GPU memory.
nOuterNumSpecifies the number of outer items (e.g. the batch size).
nDimSpecifies the size of each outer item (number of channels * nInnerNum).
nInnerNumSpecifies the inner count of each item (e.g. height * width).
hCountsSpecifies a handle to the counts in GPU memory.
nIgnoreLabelOptionally, specifies a label to ignore.

Definition at line 9605 of file CudaDnn.cs.
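Assuming the parameter semantics of the original Caffe SoftmaxLoss kernel (nOuterNum = batch size, nDim = channels * nInnerNum, nInnerNum = spatial size), the per-element loss can be sketched on the CPU (Python, for illustration only):

```python
import math

def softmaxloss_fwd(prob, label, outer_num, dim, inner_num, ignore_label=None):
    """CPU sketch of the softmax loss forward (assumed Caffe semantics)."""
    loss, counts = [], []
    for n in range(outer_num):
        for s in range(inner_num):
            lbl = int(label[n * inner_num + s])
            if ignore_label is not None and lbl == ignore_label:
                loss.append(0.0)     # ignored positions contribute no loss
                counts.append(0.0)
            else:
                # clamp to avoid log(0)
                p = max(prob[n * dim + lbl * inner_num + s], 1e-38)
                loss.append(-math.log(p))
                counts.append(1.0)
    return loss, counts
```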

◆ softplus_bwd()

void MyCaffe.common.CudaDnn< T >.softplus_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff,
long  hBottomData 
)

Performs the backward pass for the Softplus function, a smooth approximation of the ReLU function.

Computes the SoftPlus non-linearity $ y = log(1 + e^x) $ with derivative $ y' = sigmoid(x) $

See also
Softplus function - Smooth approximation of the ReLU function by neuralthreds, 2021, Medium.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hBottomDataSpecifies a handle to the bottom data in GPU memory.

Definition at line 9183 of file CudaDnn.cs.

◆ softplus_fwd()

void MyCaffe.common.CudaDnn< T >.softplus_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs the forward pass for the Softplus function, a smooth approximation of the ReLU function.

Computes the SoftPlus non-linearity $ y = log(1 + e^x) $ with derivative $ y' = sigmoid(x) $

See also
Softplus function - Smooth approximation of the ReLU function by neuralthreds, 2021, Medium.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9161 of file CudaDnn.cs.
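The forward calculation can be sketched as a CPU reference (Python, for illustration only; the actual computation runs in the CUDA kernel):

```python
import math

def softplus_fwd(bottom):
    # y = log(1 + e^x), a smooth approximation of ReLU
    return [math.log(1.0 + math.exp(x)) for x in bottom]
```

For large x, softplus(x) approaches x, matching the ReLU it approximates.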

◆ sort()

void MyCaffe.common.CudaDnn< T >.sort ( int  nCount,
long  hY 
)

Sort the data in the GPU memory specified.

Parameters
nCountSpecifies the total number of items in the memory.
hYSpecifies the handle to the GPU memory of data to sort.

Definition at line 6212 of file CudaDnn.cs.

◆ sqrt()

void MyCaffe.common.CudaDnn< T >.sqrt ( int  n,
long  hX,
long  hY 
)

Computes the square root of each element of X and places the result in Y.

Parameters
nSpecifies the number of items (not bytes) in the vectors X and Y.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7624 of file CudaDnn.cs.

◆ sqrt_scale()

void MyCaffe.common.CudaDnn< T >.sqrt_scale ( int  nCount,
long  hX,
long  hY 
)

Scales the data to its signed square root: y = sqrt(abs(x)) * sign(x)

Parameters
nCountSpecifies the number of elements.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 7638 of file CudaDnn.cs.

◆ SsdEncodeConfPrediction()

void MyCaffe.common.CudaDnn< T >.SsdEncodeConfPrediction ( long  hSSD,
int  nConfPredCount,
long  hConfPred,
int  nConfGtCount,
long  hConfGt 
)

Encodes the SSD data into the confidence prediction and confidence ground truths.

Parameters
hSSDSpecifies the handle to the SSD instance.
nConfPredCountSpecifies the number of confidence prediction items.
hConfPredSpecifies the confidence prediction data in GPU memory.
nConfGtCountSpecifies the number of confidence ground truth items.
hConfGtSpecifies the confidence ground truth data in GPU memory.

Definition at line 5810 of file CudaDnn.cs.

◆ SsdEncodeLocPrediction()

void MyCaffe.common.CudaDnn< T >.SsdEncodeLocPrediction ( long  hSSD,
int  nLocPredCount,
long  hLocPred,
int  nLocGtCount,
long  hLocGt 
)

Encodes the SSD data into the location prediction and location ground truths.

Parameters
hSSDSpecifies the handle to the SSD instance.
nLocPredCountSpecifies the number of location prediction items.
hLocPredSpecifies the location prediction data in GPU memory.
nLocGtCountSpecifies the number of location ground truth items.
hLocGtSpecifies the location ground truth data in GPU memory.

Definition at line 5794 of file CudaDnn.cs.

◆ SsdMultiBoxLossForward()

int MyCaffe.common.CudaDnn< T >.SsdMultiBoxLossForward ( long  hSSD,
int  nLocDataCount,
long  hLocGpuData,
int  nConfDataCount,
long  hConfGpuData,
int  nPriorDataCount,
long  hPriorGpuData,
int  nGtDataCount,
long  hGtGpuData,
out List< DictionaryMap< List< int > > >  rgAllMatchIndices,
out List< List< int > >  rgrgAllNegIndices,
out int  nNumNegs 
)

Performs the SSD MultiBoxLoss forward operation.

Parameters
hSSDSpecifies the handle to the SSD instance.
nLocDataCountSpecifies the number of location data items.
hLocGpuDataSpecifies the handle to the location data in GPU memory.
nConfDataCountSpecifies the number of confidence data items.
hConfGpuDataSpecifies the handle to the confidence data in GPU memory.
nPriorDataCountSpecifies the number of prior box data.
hPriorGpuDataSpecifies the prior box data in GPU memory.
nGtDataCountSpecifies the number of ground truth items.
hGtGpuDataSpecifies the ground truth data in GPU memory.
rgAllMatchIndicesReturns all match indices found.
rgrgAllNegIndicesReturns all neg indices found.
nNumNegsReturns the number of negatives.
Returns
The number of matches is returned.

Definition at line 5661 of file CudaDnn.cs.

◆ sub()

void MyCaffe.common.CudaDnn< T >.sub ( int  n,
long  hA,
long  hB,
long  hY,
int  nAOff = 0,
int  nBOff = 0,
int  nYOff = 0,
int  nB = 0 
)

Subtracts B from A and places the result in Y.

Y = A - B

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.
nBOptionally, specifies a number of 'B' items to subtract (default = 0 which causes ALL items in B to be subtracted). When 'nB' > 0, it must be a factor of 'n' and causes that number of B items to be subtracted as a block from A.

Definition at line 7312 of file CudaDnn.cs.
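The element-wise and block-wise behavior can be sketched on the CPU (Python, for illustration only; the tiled interpretation of nB > 0, where the first nB items of B repeat across A, is an assumption based on the description above):

```python
def sub(a, b, n_b=0):
    """CPU sketch of Y = A - B.

    When n_b == 0, all items of B are subtracted element-wise.
    When 0 < n_b < len(a), the first n_b items of B are assumed to be
    subtracted as a repeating block across A (n_b must divide len(a)).
    """
    if n_b == 0:
        return [ai - bi for ai, bi in zip(a, b)]
    return [a[i] - b[i % n_b] for i in range(len(a))]
```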

◆ sub_and_dot()

void MyCaffe.common.CudaDnn< T >.sub_and_dot ( int  n,
int  nN,
int  nInnerNum,
long  hA,
long  hB,
long  hY,
int  nAOff,
int  nBOff,
int  nYOff 
)

Subtracts every nInnerNum-th element of B from A and performs a dot product on the result.

Y[i] = (A[i] - B[i % nInnerNum]) * (A[i] - B[i % nInnerNum])

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and Y.
nNSpecifies the inner count.
nInnerNumSpecifies the dimension.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.
nAOffOptionally, specifies an offset (in items, not bytes) into the memory of A.
nBOffOptionally, specifies an offset (in items, not bytes) into the memory of B.
nYOffOptionally, specifies an offset (in items, not bytes) into the memory of Y.

Definition at line 7357 of file CudaDnn.cs.

◆ sum()

void MyCaffe.common.CudaDnn< T >.sum ( int  nCount,
int  nOuterNum,
int  nInnerNum,
long  hX,
long  hY 
)

Calculates the sum of inner values of X and places the result in Y.

Parameters
nCountSpecifies the number of elements in X.
nOuterNumSpecifies the number of outer items within X.
nInnerNumSpecifies the dimension of items to sum in X.
hXSpecifies a handle to the vector X in GPU memory.
hYSpecifies a handle to the vector Y in GPU memory.

Definition at line 8491 of file CudaDnn.cs.

◆ sumsq()

double MyCaffe.common.CudaDnn< T >.sumsq ( int  n,
long  hW,
long  hA,
int  nAOff = 0 
)

Calculates the sum of squares of A.

Parameters
nSpecifies the number of items (not bytes) in the vectors A and W.
hWSpecifies a handle to workspace data in GPU memory.
hASpecifies a handle to the vector A in GPU memory.
nAOffSpecifies an offset (in items, not bytes) into the memory of A.
Returns
The sum of squares of A is returned as type double.

Definition at line 7878 of file CudaDnn.cs.

◆ sumsqdiff()

double MyCaffe.common.CudaDnn< T >.sumsqdiff ( int  n,
long  hW,
long  hA,
long  hB,
int  nAOff = 0,
int  nBOff = 0 
)

Calculates the sum of squares of differences between A and B

Parameters
nSpecifies the number of items (not bytes) in the vectors A, B and W.
hWSpecifies a handle to workspace data in GPU memory.
hASpecifies a handle to the vector A in GPU memory.
hBSpecifies a handle to the vector B in GPU memory.
nAOffSpecifies an offset (in items, not bytes) into the memory of A.
nBOffSpecifies an offset (in items, not bytes) into the memory of B.
Returns
The sum of squared differences between A and B is returned as type double.

Definition at line 7902 of file CudaDnn.cs.
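The returned value corresponds to the following CPU reference (Python, for illustration only; the GPU version accumulates through the workspace handle hW):

```python
def sumsqdiff(a, b):
    # sum over i of (a[i] - b[i])^2
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
```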

◆ swish_bwd()

void MyCaffe.common.CudaDnn< T >.swish_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hSigmoidOutputData,
long  hBottomDiff,
double  dfBeta 
)

Performs a Swish backward pass in Cuda.

See also
Activation Functions by Prajit Ramachandran, Barret Zoph, Quoc V. Le., 2017.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hSigmoidOutputDataSpecifies a handle to the sigmoid output data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
dfBetaSpecifies the 'beta' value applied to the output.

Definition at line 9361 of file CudaDnn.cs.

◆ SynchronizeDevice()

void MyCaffe.common.CudaDnn< T >.SynchronizeDevice ( )

Synchronize the operations on the current device.

Definition at line 2093 of file CudaDnn.cs.

◆ SynchronizeStream()

void MyCaffe.common.CudaDnn< T >.SynchronizeStream ( long  h = 0)

Synchronize a stream on the current GPU, waiting for its operations to complete.

Parameters
hSpecifies the handle to the stream.

Definition at line 3239 of file CudaDnn.cs.

◆ SynchronizeThread()

void MyCaffe.common.CudaDnn< T >.SynchronizeThread ( )

Synchronize all kernel threads on the current GPU.

Definition at line 3250 of file CudaDnn.cs.

◆ tanh_bwd()

void MyCaffe.common.CudaDnn< T >.tanh_bwd ( int  nCount,
long  hTopDiff,
long  hTopData,
long  hBottomDiff 
)

Performs a TanH backward pass in Cuda.

See also
Hyperbolic Function.
Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 9304 of file CudaDnn.cs.

◆ tanh_fwd()

void MyCaffe.common.CudaDnn< T >.tanh_fwd ( int  nCount,
long  hBottomData,
long  hTopData 
)

Performs a TanH forward pass in Cuda.

Calculation $ f(x) = tanh(x) $

See also
Hyperbolic Function.
Parameters
nCountSpecifies the number of items in the bottom and top data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9286 of file CudaDnn.cs.

◆ TanhBackward()

void MyCaffe.common.CudaDnn< T >.TanhBackward ( long  hCuDnn,
T  fAlpha,
long  hTopDataDesc,
long  hTopData,
long  hTopDiffDesc,
long  hTopDiff,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hBottomDiffDesc,
long  hBottomDiff 
)

Perform a Tanh backward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.
hTopDiffDescSpecifies a handle to the top diff tensor descriptor.
hTopDiffSpecifies a handle to the top diff in GPU memory.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hBottomDiffDescSpecifies a handle to the bottom diff tensor descriptor.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.

Definition at line 4473 of file CudaDnn.cs.

◆ TanhForward()

void MyCaffe.common.CudaDnn< T >.TanhForward ( long  hCuDnn,
T  fAlpha,
long  hBottomDataDesc,
long  hBottomData,
T  fBeta,
long  hTopDataDesc,
long  hTopData 
)

Perform a Tanh forward pass.

Parameters
hCuDnnSpecifies a handle to the instance of cuDnn.
fAlphaSpecifies a scaling factor applied to the result.
hBottomDataDescSpecifies a handle to the bottom data tensor descriptor.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
fBetaSpecifies a scaling factor applied to the prior destination value.
hTopDataDescSpecifies a handle to the top data tensor descriptor.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 4451 of file CudaDnn.cs.

◆ threshold_fwd()

void MyCaffe.common.CudaDnn< T >.threshold_fwd ( int  nCount,
double  dfThreshold,
long  hX,
long  hY 
)

Performs a threshold pass in Cuda.

Calculation: $ Y[i] = (X[i] > threshold) ? 1 : 0 $

Parameters
nCountSpecifies the number of items.
dfThresholdSpecifies the threshold value.
hXSpecifies the input data X in GPU memory.
hYSpecifies the output data Y in GPU memory.

Definition at line 10001 of file CudaDnn.cs.
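The calculation above can be sketched as a CPU reference (Python, for illustration only; the actual computation runs in the CUDA kernel):

```python
def threshold_fwd(threshold, x):
    # Y[i] = 1 if X[i] > threshold, else 0
    return [1.0 if v > threshold else 0.0 for v in x]
```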

◆ tile_bwd()

void MyCaffe.common.CudaDnn< T >.tile_bwd ( int  nCount,
long  hTopDiff,
int  nTileSize,
int  nTiles,
int  nBottomTileAxis,
long  hBottomDiff 
)

Performs a tile backward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hTopDiffSpecifies a handle to the top diff in GPU memory.
nTileSizeSpecifies the size of each tile.
nTilesSpecifies the number of tiles.
nBottomTileAxisSpecifies the size of the tile axis in the bottom blob.
hBottomDiffSpecifies a handle to the Bottom diff in GPU memory.

Definition at line 9941 of file CudaDnn.cs.

◆ tile_fwd()

void MyCaffe.common.CudaDnn< T >.tile_fwd ( int  nCount,
long  hBottomData,
int  nInnerDim,
int  nTiles,
int  nBottomTileAxis,
long  hTopData 
)

Performs a tile forward pass in Cuda.

Parameters
nCountSpecifies the number of items.
hBottomDataSpecifies a handle to the Bottom data in GPU memory.
nInnerDimSpecifies the inner dimension size (the product of the dimensions below the tile axis).
nTilesSpecifies the number of tiles.
nBottomTileAxisSpecifies the size of the tile axis in the bottom blob.
hTopDataSpecifies a handle to the top data in GPU memory.

Definition at line 9924 of file CudaDnn.cs.

◆ transpose()

void MyCaffe.common.CudaDnn< T >.transpose ( int  n,
long  hX,
long  hY,
long  hXCounts,
long  hYCounts,
long  hMapping,
int  nNumAxes,
long  hBuffer 
)

Perform a transpose on X producing Y, similar to the numpy.transpose operation.

Parameters
nSpecifies the number of items in both hX and hY (must be the same).
hXSpecifies a handle to the input data in gpu memory.
hYSpecifies a handle to the output data in gpu memory.
hXCountsSpecifies a handle to the input counts in gpu memory.
hYCountsSpecifies a handle to the output counts in gpu memory.
hMappingSpecifies a handle to the mappings of each axis.
nNumAxesSpecifies the number of axes.
hBufferSpecifies a handle to the buffer that should have 'n' * nNumAxes number of items.

Definition at line 7862 of file CudaDnn.cs.

◆ transposeHW()

void MyCaffe.common.CudaDnn< T >.transposeHW ( int  n,
int  c,
int  h,
int  w,
long  hSrc,
long  hDst 
)

Transposes n*c matrices along the height and width dimensions. All matrices are in row-major format.

Parameters
nSpecifies the number of items (e.g. batches)
cSpecifies the number of channels.
hSpecifies the height.
wSpecifies the width.
hSrcSpecifies a handle to GPU memory of shape (n,c,h,w)
hDstSpecifies a handle to GPU memory of shape (n,c,w,h)

Definition at line 6716 of file CudaDnn.cs.
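The per-matrix transpose can be sketched on the CPU (Python, for illustration only; the actual computation runs in the CUDA kernel over flat GPU memory):

```python
def transpose_hw(n, c, h, w, src):
    """CPU sketch: transpose each of the n*c row-major h-by-w matrices,
    producing n*c row-major w-by-h matrices."""
    dst = [0.0] * (n * c * h * w)
    for m in range(n * c):          # each matrix in the batch
        base = m * h * w
        for i in range(h):
            for j in range(w):
                dst[base + j * h + i] = src[base + i * w + j]
    return dst
```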

◆ unpooling_bwd()

void MyCaffe.common.CudaDnn< T >.unpooling_bwd ( POOLING_METHOD  method,
int  nCount,
long  hTopDiff,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hBottomDiff,
long  hMask 
)

Performs the backward pass for unpooling using Cuda.

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hTopDiffSpecifies a handle to the top diff in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hBottomDiffSpecifies a handle to the bottom diff in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.

Definition at line 8895 of file CudaDnn.cs.

◆ unpooling_fwd()

void MyCaffe.common.CudaDnn< T >.unpooling_fwd ( POOLING_METHOD  method,
int  nCount,
long  hBottomData,
int  num,
int  nChannels,
int  nHeight,
int  nWidth,
int  nPooledHeight,
int  nPooledWidth,
int  nKernelH,
int  nKernelW,
int  nStrideH,
int  nStrideW,
int  nPadH,
int  nPadW,
long  hTopData,
long  hMask 
)

Performs the forward pass for unpooling using Cuda.

Parameters
methodSpecifies the pooling method.
nCountSpecifies the number of items in the bottom data.
hBottomDataSpecifies a handle to the bottom data in GPU memory.
numSpecifies the number of inputs.
nChannelsSpecifies the number of channels per input.
nHeightSpecifies the height of each input.
nWidthSpecifies the width of each input.
nPooledHeightSpecifies the height of the pooled data.
nPooledWidthSpecifies the width of the pooled data.
nKernelHSpecifies the height of the pooling kernel.
nKernelWSpecifies the width of the pooling kernel.
nStrideHSpecifies the stride along the height.
nStrideWSpecifies the stride along the width.
nPadHSpecifies the pad applied to the height.
nPadWSpecifies the pad applied to the width.
hTopDataSpecifies a handle to the top data in GPU memory.
hMaskSpecifies a handle to the mask data in GPU memory.

Definition at line 8867 of file CudaDnn.cs.

◆ width()

void MyCaffe.common.CudaDnn< T >.width ( int  n,
long  hMean,
long  hMin,
long  hMax,
double  dfAlpha,
long  hWidth 
)

Calculates the width values.

Parameters
nSpecifies the number of items.
hMeanSpecifies a handle to the mean values in GPU memory.
hMinSpecifies a handle to the min values in GPU memory.
hMaxSpecifies a handle to the max values in GPU memory.
dfAlphaSpecifies the alpha value.
hWidthSpecifies the GPU memory where the width values are placed.

Definition at line 7925 of file CudaDnn.cs.

Property Documentation

◆ BaseSize

ulong MyCaffe.common.CudaDnn< T >.BaseSize
staticget

Returns the base data type size (e.g. float= 4, double = 8).

Definition at line 2428 of file CudaDnn.cs.

◆ DefaultPath

string MyCaffe.common.CudaDnn< T >.DefaultPath
staticget

Specifies the default path used to load the Low-Level Cuda DNN Dll file.

Definition at line 1931 of file CudaDnn.cs.

◆ KernelHandle

long MyCaffe.common.CudaDnn< T >.KernelHandle
get

Returns the Low-Level kernel handle used for this instance. Each Low-Level kernel maintains its own set of look-up tables for memory, streams, cuDnn constructs, etc.

Definition at line 1811 of file CudaDnn.cs.

◆ OriginalDeviceID

int MyCaffe.common.CudaDnn< T >.OriginalDeviceID
get

Returns the original device ID used to create the instance of CudaDnn.

Definition at line 2004 of file CudaDnn.cs.

◆ Path

string MyCaffe.common.CudaDnn< T >.Path
get

Specifies the file path used to load the Low-Level Cuda DNN Dll file.

Definition at line 1923 of file CudaDnn.cs.

◆ TotalMemoryUsed

ulong MyCaffe.common.CudaDnn< T >.TotalMemoryUsed
get

Returns the total amount of GPU memory used by this instance.

Definition at line 1794 of file CudaDnn.cs.

◆ TotalMemoryUsedAsText

string MyCaffe.common.CudaDnn< T >.TotalMemoryUsedAsText
get

Returns the total amount of memory used.

Definition at line 1802 of file CudaDnn.cs.


The documentation for this class was generated from the following file: