MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.param.gpt.MultiheadAttentionParameter Class Reference

Specifies the parameters for the MultiheadAttentionLayer. More...

Inheritance diagram for MyCaffe.param.gpt.MultiheadAttentionParameter:
MyCaffe.param.LayerParameterBase, MyCaffe.basecode.BaseParameter, MyCaffe.basecode.IBinaryPersist

Public Types

enum  WEIGHT_INIT { GPT , ENCODER_DECODER }
 Defines the weight initialization strategy. More...
 
- Public Types inherited from MyCaffe.param.LayerParameterBase
enum  LABEL_TYPE { NONE , SINGLE , MULTIPLE , ONLY_ONE }
 Defines the label type. More...
 

Public Member Functions

 MultiheadAttentionParameter ()
 Constructor for the parameter. More...
 
override object Load (System.IO.BinaryReader br, bool bNewInstance=true)
 Load the parameter from a binary reader. More...
 
override void Copy (LayerParameterBase src)
 Copy one parameter to another. More...
 
override LayerParameterBase Clone ()
 Creates a new copy of this instance of the parameter. More...
 
override RawProto ToProto (string strName)
 Convert the parameter into a RawProto. More...
 
- Public Member Functions inherited from MyCaffe.param.LayerParameterBase
 LayerParameterBase ()
 Constructor for the parameter. More...
 
virtual string PrepareRunModelInputs ()
 This method gives derived classes a chance to specify the model inputs required by the run model. More...
 
virtual void PrepareRunModel (LayerParameter p)
 This method gives derived classes a chance to prepare the layer for a run model. More...
 
void Save (BinaryWriter bw)
 Save this parameter to a binary writer. More...
 
abstract object Load (BinaryReader br, bool bNewInstance=true)
 Load the parameter from a binary reader. More...
 
- Public Member Functions inherited from MyCaffe.basecode.BaseParameter
 BaseParameter ()
 Constructor for the parameter. More...
 
virtual bool Compare (BaseParameter p)
 Compare this parameter to another parameter. More...
 

Static Public Member Functions

static MultiheadAttentionParameter FromProto (RawProto rp)
 Parses the parameter from a RawProto. More...
 
- Static Public Member Functions inherited from MyCaffe.basecode.BaseParameter
static double ParseDouble (string strVal)
 Parse double values using the US culture if the decimal separator = '.', then using the native culture, and lastly trying the US culture again to handle prototypes that contain '.' as the separator yet are parsed in a culture that does not use '.' as the decimal separator. More...
 
static bool TryParse (string strVal, out double df)
 Try to parse double values using the US culture if the decimal separator = '.', then using the native culture, and lastly trying the US culture again to handle prototypes that contain '.' as the separator yet are parsed in a culture that does not use '.' as the decimal separator. More...
 
static float ParseFloat (string strVal)
 Parse float values using the US culture if the decimal separator = '.', then using the native culture, and lastly trying the US culture again to handle prototypes that contain '.' as the separator yet are parsed in a culture that does not use '.' as the decimal separator. More...
 
static bool TryParse (string strVal, out float f)
 Try to parse float values using the US culture if the decimal separator = '.', then using the native culture, and lastly trying the US culture again to handle prototypes that contain '.' as the separator yet are parsed in a culture that does not use '.' as the decimal separator. More...
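
The three-step culture fallback described for ParseDouble, ParseFloat and TryParse can be sketched as follows. This is a minimal illustration built only on System.Globalization, not the MyCaffe implementation: the class CultureFallbackSketch, its TryParseDouble method, the explicit native culture argument and the choice of NumberStyles.Float are all assumptions made for the example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of MyCaffe.
public static class CultureFallbackSketch
{
    private static readonly CultureInfo US = CultureInfo.GetCultureInfo("en-US");

    // 'native' stands in for the culture of the running process.
    public static bool TryParseDouble(string strVal, CultureInfo native, out double df)
    {
        // Step 1: if the native decimal separator is '.', try the US culture first.
        if (native.NumberFormat.NumberDecimalSeparator == "." &&
            double.TryParse(strVal, NumberStyles.Float, US, out df))
            return true;

        // Step 2: try the native culture (for example "0,5" under de-DE).
        if (double.TryParse(strVal, NumberStyles.Float, native, out df))
            return true;

        // Step 3: last resort, try the US culture so that a prototype written
        // with '.' still parses under a culture that uses another separator.
        return double.TryParse(strVal, NumberStyles.Float, US, out df);
    }
}
```

NumberStyles.Float is used in this sketch because it does not allow group separators, so under a culture such as de-DE the '.' in "0.5" is rejected in step 2 rather than being read as a thousands separator, and the value is then picked up by step 3.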
 

Properties

uint layers [get, set]
 The number of layers (transformer blocks) used. More...
 
uint heads [get, set]
 The number of heads used. More...
 
uint embed [get, set]
 Specifies the size of the embedding. More...
 
uint block_size [get, set]
 Specifies the size of the block. More...
 
double attn_dropout [get, set]
 Specifies the dropout probability used on the attention weights. More...
 
double resid_dropout [get, set]
 Specifies the dropout probability used on the residual weights. More...
 
WEIGHT_INIT weight_init [get, set]
 Specifies the weight initialization strategy (default = ENCODER_DECODER). More...
 

Detailed Description

Specifies the parameters for the MultiheadAttentionLayer.

Definition at line 15 of file MultiheadAttentionParameter.cs.
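
A minimal usage sketch, assuming the project references the MyCaffe assemblies. It uses only members listed on this page (the constructor, the properties, Clone, ToProto and FromProto). The wrapper class, the numeric values and the proto name "multihead_attention_param" are illustrative assumptions, as is the location of RawProto in the MyCaffe.basecode namespace.

```csharp
using MyCaffe.basecode;   // RawProto (namespace assumed)
using MyCaffe.param;      // LayerParameterBase
using MyCaffe.param.gpt;  // MultiheadAttentionParameter

// Hypothetical wrapper class for illustration only.
public static class MultiheadAttentionParameterExample
{
    // Build a parameter with illustrative values (not recommended defaults).
    public static MultiheadAttentionParameter Configure()
    {
        var p = new MultiheadAttentionParameter();
        p.layers = 6;            // number of transformer blocks
        p.heads = 6;             // number of attention heads
        p.embed = 192;           // embedding size
        p.block_size = 128;      // block size
        p.attn_dropout = 0.1;    // dropout on the attention weights
        p.resid_dropout = 0.1;   // dropout on the residual weights
        p.weight_init = MultiheadAttentionParameter.WEIGHT_INIT.ENCODER_DECODER;
        return p;
    }

    // Clone returns LayerParameterBase, so a cast is needed.
    public static MultiheadAttentionParameter Duplicate(MultiheadAttentionParameter p)
    {
        return (MultiheadAttentionParameter)p.Clone();
    }

    // Convert to a RawProto and parse it back into a new instance.
    public static MultiheadAttentionParameter RoundTrip(MultiheadAttentionParameter p)
    {
        RawProto proto = p.ToProto("multihead_attention_param"); // name is an assumption
        return MultiheadAttentionParameter.FromProto(proto);
    }
}
```

In practice the parameter is normally consumed by the MultiheadAttentionLayer rather than used on its own; the sketch only shows how the members on this page fit together.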

Member Enumeration Documentation

◆ WEIGHT_INIT

Defines the weight initialization strategy.

Enumerator
GPT 

Specifies to use the GPT-style weight initialization strategy.

ENCODER_DECODER 

Specifies to use XAVIER initialization on both the weights and the biases.

Definition at line 28 of file MultiheadAttentionParameter.cs.

Constructor & Destructor Documentation

◆ MultiheadAttentionParameter()

MyCaffe.param.gpt.MultiheadAttentionParameter.MultiheadAttentionParameter ( )

Constructor for the parameter.

Definition at line 41 of file MultiheadAttentionParameter.cs.

Member Function Documentation

◆ Clone()

override LayerParameterBase MyCaffe.param.gpt.MultiheadAttentionParameter.Clone ( )
virtual

Creates a new copy of this instance of the parameter.

Returns
A new instance of this parameter is returned.

Implements MyCaffe.param.LayerParameterBase.

Definition at line 137 of file MultiheadAttentionParameter.cs.

◆ Copy()

override void MyCaffe.param.gpt.MultiheadAttentionParameter.Copy ( LayerParameterBase  src)
virtual

Copy one parameter to another.

Parameters
src  Specifies the parameter to copy.

Implements MyCaffe.param.LayerParameterBase.

Definition at line 123 of file MultiheadAttentionParameter.cs.

◆ FromProto()

static MultiheadAttentionParameter MyCaffe.param.gpt.MultiheadAttentionParameter.FromProto ( RawProto  rp)
static

Parses the parameter from a RawProto.

Parameters
rp  Specifies the RawProto to parse.
Returns
A new instance of the parameter is returned.

Definition at line 169 of file MultiheadAttentionParameter.cs.

◆ Load()

override object MyCaffe.param.gpt.MultiheadAttentionParameter.Load ( System.IO.BinaryReader  br, bool  bNewInstance = true )

Load the parameter from a binary reader.

Parameters
br  Specifies the binary reader.
bNewInstance  When true, a new instance is created (the default); otherwise the existing instance is loaded from the binary reader.
Returns
Returns an instance of the parameter.

Definition at line 111 of file MultiheadAttentionParameter.cs.

◆ ToProto()

override RawProto MyCaffe.param.gpt.MultiheadAttentionParameter.ToProto ( string  strName)
virtual

Convert the parameter into a RawProto.

Parameters
strName  Specifies the name to associate with the RawProto.
Returns
The new RawProto is returned.

Implements MyCaffe.basecode.BaseParameter.

Definition at line 149 of file MultiheadAttentionParameter.cs.

Property Documentation

◆ attn_dropout

double MyCaffe.param.gpt.MultiheadAttentionParameter.attn_dropout
get, set

Specifies the dropout probability used on the attention weights.

Definition at line 86 of file MultiheadAttentionParameter.cs.

◆ block_size

uint MyCaffe.param.gpt.MultiheadAttentionParameter.block_size
get, set

Specifies the size of the block.

Definition at line 77 of file MultiheadAttentionParameter.cs.

◆ embed

uint MyCaffe.param.gpt.MultiheadAttentionParameter.embed
get, set

Specifies the size of the embedding.

Definition at line 68 of file MultiheadAttentionParameter.cs.

◆ heads

uint MyCaffe.param.gpt.MultiheadAttentionParameter.heads
get, set

The number of heads used.

Definition at line 59 of file MultiheadAttentionParameter.cs.

◆ layers

uint MyCaffe.param.gpt.MultiheadAttentionParameter.layers
get, set

The number of layers (transformer blocks) used.

Definition at line 49 of file MultiheadAttentionParameter.cs.

◆ resid_dropout

double MyCaffe.param.gpt.MultiheadAttentionParameter.resid_dropout
get, set

Specifies the dropout probability used on the residual weights.

Definition at line 95 of file MultiheadAttentionParameter.cs.

◆ weight_init

WEIGHT_INIT MyCaffe.param.gpt.MultiheadAttentionParameter.weight_init
get, set

Specifies the weight initialization strategy (default = ENCODER_DECODER).

Definition at line 104 of file MultiheadAttentionParameter.cs.


The documentation for this class was generated from the following file: