MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.CustomListData Class Reference

The CustomListData class supports external data input via an external assembly DLL that implements the ICustomTokenInput interface.

Inheritance diagram for MyCaffe.layers.gpt.CustomListData:
MyCaffe.layers.gpt.InputData

Public Member Functions

 CustomListData (CancelEvent evtCancel, Log log, string strCustomDllFile, string strVocabInfo, int nBlockSizeSrc, int? nRandomSeed=null, Phase phase=Phase.NONE)
 The constructor.
 
override bool GetDataAvailabilityAt (int nIdx, bool bIncludeSrc, bool bIncludeTrg)
 Returns true if data is available at the given index.
 
override Tuple< float[], float[]> GetData (int nBatchSize, int nBlockSize, InputData trgData, out int[] rgnIdx)
 Retrieve random blocks from the source data, where the target holds the same tokens as the data but offset by +1 element.
 
override Tuple< float[], float[]> GetDataAt (int nBatchSize, int nBlockSize, int[] rgnIdx)
 Fill a batch of data from a specified array of indexes.
 
override List< int > Tokenize (string str, bool bAddBos, bool bAddEos)
 Tokenize an input string using the internal vocabulary.
 
override string Detokenize (float[] rgfTokIdx, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize an array into a string.
 
override string Detokenize (int nTokIdx, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize a single token.
 
- Public Member Functions inherited from MyCaffe.layers.gpt.InputData
 InputData (int? nRandomSeed=null)
 The constructor.
 

Properties

override List< string > RawData [get]
 Returns the raw data.
 
override uint TokenSize [get]
 Returns the token size.
 
override uint VocabularySize [get]
 Returns the vocabulary size.
 
override char BOS [get]
 Return the special begin of sequence character.
 
override char EOS [get]
 Return the special end of sequence character.
 
- Properties inherited from MyCaffe.layers.gpt.InputData
abstract List< string > RawData [get]
 Returns the raw data.
 
abstract uint TokenSize [get]
 Returns the size of a single token (e.g. 1 for character data).
 
abstract uint VocabularySize [get]
 Returns the size of the vocabulary.
 
abstract char BOS [get]
 Return the special begin of sequence character.
 
abstract char EOS [get]
 Return the special end of sequence character.
 

Additional Inherited Members

- Protected Attributes inherited from MyCaffe.layers.gpt.InputData
Random m_random
 Specifies the random object made available to the derived classes.
 

Detailed Description

The CustomListData class supports external data input via an external assembly DLL that implements the ICustomTokenInput interface.

Definition at line 937 of file TokenizedDataPairsLayer.cs.

Constructor & Destructor Documentation

◆ CustomListData()

MyCaffe.layers.gpt.CustomListData.CustomListData ( CancelEvent  evtCancel,
Log  log,
string  strCustomDllFile,
string  strVocabInfo,
int  nBlockSizeSrc,
int?  nRandomSeed = null,
Phase  phase = Phase.NONE 
)

The constructor.

Parameters
evtCancel  Specifies the cancel event.
log  Specifies the output log.
strCustomDllFile  Specifies the path to the custom assembly DLL.
strVocabInfo  Specifies the vocab info and should be set to "ENC" or "DEC".
nBlockSizeSrc  Specifies the block size.
nRandomSeed  Specifies a random seed.
phase  Specifies the running phase.
Exceptions
Exception  An exception is thrown on error.

Note the source and target token sets must have matching DateTime[] arrays.

Definition at line 963 of file TokenizedDataPairsLayer.cs.

Member Function Documentation

◆ Detokenize() [1/2]

override string MyCaffe.layers.gpt.CustomListData.Detokenize ( float[]  rgfTokIdx,
int  nStartIdx,
int  nCount,
bool  bIgnoreBos,
bool  bIgnoreEos 
)
virtual

Detokenize an array into a string.

Parameters
rgfTokIdx  Specifies the array of tokens to detokenize.
nStartIdx  Specifies the starting index where detokenizing begins.
nCount  Specifies the number of tokens to detokenize.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized string is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1200 of file TokenizedDataPairsLayer.cs.
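
The range-based detokenization described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical character-level vocabulary in which each token index maps to one character. The SketchDetokenizer class, its vocabulary table, and the BOS/EOS character values are assumptions made for this example and are not part of MyCaffe.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a character-level vocabulary; not part of MyCaffe.
public static class SketchDetokenizer
{
    public const char BOS = (char)1;   // assumed begin-of-sequence marker
    public const char EOS = (char)2;   // assumed end-of-sequence marker

    // Index 0 = BOS, index 1 = EOS, remaining indexes map to ordinary characters.
    private static readonly char[] Vocab = { BOS, EOS, 'a', 'b', 'c', ' ' };

    // Convert nCount tokens starting at nStartIdx into a string,
    // optionally skipping the BOS and EOS tokens.
    public static string Detokenize(float[] rgfTokIdx, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
    {
        var sb = new StringBuilder();

        for (int i = nStartIdx; i < nStartIdx + nCount && i < rgfTokIdx.Length; i++)
        {
            // Tokens arrive as floats, so cast back to an integer vocabulary index.
            char ch = Vocab[(int)rgfTokIdx[i]];

            if (bIgnoreBos && ch == BOS)
                continue;
            if (bIgnoreEos && ch == EOS)
                continue;

            sb.Append(ch);
        }

        return sb.ToString();
    }
}
```

With this assumed vocabulary, SketchDetokenizer.Detokenize(new float[] { 0, 2, 3, 4, 1 }, 0, 5, true, true) returns "abc", because the leading BOS and trailing EOS tokens are skipped.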

◆ Detokenize() [2/2]

override string MyCaffe.layers.gpt.CustomListData.Detokenize ( int  nTokIdx,
bool  bIgnoreBos,
bool  bIgnoreEos 
)
virtual

Detokenize a single token.

Parameters
nTokIdx  Specifies an index to the token to be detokenized.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized character is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1212 of file TokenizedDataPairsLayer.cs.

◆ GetData()

override Tuple< float[], float[]> MyCaffe.layers.gpt.CustomListData.GetData ( int  nBatchSize,
int  nBlockSize,
InputData  trgData,
out int[]  rgnIdx 
)
virtual

Retrieve random blocks from the source data, where the target holds the same tokens as the data but offset by +1 element.

Parameters
nBatchSize  Specifies the batch size.
nBlockSize  Specifies the block size.
trgData  Specifies the matching target data used to verify that both source and target have data at each chosen index.
rgnIdx  Returns an array of the indexes of the data returned.
Returns
A tuple containing the data and target is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1064 of file TokenizedDataPairsLayer.cs.
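
The offset-by-one relationship between data and target can be sketched as follows. This is a simplified illustration, assuming the source is a flat array of token values held in memory. The SketchBlockSampler class and its rgTokens and random parameters are hypothetical, and the sketch omits the trgData availability check that the real method performs.

```csharp
using System;

// Hypothetical helper illustrating random block sampling; not part of MyCaffe.
public static class SketchBlockSampler
{
    // Draw nBatchSize random blocks of nBlockSize tokens from rgTokens.
    // Both returned arrays are flattened to nBatchSize * nBlockSize items, and
    // each target item is the token that follows the matching data item.
    public static Tuple<float[], float[]> GetData(float[] rgTokens, Random random, int nBatchSize, int nBlockSize, out int[] rgnIdx)
    {
        if (rgTokens.Length < nBlockSize + 1)
            throw new ArgumentException("The source must hold at least nBlockSize + 1 tokens.");

        float[] rgData = new float[nBatchSize * nBlockSize];
        float[] rgTarget = new float[nBatchSize * nBlockSize];
        rgnIdx = new int[nBatchSize];

        for (int i = 0; i < nBatchSize; i++)
        {
            // Leave room for nBlockSize data tokens plus one extra target token.
            int nStart = random.Next(rgTokens.Length - nBlockSize);
            rgnIdx[i] = nStart;

            Array.Copy(rgTokens, nStart, rgData, i * nBlockSize, nBlockSize);
            Array.Copy(rgTokens, nStart + 1, rgTarget, i * nBlockSize, nBlockSize);
        }

        return Tuple.Create(rgData, rgTarget);
    }
}
```

Returning the chosen start indexes through rgnIdx lets a caller record which blocks were drawn and request the same blocks again later.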

◆ GetDataAt()

override Tuple< float[], float[]> MyCaffe.layers.gpt.CustomListData.GetDataAt ( int  nBatchSize,
int  nBlockSize,
int[]  rgnIdx 
)
virtual

Fill a batch of data from a specified array of indexes.

Parameters
nBatchSize  Specifies the number of blocks in the batch.
nBlockSize  Specifies the size of each block.
rgnIdx  Specifies the array of indexes to the data to be retrieved.
Returns
A tuple containing the data and target is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1133 of file TokenizedDataPairsLayer.cs.
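
Filling a batch from caller-supplied indexes can be sketched in the same spirit. This is an illustrative example only, assuming a flat in-memory token array and a target offset of +1 from the data. The SketchBlockReader class and its rgTokens parameter are hypothetical and do not reflect how MyCaffe stores or retrieves its data.

```csharp
using System;

// Hypothetical helper illustrating index-driven batch filling; not part of MyCaffe.
public static class SketchBlockReader
{
    // Copy one block of nBlockSize tokens per entry in rgnIdx into flattened
    // data and target arrays. The target block starts one token after the data block.
    public static Tuple<float[], float[]> GetDataAt(float[] rgTokens, int nBatchSize, int nBlockSize, int[] rgnIdx)
    {
        if (rgnIdx.Length < nBatchSize)
            throw new ArgumentException("One index is required per block in the batch.");

        float[] rgData = new float[nBatchSize * nBlockSize];
        float[] rgTarget = new float[nBatchSize * nBlockSize];

        for (int i = 0; i < nBatchSize; i++)
        {
            int nStart = rgnIdx[i];

            // Each block needs nBlockSize tokens plus one extra token for the target.
            if (nStart < 0 || nStart + nBlockSize + 1 > rgTokens.Length)
                throw new ArgumentOutOfRangeException(nameof(rgnIdx), "Index " + nStart + " does not leave room for a full block.");

            Array.Copy(rgTokens, nStart, rgData, i * nBlockSize, nBlockSize);
            Array.Copy(rgTokens, nStart + 1, rgTarget, i * nBlockSize, nBlockSize);
        }

        return Tuple.Create(rgData, rgTarget);
    }
}
```

For the tokens { 10, 11, 12, 13, 14, 15 } with nBatchSize = 2, nBlockSize = 3 and rgnIdx = { 0, 2 }, the data array is { 10, 11, 12, 12, 13, 14 } and the target array is { 11, 12, 13, 13, 14, 15 }.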

◆ GetDataAvailabilityAt()

override bool MyCaffe.layers.gpt.CustomListData.GetDataAvailabilityAt ( int  nIdx,
bool  bIncludeSrc,
bool  bIncludeTrg 
)
virtual

Returns true if data is available at the given index.

Parameters
nIdx  Specifies the index to check.
bIncludeSrc  Specifies to include the source in the check.
bIncludeTrg  Specifies to include the target in the check.
Returns
If the data is available, true is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1044 of file TokenizedDataPairsLayer.cs.

◆ Tokenize()

override List< int > MyCaffe.layers.gpt.CustomListData.Tokenize ( string  str,
bool  bAddBos,
bool  bAddEos 
)
virtual

Tokenize an input string using the internal vocabulary.

Parameters
str  Specifies the string to tokenize.
bAddBos  Specifies whether to add the begin of sequence token.
bAddEos  Specifies whether to add the end of sequence token.
Returns
A list of tokens corresponding to the input is returned.

Implements MyCaffe.layers.gpt.InputData.

Definition at line 1186 of file TokenizedDataPairsLayer.cs.
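
The effect of the bAddBos and bAddEos flags can be sketched as follows. This is a minimal illustration, assuming a hypothetical character-level vocabulary with reserved indexes for BOS and EOS. The SketchTokenizer class, its vocabulary table, and the index values are assumptions made for this example, not the vocabulary used by MyCaffe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical character-level tokenizer; not part of MyCaffe.
public static class SketchTokenizer
{
    public const int BosIdx = 0;   // assumed index of the begin-of-sequence token
    public const int EosIdx = 1;   // assumed index of the end-of-sequence token

    // Assumed vocabulary mapping each character to a token index.
    private static readonly Dictionary<char, int> Vocab = new Dictionary<char, int>
    {
        { 'a', 2 }, { 'b', 3 }, { 'c', 4 }, { ' ', 5 }
    };

    // Map each character of str to its token index, optionally wrapping
    // the result in BOS and EOS tokens.
    public static List<int> Tokenize(string str, bool bAddBos, bool bAddEos)
    {
        var rgTokens = new List<int>();

        if (bAddBos)
            rgTokens.Add(BosIdx);

        foreach (char ch in str)
        {
            int nIdx;
            if (!Vocab.TryGetValue(ch, out nIdx))
                throw new ArgumentException("Character '" + ch + "' is not in the vocabulary.");

            rgTokens.Add(nIdx);
        }

        if (bAddEos)
            rgTokens.Add(EosIdx);

        return rgTokens;
    }
}
```

With this assumed vocabulary, SketchTokenizer.Tokenize("cab", true, true) returns the list 0, 4, 2, 3, 1.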

Property Documentation

◆ BOS

override char MyCaffe.layers.gpt.CustomListData.BOS
get

Return the special begin of sequence character.

Definition at line 1220 of file TokenizedDataPairsLayer.cs.

◆ EOS

override char MyCaffe.layers.gpt.CustomListData.EOS
get

Return the special end of sequence character.

Definition at line 1228 of file TokenizedDataPairsLayer.cs.

◆ RawData

override List<string> MyCaffe.layers.gpt.CustomListData.RawData
get

Returns the raw data.

Definition at line 1016 of file TokenizedDataPairsLayer.cs.

◆ TokenSize

override uint MyCaffe.layers.gpt.CustomListData.TokenSize
get

Returns the token size.

Definition at line 1024 of file TokenizedDataPairsLayer.cs.

◆ VocabularySize

override uint MyCaffe.layers.gpt.CustomListData.VocabularySize
get

Returns the vocabulary size.

Definition at line 1032 of file TokenizedDataPairsLayer.cs.


The documentation for this class was generated from the following file: