MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.InputData Class Reference (abstract)

The InputData is an abstract class used to get training data and tokenize input data.

Inherited by MyCaffe.layers.gpt.CustomListData, MyCaffe.layers.gpt.TextInputData, and MyCaffe.layers.gpt.TextListData.

Public Member Functions

 InputData (int? nRandomSeed=null)
 The constructor.
 
abstract bool GetDataAvailabilityAt (int nIdx, bool bIncludeSrc, bool bIncludeTrg)
 Returns true if data is available at the given index.

abstract Tuple<float[], float[]> GetData (int nBatchSize, int nBlockSize, InputData trgData, out int[] rgnIdx)
 Gets a set of randomly selected source/target data, where the target may be null.

abstract Tuple<float[], float[]> GetDataAt (int nBatchSize, int nBlockSize, int[] rgnIdx)
 Gets a set of source/target data at the specified indexes.

abstract List<int> Tokenize (string str, bool bAddBos, bool bAddEos)
 Tokenize an input string using the internal vocabulary.

abstract string Detokenize (int nTokIdx, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize a single token.

abstract string Detokenize (float[] rgf, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize an array into a string.
 

Protected Attributes

Random m_random
 Specifies the random object made available to the derived classes.
 

Properties

abstract List<string> RawData [get]
 Returns the raw data.

abstract uint TokenSize [get]
 Returns the size of a single token (e.g. 1 for character data).

abstract uint VocabularySize [get]
 Returns the size of the vocabulary.

abstract char BOS [get]
 Returns the special beginning-of-sequence (BOS) character.

abstract char EOS [get]
 Returns the special end-of-sequence (EOS) character.
 

Detailed Description

The InputData is an abstract class used to get training data and tokenize input data.

Definition at line 112 of file Interfaces.cs.

Constructor & Destructor Documentation

◆ InputData()

MyCaffe.layers.gpt.InputData.InputData (int? nRandomSeed = null)

The constructor.

Parameters
nRandomSeed: Optionally, specifies the seed to use for testing.

Definition at line 123 of file Interfaces.cs.
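
The optional seed is useful when a test needs repeatable random selections. The following is a minimal sketch of that pattern, not the MyCaffe implementation: SeededSource and NextIndex are hypothetical names, and the assumption is that a supplied seed is passed straight to System.Random.

```csharp
using System;

// Hypothetical stand-in for a class that, like InputData, accepts an optional seed.
public class SeededSource
{
    // Mirrors the protected m_random attribute documented for InputData.
    protected readonly Random m_random;

    public SeededSource(int? nRandomSeed = null)
    {
        // Assumption: a supplied seed gives a repeatable sequence;
        // without one, Random picks its own seed.
        m_random = nRandomSeed.HasValue ? new Random(nRandomSeed.Value) : new Random();
    }

    // Hypothetical helper: returns a random index in the range [0, nMax).
    public int NextIndex(int nMax)
    {
        return m_random.Next(nMax);
    }
}
```

Two instances constructed with the same seed return the same sequence of indexes, which is what makes seeded runs comparable across tests.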

Member Function Documentation

◆ Detokenize() [1/2]

abstract string MyCaffe.layers.gpt.InputData.Detokenize (float[] rgf, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
pure virtual

Detokenize an array into a string.

Parameters
rgf: Specifies the array of tokens to detokenize.
nStartIdx: Specifies the starting index where detokenizing begins.
nCount: Specifies the number of tokens to detokenize.
bIgnoreBos: Specifies whether to ignore the BOS token.
bIgnoreEos: Specifies whether to ignore the EOS token.
Returns
The detokenized string is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.

◆ Detokenize() [2/2]

abstract string MyCaffe.layers.gpt.InputData.Detokenize (int nTokIdx, bool bIgnoreBos, bool bIgnoreEos)
pure virtual

Detokenize a single token.

Parameters
nTokIdx: Specifies the index of the token to detokenize.
bIgnoreBos: Specifies whether to ignore the BOS token.
bIgnoreEos: Specifies whether to ignore the EOS token.
Returns
The detokenized item is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.

◆ GetData()

abstract Tuple<float[], float[]> MyCaffe.layers.gpt.InputData.GetData (int nBatchSize, int nBlockSize, InputData trgData, out int[] rgnIdx)
pure virtual

Gets a set of randomly selected source/target data, where the target may be null.

Parameters
nBatchSize: Specifies the number of blocks in the batch.
nBlockSize: Specifies the size of each block.
trgData: Specifies the target data, used to check whether target data is available at a selected index.
rgnIdx: Returns an array of the indexes of the data returned.
Returns
A tuple containing the data and target is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.
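
To make the shape of the returned tuple concrete, here is a minimal sketch of random block sampling over a flat token array. It is an illustration under stated assumptions, not the code of any derived class: BlockSampler and Sample are hypothetical names, and the target is assumed to be the source block shifted one token ahead, as is common in GPT-style training. The actual implementations may build the target differently, for example from trgData.

```csharp
using System;

// Hypothetical helper illustrating the GetData contract on a flat token array.
public static class BlockSampler
{
    public static Tuple<float[], float[]> Sample(float[] rgTokens, int nBatchSize, int nBlockSize, Random random, out int[] rgnIdx)
    {
        if (rgTokens.Length < nBlockSize + 1)
            throw new ArgumentException("Need at least nBlockSize + 1 tokens.");

        // Each batch item contributes nBlockSize values to the source and to the target.
        float[] rgSrc = new float[nBatchSize * nBlockSize];
        float[] rgTrg = new float[nBatchSize * nBlockSize];
        rgnIdx = new int[nBatchSize];

        for (int i = 0; i < nBatchSize; i++)
        {
            // The upper bound leaves room for the block plus the one-token shift.
            int nStart = random.Next(rgTokens.Length - nBlockSize);
            rgnIdx[i] = nStart;

            Array.Copy(rgTokens, nStart, rgSrc, i * nBlockSize, nBlockSize);
            // Assumption: the target is the source shifted by one token.
            Array.Copy(rgTokens, nStart + 1, rgTrg, i * nBlockSize, nBlockSize);
        }

        return Tuple.Create(rgSrc, rgTrg);
    }
}
```

The indexes written to rgnIdx record where each block started, so the same batch can be rebuilt later from those indexes, which is the role GetDataAt plays in this interface.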

◆ GetDataAt()

abstract Tuple<float[], float[]> MyCaffe.layers.gpt.InputData.GetDataAt (int nBatchSize, int nBlockSize, int[] rgnIdx)
pure virtual

Gets a set of source/target data at the specified indexes.

Parameters
nBatchSize: Specifies the number of blocks in the batch.
nBlockSize: Specifies the size of each block.
rgnIdx: Specifies the array of indexes of data to retrieve.
Returns
A tuple containing the data and target is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.

◆ GetDataAvailabilityAt()

abstract bool MyCaffe.layers.gpt.InputData.GetDataAvailabilityAt (int nIdx, bool bIncludeSrc, bool bIncludeTrg)
pure virtual

Returns true if data is available at the given index.

Parameters
nIdx: Specifies the index to check.
bIncludeSrc: Specifies whether to include the source in the check.
bIncludeTrg: Specifies whether to include the target in the check.
Returns
If the data is available, true is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.

◆ Tokenize()

abstract List<int> MyCaffe.layers.gpt.InputData.Tokenize (string str, bool bAddBos, bool bAddEos)
pure virtual

Tokenize an input string using the internal vocabulary.

Parameters
str: Specifies the string to tokenize.
bAddBos: Specifies whether to add the beginning-of-sequence (BOS) token.
bAddEos: Specifies whether to add the end-of-sequence (EOS) token.
Returns
A list of tokens corresponding to the input is returned.

Implemented in MyCaffe.layers.gpt.TextInputData, MyCaffe.layers.gpt.TextListData, and MyCaffe.layers.gpt.CustomListData.
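
As an illustration of how Tokenize and the array form of Detokenize fit together, the sketch below implements a character-level vocabulary with the same method signatures. It is a hypothetical, stand-alone class (CharTokenizer does not derive from InputData), and the BOS/EOS values (char)1 and (char)2, the reserved index positions, and the reading of "ignore" as "skip in the output" are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical character-level tokenizer mirroring the Tokenize/Detokenize signatures.
public class CharTokenizer
{
    private readonly Dictionary<char, int> m_rgCharToIdx = new Dictionary<char, int>();
    private readonly List<char> m_rgIdxToChar = new List<char>();

    // Assumed special characters; the real values are defined by each derived class.
    public char BOS { get { return (char)1; } }
    public char EOS { get { return (char)2; } }

    public uint VocabularySize { get { return (uint)m_rgIdxToChar.Count; } }

    public CharTokenizer(string strCorpus)
    {
        // Reserve indexes 0 and 1 for BOS and EOS, then add the sorted distinct characters.
        Add(BOS);
        Add(EOS);
        foreach (char ch in strCorpus.Distinct().OrderBy(c => c))
            Add(ch);
    }

    private void Add(char ch)
    {
        if (m_rgCharToIdx.ContainsKey(ch))
            return;
        m_rgCharToIdx.Add(ch, m_rgIdxToChar.Count);
        m_rgIdxToChar.Add(ch);
    }

    public List<int> Tokenize(string str, bool bAddBos, bool bAddEos)
    {
        List<int> rgTokens = new List<int>();
        if (bAddBos)
            rgTokens.Add(m_rgCharToIdx[BOS]);
        // Characters missing from the vocabulary throw a KeyNotFoundException.
        foreach (char ch in str)
            rgTokens.Add(m_rgCharToIdx[ch]);
        if (bAddEos)
            rgTokens.Add(m_rgCharToIdx[EOS]);
        return rgTokens;
    }

    public string Detokenize(float[] rgf, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = nStartIdx; i < nStartIdx + nCount; i++)
        {
            char ch = m_rgIdxToChar[(int)rgf[i]];
            // Assumption: "ignore" means the special character is left out of the result.
            if (bIgnoreBos && ch == BOS) continue;
            if (bIgnoreEos && ch == EOS) continue;
            sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

With this sketch, tokenizing a string with both flags set and then detokenizing the result with both ignore flags set returns the original string, provided every character is in the vocabulary.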

Member Data Documentation

◆ m_random

Random MyCaffe.layers.gpt.InputData.m_random
protected

Specifies the random object made available to the derived classes.

Definition at line 117 of file Interfaces.cs.

Property Documentation

◆ BOS

abstract char MyCaffe.layers.gpt.InputData.BOS
get

Returns the special beginning-of-sequence (BOS) character.

Definition at line 197 of file Interfaces.cs.

◆ EOS

abstract char MyCaffe.layers.gpt.InputData.EOS
get

Returns the special end-of-sequence (EOS) character.

Definition at line 201 of file Interfaces.cs.

◆ RawData

abstract List<string> MyCaffe.layers.gpt.InputData.RawData
get

Returns the raw data.

Definition at line 134 of file Interfaces.cs.

◆ TokenSize

abstract uint MyCaffe.layers.gpt.InputData.TokenSize
get

Returns the size of a single token (e.g. 1 for character data).

Definition at line 138 of file Interfaces.cs.

◆ VocabularySize

abstract uint MyCaffe.layers.gpt.InputData.VocabularySize
get

Returns the size of the vocabulary.

Definition at line 142 of file Interfaces.cs.


The documentation for this class was generated from the following file: