MyCaffe
1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.TextListData Class Reference
The TextListData manages parallel lists of data, where the first list contains the encoder input data and the second contains the decoder input/target data.
Public Types
enum | VOCABUARY_TYPE { CHARACTER, WORD }
Defines the vocabulary type to use.
Public Member Functions
TextListData(Log log, string strSrcFile, string strVocabFile, bool bIncludeTarget, TokenizedDataParameter.VOCABULARY_TYPE vocabType, int? nRandomSeed = null, Phase phase = Phase.NONE)
The constructor.
override bool | GetDataAvailabilityAt(int nIdx, bool bIncludeSrc, bool bIncludeTrg)
Returns true if data is available at the given index.
override Tuple<float[], float[]> | GetData(int nBatchSize, int nBlockSize, InputData trgData, out int[] rgnIdx)
Retrieve random blocks from the source data, where the target holds the same data as the input shifted forward by one element (the target is offset +1 from the data).
override Tuple<float[], float[]> | GetDataAt(int nBatchSize, int nBlockSize, int[] rgnIdx)
Fill a batch of data from a specified array of indexes.
override List<int> | Tokenize(string str, bool bAddBos, bool bAddEos)
Tokenize an input string using the internal vocabulary.
override string | Detokenize(float[] rgfTokIdx, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos)
Detokenize an array into a string.
override string | Detokenize(int nTokIdx, bool bIgnoreBos, bool bIgnoreEos)
Detokenize a single token.
Public Member Functions inherited from MyCaffe.layers.gpt.InputData
InputData(int? nRandomSeed = null)
The constructor.
Properties
override List<string> | RawData [get]
Return the raw data.
override uint | TokenSize [get]
The text data token size is a single character.
override uint | VocabularySize [get]
Returns the number of unique characters in the data.
override char | BOS [get]
Return the special begin of sequence character.
override char | EOS [get]
Return the special end of sequence character.
Properties inherited from MyCaffe.layers.gpt.InputData
abstract List<string> | RawData [get]
Returns the raw data.
abstract uint | TokenSize [get]
Returns the size of a single token (e.g., 1 for character data).
abstract uint | VocabularySize [get]
Returns the size of the vocabulary.
abstract char | BOS [get]
Return the special begin of sequence character.
abstract char | EOS [get]
Return the special end of sequence character.
Additional Inherited Members
Protected Attributes inherited from MyCaffe.layers.gpt.InputData
Random | m_random
Specifies the random object made available to the derived classes.
The TextListData manages parallel lists of data, where the first list contains the encoder input data and the second contains the decoder input/target data.
Definition at line 608 of file TokenizedDataPairsLayer.cs.
enum MyCaffe.layers.gpt.TextListData.VOCABUARY_TYPE
Defines the vocabulary type to use.
Enumerator | Description
---|---
CHARACTER | Specifies a character vocabulary.
WORD | Specifies a space-separated word vocabulary.
Definition at line 621 of file TokenizedDataPairsLayer.cs.
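The two enumerators differ only in how a line of text is split into vocabulary units. The following is a minimal sketch, not MyCaffe's implementation: `SketchVocabType` and `VocabSplitSketch` are hypothetical names, and splitting WORD text on single spaces is an assumption taken from the "space-separated" description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the MyCaffe implementation.
// Mirrors the two documented enumerators under a different name.
public enum SketchVocabType { CHARACTER, WORD }

public static class VocabSplitSketch
{
    // Splits one line into vocabulary units.
    // CHARACTER: every character, including spaces, is its own unit.
    // WORD: units are the space-separated words; runs of spaces are skipped.
    public static List<string> Split(string line, SketchVocabType type)
    {
        if (type == SketchVocabType.CHARACTER)
            return line.Select(c => c.ToString()).ToList();

        return line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
    }
}
```

For example, "hi there" yields 8 units under CHARACTER but only 2 ("hi", "there") under WORD, so the WORD vocabulary trades a larger vocabulary for shorter token sequences.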
MyCaffe.layers.gpt.TextListData.TextListData(Log log, string strSrcFile, string strVocabFile, bool bIncludeTarget, TokenizedDataParameter.VOCABULARY_TYPE vocabType, int? nRandomSeed = null, Phase phase = Phase.NONE)
The constructor.
log | Specifies the output log. |
strSrcFile | Specifies the text file name for the data source. |
strVocabFile | Specifies the vocabulary file (used by the SENTENCEPIECE vocabulary type). |
bIncludeTarget | Specifies to create the target tokens. |
vocabType | Specifies the vocabulary type to use. |
nRandomSeed | Optionally, specifies a random seed for testing. |
phase | Specifies the currently running phase. |
Definition at line 643 of file TokenizedDataPairsLayer.cs.
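override string MyCaffe.layers.gpt.TextListData.Detokenize(float[] rgfTokIdx, int nStartIdx, int nCount, bool bIgnoreBos, bool bIgnoreEos) [virtual]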
Detokenize an array into a string.
rgfTokIdx | Specifies the array of tokens to detokenize. |
nStartIdx | Specifies the starting index where detokenizing begins. |
nCount | Specifies the number of tokens to detokenize. |
bIgnoreBos | Specifies to ignore the BOS token. |
bIgnoreEos | Specifies to ignore the EOS token. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 890 of file TokenizedDataPairsLayer.cs.
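A minimal sketch of range-based detokenizing, assuming a character vocabulary indexed by token id and hypothetical BOS/EOS indexes of 1 and 2. `DetokenizeSketch` and its extra `vocab` parameter are illustrative only and do not reflect how MyCaffe stores its vocabulary.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the MyCaffe implementation.
public static class DetokenizeSketch
{
    // Assumed special-token indexes; MyCaffe's actual values may differ.
    public const int BosIdx = 1;
    public const int EosIdx = 2;

    // Converts up to nCount token indexes, starting at nStartIdx, back into text.
    // Indexes arrive as floats (as in the documented signature) and are cast to int.
    public static string Detokenize(float[] rgfTokIdx, int nStartIdx, int nCount,
                                    bool bIgnoreBos, bool bIgnoreEos,
                                    IReadOnlyList<char> vocab)
    {
        var sb = new StringBuilder();
        int nEnd = Math.Min(rgfTokIdx.Length, nStartIdx + nCount);

        for (int i = nStartIdx; i < nEnd; i++)
        {
            int nIdx = (int)rgfTokIdx[i];

            // Skip special tokens only when the caller asks for it.
            if (nIdx == BosIdx && bIgnoreBos)
                continue;
            if (nIdx == EosIdx && bIgnoreEos)
                continue;

            sb.Append(vocab[nIdx]);
        }

        return sb.ToString();
    }
}
```

With a vocabulary `{ ' ', '[', ']', 'h', 'i' }` and tokens `{ 1, 3, 4, 2 }`, ignoring both special tokens gives "hi", while keeping them gives "[hi]".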
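override string MyCaffe.layers.gpt.TextListData.Detokenize(int nTokIdx, bool bIgnoreBos, bool bIgnoreEos) [virtual]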
Detokenize a single token.
nTokIdx | Specifies the index of the token to detokenize. |
bIgnoreBos | Specifies to ignore the BOS token. |
bIgnoreEos | Specifies to ignore the EOS token. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 912 of file TokenizedDataPairsLayer.cs.
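override Tuple<float[], float[]> MyCaffe.layers.gpt.TextListData.GetData(int nBatchSize, int nBlockSize, InputData trgData, out int[] rgnIdx) [virtual]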
Retrieve random blocks from the source data, where the target holds the same data as the input shifted forward by one element (the target is offset +1 from the data).
nBatchSize | Specifies the batch size. |
nBlockSize | Specifies the block size. |
trgData | Specifies the matching target data used to verify that both source and target have data at each chosen index. |
rgnIdx | Returns an array of the indexes of the data returned. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 754 of file TokenizedDataPairsLayer.cs.
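The offset-by-one relationship between data and target can be sketched as follows. This is a hypothetical helper (`OffsetBlockSketch`), not MyCaffe code: it draws windows from a single, already tokenized float array and deliberately omits the `trgData` cross-check that the documented method performs.

```csharp
using System;

// Hypothetical sketch, not the MyCaffe implementation.
public static class OffsetBlockSketch
{
    // Draws nBatchSize random windows of nBlockSize tokens from rgTokens.
    // Each target window holds the same tokens shifted forward by one position,
    // so target[k] == rgTokens[start + k + 1]. Window starts are returned in rgnIdx.
    public static Tuple<float[], float[]> GetData(float[] rgTokens, int nBatchSize, int nBlockSize,
                                                  Random random, out int[] rgnIdx)
    {
        if (rgTokens.Length < nBlockSize + 1)
            throw new ArgumentException("Need at least nBlockSize + 1 tokens.");

        float[] rgData = new float[nBatchSize * nBlockSize];
        float[] rgTarget = new float[nBatchSize * nBlockSize];
        rgnIdx = new int[nBatchSize];

        for (int b = 0; b < nBatchSize; b++)
        {
            // Start is drawn from [0, Length - nBlockSize), leaving room for the +1 shift.
            int nStart = random.Next(rgTokens.Length - nBlockSize);
            rgnIdx[b] = nStart;

            Array.Copy(rgTokens, nStart, rgData, b * nBlockSize, nBlockSize);
            Array.Copy(rgTokens, nStart + 1, rgTarget, b * nBlockSize, nBlockSize);
        }

        return Tuple.Create(rgData, rgTarget);
    }
}
```

Because the start position never exceeds `Length - nBlockSize - 1`, the shifted target window always ends at or before the last token.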
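override Tuple<float[], float[]> MyCaffe.layers.gpt.TextListData.GetDataAt(int nBatchSize, int nBlockSize, int[] rgnIdx) [virtual]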
Fill a batch of data from a specified array of indexes.
nBatchSize | Specifies the number of blocks in the batch. |
nBlockSize | Specifies the size of each block. |
rgnIdx | Specifies the array of indexes to the data to be retrieved. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 823 of file TokenizedDataPairsLayer.cs.
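A sketch of filling a batch from caller-supplied indexes, assuming each index selects one tokenized line and that lines are truncated or right-padded with 0 to `nBlockSize`. `BatchFromIndexSketch` is hypothetical, and for brevity it returns only the data array rather than the documented `Tuple<float[], float[]>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the MyCaffe implementation.
public static class BatchFromIndexSketch
{
    // Builds a flat nBatchSize x nBlockSize batch where row b holds the tokens of
    // rgSequences[rgnIdx[b]]. Longer sequences are truncated; shorter ones are
    // right-padded with nPad (an assumed padding value).
    public static float[] GetDataAt(IReadOnlyList<int[]> rgSequences, int nBatchSize, int nBlockSize,
                                    int[] rgnIdx, int nPad = 0)
    {
        if (rgnIdx.Length < nBatchSize)
            throw new ArgumentException("rgnIdx must contain at least nBatchSize indexes.");

        float[] rgBatch = new float[nBatchSize * nBlockSize];

        for (int b = 0; b < nBatchSize; b++)
        {
            int[] rgSeq = rgSequences[rgnIdx[b]];

            for (int k = 0; k < nBlockSize; k++)
                rgBatch[b * nBlockSize + k] = (k < rgSeq.Length) ? rgSeq[k] : nPad;
        }

        return rgBatch;
    }
}
```

With rows `{5,6,7}`, `{8}` and `{1,2,3,4,5}`, indexes `{2, 1}` and a block size of 4, the flat batch is `1,2,3,4,8,0,0,0`. Passing explicit indexes (for example, those returned by GetData) lets a caller rebuild the matching batch from a parallel list.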
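override bool MyCaffe.layers.gpt.TextListData.GetDataAvailabilityAt(int nIdx, bool bIncludeSrc, bool bIncludeTrg) [virtual]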
Returns true if data is available at the given index.
nIdx | Specifies the index to check. |
bIncludeSrc | Specifies to include the source in the check. |
bIncludeTrg | Specifies to include the target in the check. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 734 of file TokenizedDataPairsLayer.cs.
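Because the source and target lists are parallel, availability at an index can be checked list by list. A hypothetical sketch (`AvailabilitySketch`), assuming that "available" means the index is in range and the entry is non-empty; the actual criterion in TokenizedDataPairsLayer.cs may differ.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the MyCaffe implementation.
public static class AvailabilitySketch
{
    // Returns true when nIdx holds a non-empty entry in each requested list.
    // A list whose flag (bIncludeSrc / bIncludeTrg) is false is not checked.
    public static bool GetDataAvailabilityAt(IReadOnlyList<string> rgSrc, IReadOnlyList<string> rgTrg,
                                             int nIdx, bool bIncludeSrc, bool bIncludeTrg)
    {
        if (bIncludeSrc && !HasEntry(rgSrc, nIdx))
            return false;

        if (bIncludeTrg && !HasEntry(rgTrg, nIdx))
            return false;

        return true;
    }

    // Assumed availability rule: index in range and entry not null or empty.
    private static bool HasEntry(IReadOnlyList<string> rg, int nIdx)
    {
        return nIdx >= 0 && nIdx < rg.Count && !string.IsNullOrEmpty(rg[nIdx]);
    }
}
```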
override List<int> MyCaffe.layers.gpt.TextListData.Tokenize(string str, bool bAddBos, bool bAddEos) [virtual]
Tokenize an input string using the internal vocabulary.
str | Specifies the string to tokenize. |
bAddBos | Specifies to add the begin of sequence (BOS) token. |
bAddEos | Specifies to add the end of sequence (EOS) token. |
Implements MyCaffe.layers.gpt.InputData.
Definition at line 876 of file TokenizedDataPairsLayer.cs.
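A minimal character-level tokenize sketch, assuming a character-to-index dictionary and caller-supplied BOS/EOS indexes. `TokenizeSketch` is hypothetical, and throwing on an unknown character is an assumption, not documented MyCaffe behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the MyCaffe implementation.
public static class TokenizeSketch
{
    // Maps each character of str to its vocabulary index and optionally wraps
    // the result with the BOS/EOS indexes. Unknown characters throw (assumption).
    public static List<int> Tokenize(string str, bool bAddBos, bool bAddEos,
                                     IReadOnlyDictionary<char, int> vocab, int nBosIdx, int nEosIdx)
    {
        var rgTokens = new List<int>(str.Length + 2);

        if (bAddBos)
            rgTokens.Add(nBosIdx);

        foreach (char ch in str)
        {
            if (!vocab.TryGetValue(ch, out int nIdx))
                throw new ArgumentException($"Character '{ch}' is not in the vocabulary.");

            rgTokens.Add(nIdx);
        }

        if (bAddEos)
            rgTokens.Add(nEosIdx);

        return rgTokens;
    }
}
```

With `h = 3`, `i = 4`, BOS = 1 and EOS = 2, tokenizing "hi" with both flags set gives `1,3,4,2`; this is the inverse of the Detokenize call with both ignore flags cleared.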
override char MyCaffe.layers.gpt.TextListData.BOS [get]
Return the special begin of sequence character.
Definition at line 920 of file TokenizedDataPairsLayer.cs.
override char MyCaffe.layers.gpt.TextListData.EOS [get]
Return the special end of sequence character.
Definition at line 928 of file TokenizedDataPairsLayer.cs.
override List<string> MyCaffe.layers.gpt.TextListData.RawData [get]
Return the raw data.
Definition at line 706 of file TokenizedDataPairsLayer.cs.
override uint MyCaffe.layers.gpt.TextListData.TokenSize [get]
The text data token size is a single character.
Definition at line 714 of file TokenizedDataPairsLayer.cs.
override uint MyCaffe.layers.gpt.TextListData.VocabularySize [get]
Returns the number of unique characters in the data.
Definition at line 722 of file TokenizedDataPairsLayer.cs.
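Counting unique characters, as the VocabularySize description states, can be sketched with a `HashSet<char>`. `VocabSizeSketch` is hypothetical; whether MyCaffe also reserves slots for the BOS/EOS characters is not stated here, so the sketch counts data characters only.

```csharp
using System.Collections.Generic;

// Hypothetical sketch, not the MyCaffe implementation.
public static class VocabSizeSketch
{
    // Returns the number of distinct characters across all lines of raw data.
    public static uint VocabularySize(IEnumerable<string> rgLines)
    {
        var rgChars = new HashSet<char>();

        // A string is an IEnumerable<char>, so each line merges directly into the set.
        foreach (string strLine in rgLines)
            rgChars.UnionWith(strLine);

        return (uint)rgChars.Count;
    }
}
```

For the lines "abc", "cab" and "d" the sketch returns 4, since repeated characters are counted once.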