MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.VocabularyWord Class Reference

The VocabularyWord class manages the data vocabulary of words.

Inherits MyCaffe.layers.gpt.IVocabulary.

Public Member Functions

 VocabularyWord (Random random, bool bAddBos, bool bAddEos)
 The constructor.
 
void Add (string str)
 Adds a new word, or the words of a sentence, to the vocabulary.
 
int Build ()
 Builds the vocabulary from all words added.
 
int BuildFromString (string strData)
 Builds the vocabulary from a string.
 
int[] CreateTarget (int[] rgSrc)
 Creates a target that is offset from the source by one and ends with an EOS.
 
List< int > Tokenize (string strWord, bool bMustExist=true)
 Tokenizes a single word into its corresponding index token(s).
 
int[] Tokenize (string str, bool bAddBos, bool bAddEos)
 Tokenizes a string of data.
 
string Detokenize (int nIdxToken, bool bIgnoreBos, bool bIgnoreEos)
 Detokenizes an index token into its corresponding word.
 
string Detokenize (float[] rgf, bool bIgnoreBos, bool bIgnoreEos)
 Detokenizes an array of tokens into a string.
 

Properties

int Count [get]
 Returns the size of the vocabulary.
 
char BOS [get]
 Returns the special BOS character.
 
char EOS [get]
 Returns the special EOS character.
 
- Properties inherited from MyCaffe.layers.gpt.IVocabulary
int Count [get]
 Returns the size of the vocabulary.
 
char BOS [get]
 Returns the special BOS character.
 
char EOS [get]
 Returns the special EOS character.
 

Detailed Description

The VocabularyWord class manages the data vocabulary of words.

Definition at line 13 of file VocabularyWord.cs.

Constructor & Destructor Documentation

◆ VocabularyWord()

MyCaffe.layers.gpt.VocabularyWord.VocabularyWord ( Random  random,
bool  bAddBos,
bool  bAddEos 
)

The constructor.

Parameters
random  Specifies the random number generator used.
bAddBos  Specifies to include the special BOS character in the vocabulary.
bAddEos  Specifies to include the special EOS character in the vocabulary.

Definition at line 27 of file VocabularyWord.cs.

Member Function Documentation

◆ Add()

void MyCaffe.layers.gpt.VocabularyWord.Add ( string  str)

Adds a new word, or the words of a sentence, to the vocabulary.

Parameters
str  Specifies the sentence or word to add.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 87 of file VocabularyWord.cs.

◆ Build()

int MyCaffe.layers.gpt.VocabularyWord.Build ( )

Builds the vocabulary from all words added.

Returns
The vocabulary size is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 135 of file VocabularyWord.cs.
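
The Add/Build sequence described above can be illustrated with a minimal, self-contained sketch. This is not the MyCaffe implementation: the class name ToyWordVocabulary, the IndexOf helper, the whitespace word splitting, and the ordinal sort order used to assign indexes are all assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy class for illustration only; not the MyCaffe VocabularyWord source.
public class ToyWordVocabulary
{
    // Unique words collected so far, kept in ordinal sort order (an assumption).
    private readonly SortedSet<string> m_words = new SortedSet<string>(StringComparer.Ordinal);
    // Word-to-index map filled in by Build().
    private readonly Dictionary<string, int> m_wordToIndex = new Dictionary<string, int>();

    // Collect the words of a sentence, assuming words are separated by whitespace.
    public void Add(string str)
    {
        foreach (string word in str.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            m_words.Add(word);
    }

    // Assign one index per unique word and return the vocabulary size.
    public int Build()
    {
        m_wordToIndex.Clear();
        foreach (string word in m_words)
            m_wordToIndex[word] = m_wordToIndex.Count;
        return m_wordToIndex.Count;
    }

    // Hypothetical helper: look up the index assigned to a word.
    public int IndexOf(string word)
    {
        return m_wordToIndex[word];
    }
}
```

In this sketch, calling Add once per sentence and then Build once returns the number of unique words seen, which mirrors the documented behavior of returning the vocabulary size.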

◆ BuildFromString()

int MyCaffe.layers.gpt.VocabularyWord.BuildFromString ( string  strData)

Build the vocabulary from a string.

Parameters
strData  Specifies the data to build the vocabulary from.
Returns
The vocabulary size is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 157 of file VocabularyWord.cs.

◆ CreateTarget()

int[] MyCaffe.layers.gpt.VocabularyWord.CreateTarget ( int[]  rgSrc)

Creates a target that is offset from the source by one and ends with an EOS.

Parameters
rgSrc  Specifies the source to create the target from.
Returns
The tokenized target is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 189 of file VocabularyWord.cs.
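
The offset-by-one target described above can be sketched as follows. This is an illustrative sketch, not the code in VocabularyWord.cs: the class name TargetSketch and the nEosToken parameter are hypothetical, and the sketch assumes the target has the same length as the source.

```csharp
using System;

// Hypothetical helper for illustration only; not the MyCaffe implementation.
public static class TargetSketch
{
    // Shift the source left by one token and place the EOS token in the last slot.
    // nEosToken is an assumed parameter standing in for the vocabulary's EOS index.
    public static int[] CreateTarget(int[] rgSrc, int nEosToken)
    {
        int[] rgTrg = new int[rgSrc.Length];
        if (rgSrc.Length == 0)
            return rgTrg;

        // Copy rgSrc[1..] into rgTrg[0..], dropping the first source token.
        Array.Copy(rgSrc, 1, rgTrg, 0, rgSrc.Length - 1);
        // The final target position is the EOS token.
        rgTrg[rgTrg.Length - 1] = nEosToken;
        return rgTrg;
    }
}
```

With a source of { 5, 6, 7, 8 } and an assumed EOS index of 2, the sketch yields { 6, 7, 8, 2 }, so each target position holds the token that follows the corresponding source position.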

◆ Detokenize() [1/2]

string MyCaffe.layers.gpt.VocabularyWord.Detokenize ( float[]  rgf,
bool  bIgnoreBos,
bool  bIgnoreEos 
)

Detokenize an array into a string.

Parameters
rgf  Specifies the array of tokens to detokenize.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized string is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 320 of file VocabularyWord.cs.
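
A minimal sketch of detokenizing a float array under the bIgnoreBos and bIgnoreEos flags is shown below. It is not the MyCaffe implementation: the class name DetokenizeSketch, the word list, the nBosToken and nEosToken parameters, the cast from float to int, and the single-space separator are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not the MyCaffe implementation.
public static class DetokenizeSketch
{
    // Map each token index back to its word, optionally skipping BOS and EOS.
    // rgWords, nBosToken and nEosToken are assumed inputs standing in for the
    // vocabulary's internal state.
    public static string Detokenize(float[] rgf, IList<string> rgWords,
        bool bIgnoreBos, bool bIgnoreEos, int nBosToken, int nEosToken)
    {
        List<string> rgOut = new List<string>();

        foreach (float f in rgf)
        {
            int nIdx = (int)f; // assumes each value holds a whole-number token index
            if (bIgnoreBos && nIdx == nBosToken)
                continue;
            if (bIgnoreEos && nIdx == nEosToken)
                continue;
            rgOut.Add(rgWords[nIdx]);
        }

        // Joining words with a single space is an assumption of this sketch.
        return string.Join(" ", rgOut);
    }
}
```

When both flags are set, the special tokens are dropped from the output; when they are cleared, the sketch emits whatever placeholder strings the word list holds at the BOS and EOS indexes.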

◆ Detokenize() [2/2]

string MyCaffe.layers.gpt.VocabularyWord.Detokenize ( int  nIdxToken,
bool  bIgnoreBos,
bool  bIgnoreEos 
)

Detokenizes an index token into its corresponding word.

Parameters
nIdxToken  Specifies the token to detokenize.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized string is returned (which may just be a character).

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 281 of file VocabularyWord.cs.

◆ Tokenize() [1/2]

int[] MyCaffe.layers.gpt.VocabularyWord.Tokenize ( string  str,
bool  bAddBos,
bool  bAddEos 
)

Tokenize a string of data.

Parameters
str  Specifies the string to tokenize.
bAddBos  Specifies to add the BOS at the start of the tokenized data.
bAddEos  Specifies to add the EOS to the end of the tokenized data.
Returns
The array of tokens is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 255 of file VocabularyWord.cs.
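
The effect of the bAddBos and bAddEos flags on the returned token array can be sketched as follows. This is an illustration under stated assumptions, not the MyCaffe implementation: the class name TokenizeSketch, the vocab dictionary, the nBosToken and nEosToken parameters, and the whitespace word splitting are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not the MyCaffe implementation.
public static class TokenizeSketch
{
    // Convert a string into token indexes, optionally wrapped in BOS and EOS.
    // vocab, nBosToken and nEosToken are assumed inputs standing in for the
    // vocabulary's internal state.
    public static int[] Tokenize(string str, IDictionary<string, int> vocab,
        bool bAddBos, bool bAddEos, int nBosToken, int nEosToken)
    {
        List<int> rgTokens = new List<int>();

        if (bAddBos)
            rgTokens.Add(nBosToken);

        // Assumes words are separated by whitespace.
        foreach (string word in str.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            int nIdx;
            if (!vocab.TryGetValue(word, out nIdx))
                throw new KeyNotFoundException("The word '" + word + "' is not in the vocabulary.");
            rgTokens.Add(nIdx);
        }

        if (bAddEos)
            rgTokens.Add(nEosToken);

        return rgTokens.ToArray();
    }
}
```

With an assumed vocabulary of { "the": 2, "cat": 3 }, BOS index 0 and EOS index 1, tokenizing "the cat" with both flags set gives { 0, 2, 3, 1 }; with both flags cleared it gives { 2, 3 }.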

◆ Tokenize() [2/2]

List< int > MyCaffe.layers.gpt.VocabularyWord.Tokenize ( string  strWord,
bool  bMustExist = true 
)

Tokenizes a single word into its corresponding index token(s).

Parameters
strWord  Specifies a single word to tokenize.
bMustExist  Optionally, specifies to throw an error if the item is not in the vocabulary (default = true).
Returns
A list of tokens corresponding to the input is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 205 of file VocabularyWord.cs.

Property Documentation

◆ BOS

char MyCaffe.layers.gpt.VocabularyWord.BOS
get

Returns the special BOS character.

Definition at line 171 of file VocabularyWord.cs.

◆ Count

int MyCaffe.layers.gpt.VocabularyWord.Count
get

Returns the size of the vocabulary.

Definition at line 43 of file VocabularyWord.cs.

◆ EOS

char MyCaffe.layers.gpt.VocabularyWord.EOS
get

Returns the special EOS character.

Definition at line 179 of file VocabularyWord.cs.


The documentation for this class was generated from the following file: