MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.VocabularySentencePiece Class Reference

The VocabularySentencePiece class manages the data vocabulary of words.

Inheritance diagram for MyCaffe.layers.gpt.VocabularySentencePiece:
MyCaffe.layers.gpt.IVocabulary

Public Member Functions

 VocabularySentencePiece (Random random, bool bAddBos, bool bAddEos, string strVocabFile)
 The constructor.
 
void Add (string str)
 Adds a new sentence or word to the vocabulary.
 
int Build ()
 Builds the vocabulary from all words added.
 
int BuildFromString (string strData)
 Builds the vocabulary from a string.
 
int[] CreateTarget (int[] rgSrc)
 Creates a target that is offset from the source by one and ends with an EOS.
 
List< int > Tokenize (string strWord, bool bMustExist=true)
 Tokenizes a single word into its corresponding index tokens.
 
int[] Tokenize (string str, bool bAddBos, bool bAddEos)
 Tokenizes a string of data.
 
string Detokenize (int nIdxToken, bool bIgnoreBos, bool bIgnoreEos)
 Detokenizes an index token into its corresponding string.
 
string Detokenize (float[] rgf, bool bIgnoreBos, bool bIgnoreEos)
 Detokenizes an array of tokens into a string.
 

Properties

int Count [get]
 Returns the size of the vocabulary.
 
char BOS [get]
 Returns the special BOS character.
 
char EOS [get]
 Returns the special EOS character.
 
- Properties inherited from MyCaffe.layers.gpt.IVocabulary
int Count [get]
 Returns the size of the vocabulary.
 
char BOS [get]
 Returns the special BOS character.
 
char EOS [get]
 Returns the special EOS character.
 

Detailed Description

The VocabularySentencePiece class manages the data vocabulary of words.

Definition at line 16 of file VocabularySentencePiece.cs.

Constructor & Destructor Documentation

◆ VocabularySentencePiece()

MyCaffe.layers.gpt.VocabularySentencePiece.VocabularySentencePiece ( Random  random,
bool  bAddBos,
bool  bAddEos,
string  strVocabFile 
)

The constructor.

Parameters
random  Specifies the random number generator used.
bAddBos  Specifies to include the special BOS character in the vocabulary.
bAddEos  Specifies to include the special EOS character in the vocabulary.
strVocabFile  Specifies the vocabulary file created using the Python SentencePieceProcess.

Definition at line 32 of file VocabularySentencePiece.cs.

Member Function Documentation

◆ Add()

void MyCaffe.layers.gpt.VocabularySentencePiece.Add ( string  str)

Adds a new sentence or word to the vocabulary.

Parameters
str  Specifies the sentence or word to add.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 107 of file VocabularySentencePiece.cs.

◆ Build()

int MyCaffe.layers.gpt.VocabularySentencePiece.Build ( )

Builds the vocabulary from all words added.

Returns
The vocabulary size is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 139 of file VocabularySentencePiece.cs.

◆ BuildFromString()

int MyCaffe.layers.gpt.VocabularySentencePiece.BuildFromString ( string  strData)

Build the vocabulary from a string.

Parameters
strData  Specifies the data to build the vocabulary from.
Returns
The vocabulary size is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 170 of file VocabularySentencePiece.cs.

◆ CreateTarget()

int[] MyCaffe.layers.gpt.VocabularySentencePiece.CreateTarget ( int[]  rgSrc)

Create a target that is offset from the source by one and ends with an EOS.

Parameters
rgSrc  Specifies the source to create the target from.
Returns
The tokenized target is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 202 of file VocabularySentencePiece.cs.
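
The offset-by-one relationship described above can be illustrated with a minimal, self-contained sketch. It is not the MyCaffe implementation: the class name TargetSketch, the EOS token id of 2, and the choice to return a target of the same length as the source are all assumptions made for illustration.

```csharp
using System;

public static class TargetSketch
{
    // Hypothetical EOS token id; the real id depends on the vocabulary file.
    public const int EosToken = 2;

    // Returns a target where rgTrg[i] = rgSrc[i + 1] and the last entry is EOS.
    // Assumption: the target has the same length as the source.
    public static int[] CreateTarget(int[] rgSrc, int nEos = EosToken)
    {
        if (rgSrc == null)
            throw new ArgumentNullException(nameof(rgSrc));

        int[] rgTrg = new int[rgSrc.Length];
        if (rgSrc.Length == 0)
            return rgTrg;

        // Shift the source left by one position.
        Array.Copy(rgSrc, 1, rgTrg, 0, rgSrc.Length - 1);

        // Terminate the target with the EOS token.
        rgTrg[rgTrg.Length - 1] = nEos;
        return rgTrg;
    }
}
```

Under these assumptions, a source of { 5, 9, 7 } yields the target { 9, 7, 2 }, so each source position is paired with the token that follows it.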

◆ Detokenize() [1/2]

string MyCaffe.layers.gpt.VocabularySentencePiece.Detokenize ( float[]  rgf,
bool  bIgnoreBos,
bool  bIgnoreEos 
)

Detokenize an array into a string.

Parameters
rgf  Specifies the array of tokens to detokenize.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized string is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 325 of file VocabularySentencePiece.cs.
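
The effect of the bIgnoreBos and bIgnoreEos flags can be sketched as follows. This is an illustration only, not the MyCaffe implementation: the class name DetokenizeSketch, the token ids 1 (BOS) and 2 (EOS), the caller-supplied piece table, and the choice to join pieces with single spaces are all assumptions. A real SentencePiece decoder reassembles subword pieces rather than joining whole words.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DetokenizeSketch
{
    // Hypothetical special token ids; real ids depend on the vocabulary file.
    public const int BosToken = 1;
    public const int EosToken = 2;

    // Maps each token id in rgf back to its text piece, optionally skipping BOS and EOS.
    public static string Detokenize(float[] rgf, IReadOnlyDictionary<int, string> pieces,
                                    bool bIgnoreBos, bool bIgnoreEos)
    {
        var sb = new StringBuilder();

        foreach (float f in rgf)
        {
            // Token ids arrive as floats, so truncate each one to an integer index.
            int nIdx = (int)f;

            if (bIgnoreBos && nIdx == BosToken)
                continue;
            if (bIgnoreEos && nIdx == EosToken)
                continue;

            if (!pieces.TryGetValue(nIdx, out string strPiece))
                throw new KeyNotFoundException("The token " + nIdx + " is not in the vocabulary.");

            // Assumption: pieces are whole words separated by a single space.
            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append(strPiece);
        }

        return sb.ToString();
    }
}
```

With a piece table of { 1: "<s>", 2: "</s>", 10: "hello", 11: "world" } and the input { 1, 10, 11, 2 }, setting both flags to true gives "hello world", while setting both to false keeps the special pieces in the output.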

◆ Detokenize() [2/2]

string MyCaffe.layers.gpt.VocabularySentencePiece.Detokenize ( int  nIdxToken,
bool  bIgnoreBos,
bool  bIgnoreEos 
)

Detokenize an index token into its corresponding character.

Parameters
nIdxToken  Specifies the token to detokenize.
bIgnoreBos  Specifies to ignore the BOS token.
bIgnoreEos  Specifies to ignore the EOS token.
Returns
The detokenized string is returned (which may just be a character).

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 286 of file VocabularySentencePiece.cs.

◆ Tokenize() [1/2]

int[] MyCaffe.layers.gpt.VocabularySentencePiece.Tokenize ( string  str,
bool  bAddBos,
bool  bAddEos 
)

Tokenize a string of data.

Parameters
str  Specifies the string to tokenize.
bAddBos  Specifies to add the BOS at the start of the tokenized data.
bAddEos  Specifies to add the EOS to the end of the tokenized data.
Returns
The array of tokens is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 257 of file VocabularySentencePiece.cs.
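
The placement of the BOS and EOS tokens can be sketched with a toy tokenizer. This is an illustration under stated assumptions, not the MyCaffe implementation: the class name TokenizeSketch, the token ids 1 (BOS) and 2 (EOS), and the caller-supplied vocabulary table are hypothetical, and whitespace splitting stands in for the subword segmentation that SentencePiece actually performs.

```csharp
using System;
using System.Collections.Generic;

public static class TokenizeSketch
{
    // Hypothetical special token ids; real ids depend on the vocabulary file.
    public const int BosToken = 1;
    public const int EosToken = 2;

    // Converts a string to token ids, optionally wrapping the result in BOS and EOS.
    public static int[] Tokenize(string str, IReadOnlyDictionary<string, int> vocab,
                                 bool bAddBos, bool bAddEos)
    {
        var rgTokens = new List<int>();

        if (bAddBos)
            rgTokens.Add(BosToken);

        // Assumption: split on spaces instead of running subword segmentation.
        foreach (string strPiece in str.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!vocab.TryGetValue(strPiece, out int nIdx))
                throw new KeyNotFoundException("The piece '" + strPiece + "' is not in the vocabulary.");
            rgTokens.Add(nIdx);
        }

        if (bAddEos)
            rgTokens.Add(EosToken);

        return rgTokens.ToArray();
    }
}
```

With a vocabulary of { "hello": 10, "world": 11 }, tokenizing "hello world" with both flags set gives { 1, 10, 11, 2 }, and with both flags cleared gives { 10, 11 }.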

◆ Tokenize() [2/2]

List< int > MyCaffe.layers.gpt.VocabularySentencePiece.Tokenize ( string  strWord,
bool  bMustExist = true 
)

Tokenize a single word into its corresponding index tokens.

Parameters
strWord  Specifies a single word to tokenize.
bMustExist  Optionally, specifies to throw an error if the item is not in the vocabulary (default = true).
Returns
The list of tokens corresponding to the word is returned.

Implements MyCaffe.layers.gpt.IVocabulary.

Definition at line 221 of file VocabularySentencePiece.cs.

Property Documentation

◆ BOS

char MyCaffe.layers.gpt.VocabularySentencePiece.BOS
get

Returns the special BOS character.

Definition at line 184 of file VocabularySentencePiece.cs.

◆ Count

int MyCaffe.layers.gpt.VocabularySentencePiece.Count
get

Returns the size of the vocabulary.

Definition at line 63 of file VocabularySentencePiece.cs.

◆ EOS

char MyCaffe.layers.gpt.VocabularySentencePiece.EOS
get

Returns the special EOS character.

Definition at line 192 of file VocabularySentencePiece.cs.


The documentation for this class was generated from the following file: