MyCaffe  1.12.2.41
Deep learning software for Windows C# programmers.
MyCaffe.layers.gpt.IVocabulary Interface Reference

The IVocabulary interface specifies the interface that all Vocabularies implement.

Inheritance diagram for MyCaffe.layers.gpt.IVocabulary:
MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, MyCaffe.layers.gpt.VocabularyWord

Public Member Functions

void Add (string str)
 Add a new string to the vocabulary.
 
int Build ()
 Build the vocabulary.
 
int BuildFromString (string strData)
 Build the vocabulary from a string.
 
int[] CreateTarget (int[] rgSrc)
 Create a target that is offset from the source by one and ends with an EOS token.
 
int[] Tokenize (string str, bool bAddBos, bool bAddEos)
 Tokenize a string of data.
 
List<int> Tokenize (string str1, bool bMustExist=true)
 Tokenize a single element (character or word) into its corresponding index tokens.
 
string Detokenize (float[] rgf, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize an array into a string.
 
string Detokenize (int nIdxToken, bool bIgnoreBos, bool bIgnoreEos)
 Detokenize an index token into its corresponding character.
 

Properties

int Count [get]
 Returns the size of the vocabulary.
 
char BOS [get]
 Returns the special BOS character.
 
char EOS [get]
 Returns the special EOS character.
 

Detailed Description

The IVocabulary interface specifies the interface that all Vocabularies implement.

Definition at line 13 of file Interfaces.cs.

Member Function Documentation

◆ Add()

void MyCaffe.layers.gpt.IVocabulary.Add ( string  str)

Add a new string to the vocabulary.

Parameters
str: Specifies the string to add.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.

◆ Build()

int MyCaffe.layers.gpt.IVocabulary.Build ( )

Build the vocabulary.

Returns
The vocabulary size is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.

◆ BuildFromString()

int MyCaffe.layers.gpt.IVocabulary.BuildFromString ( string  strData)

Build the vocabulary from a string.

Parameters
strData: Specifies the data to build the vocabulary from.
Returns
The vocabulary size is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.
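
As an illustration of what building a vocabulary from a string can involve, the following is a minimal sketch of a character-level build step. It is not MyCaffe's implementation: the class CharVocabSketch and its IndexOf helper are hypothetical names, and the sketch assumes one index per distinct character in sorted order, with no indices reserved for the BOS or EOS tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; the MyCaffe vocabulary classes may build their tables differently.
public class CharVocabSketch
{
    private readonly Dictionary<char, int> m_rgCharToIdx = new Dictionary<char, int>();

    // Size of the vocabulary built so far.
    public int Count
    {
        get { return m_rgCharToIdx.Count; }
    }

    // Assigns one index per distinct character (sorted order) and returns the vocabulary size.
    public int BuildFromString(string strData)
    {
        if (strData == null)
            throw new ArgumentNullException(nameof(strData));

        m_rgCharToIdx.Clear();

        foreach (char ch in strData.Distinct().OrderBy(c => c))
        {
            m_rgCharToIdx.Add(ch, m_rgCharToIdx.Count);
        }

        return m_rgCharToIdx.Count;
    }

    // Hypothetical lookup helper (assumption of this sketch, not part of IVocabulary).
    public int IndexOf(char ch)
    {
        return m_rgCharToIdx[ch];
    }
}
```

In this sketch, building from "hello" produces four entries, because the repeated 'l' is counted once.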

◆ CreateTarget()

int[] MyCaffe.layers.gpt.IVocabulary.CreateTarget ( int[]  rgSrc)

Create a target that is offset from the source by one and ends with an EOS token.

Parameters
rgSrc: Specifies the source to create the target from.
Returns
The tokenized target is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.
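
The offset-by-one behavior described above can be sketched as follows. This is an illustrative reading of the description, not the library's code: the helper TargetSketch.CreateTarget is hypothetical, and it takes the EOS token index as an explicit nEosToken parameter (an assumption), whereas the interface method receives only rgSrc.

```csharp
using System;

// Hypothetical sketch of the shift-by-one target described above.
public static class TargetSketch
{
    // Returns an array of the same length as rgSrc in which element i holds
    // rgSrc[i + 1], and the final element holds the EOS token index.
    public static int[] CreateTarget(int[] rgSrc, int nEosToken)
    {
        if (rgSrc == null)
            throw new ArgumentNullException(nameof(rgSrc));

        int[] rgTrg = new int[rgSrc.Length];
        if (rgSrc.Length == 0)
            return rgTrg;

        // Drop the first source token by copying everything after it.
        Array.Copy(rgSrc, 1, rgTrg, 0, rgSrc.Length - 1);

        // Terminate the target with the EOS token.
        rgTrg[rgTrg.Length - 1] = nEosToken;

        return rgTrg;
    }
}
```

With a source of { 5, 6, 7 } and an assumed EOS index of 2, the sketch yields { 6, 7, 2 }, so each target position holds the token that follows the corresponding source position.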

◆ Detokenize() [1/2]

string MyCaffe.layers.gpt.IVocabulary.Detokenize ( float[] rgf, bool bIgnoreBos, bool bIgnoreEos )

Detokenize an array into a string.

Parameters
rgf: Specifies the array of tokens to detokenize.
bIgnoreBos: Specifies whether to ignore the BOS token.
bIgnoreEos: Specifies whether to ignore the EOS token.
Returns
The detokenized string is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.

◆ Detokenize() [2/2]

string MyCaffe.layers.gpt.IVocabulary.Detokenize ( int nIdxToken, bool bIgnoreBos, bool bIgnoreEos )

Detokenize an index token into its corresponding character.

Parameters
nIdxToken: Specifies the token to detokenize.
bIgnoreBos: Specifies whether to ignore the BOS token.
bIgnoreEos: Specifies whether to ignore the EOS token.
Returns
The detokenized string is returned (which may just be a character).

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.

◆ Tokenize() [1/2]

int[] MyCaffe.layers.gpt.IVocabulary.Tokenize ( string str, bool bAddBos, bool bAddEos )

Tokenize a string of data.

Parameters
str: Specifies the string to tokenize.
bAddBos: Specifies whether to add the beginning-of-sequence (BOS) token.
bAddEos: Specifies whether to add the end-of-sequence (EOS) token.
Returns
The array of tokens is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.
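
To show how the bAddBos and bAddEos flags, and the matching bIgnoreBos and bIgnoreEos flags on Detokenize, might interact, here is a minimal character-level sketch. Everything in it is an assumption made for illustration: the class TokenizeSketch is hypothetical, indices 0 and 1 are arbitrarily reserved for BOS and EOS, the special tokens are rendered as "<BOS>" and "<EOS>" when not ignored, and Detokenize takes an int[] rather than the float[] used by the interface.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; not the MyCaffe implementation.
public class TokenizeSketch
{
    // Reserved indices, assumed for this sketch only.
    public const int BosToken = 0;
    public const int EosToken = 1;

    private readonly Dictionary<char, int> m_rgCharToIdx = new Dictionary<char, int>();
    private readonly Dictionary<int, char> m_rgIdxToChar = new Dictionary<int, char>();

    // Builds the lookup tables from the distinct characters of strAlphabet.
    public TokenizeSketch(string strAlphabet)
    {
        foreach (char ch in strAlphabet)
        {
            if (m_rgCharToIdx.ContainsKey(ch))
                continue;

            int nIdx = m_rgCharToIdx.Count + 2; // indices 0 and 1 are reserved
            m_rgCharToIdx.Add(ch, nIdx);
            m_rgIdxToChar.Add(nIdx, ch);
        }
    }

    // Maps each character to its index, optionally wrapping the result in BOS/EOS.
    public int[] Tokenize(string str, bool bAddBos, bool bAddEos)
    {
        List<int> rgTokens = new List<int>();

        if (bAddBos)
            rgTokens.Add(BosToken);

        foreach (char ch in str)
        {
            // Throws KeyNotFoundException for characters outside the vocabulary.
            rgTokens.Add(m_rgCharToIdx[ch]);
        }

        if (bAddEos)
            rgTokens.Add(EosToken);

        return rgTokens.ToArray();
    }

    // Maps indices back to characters, skipping BOS/EOS when asked to ignore them.
    public string Detokenize(int[] rgTokens, bool bIgnoreBos, bool bIgnoreEos)
    {
        StringBuilder sb = new StringBuilder();

        foreach (int nTok in rgTokens)
        {
            if (nTok == BosToken)
            {
                if (!bIgnoreBos)
                    sb.Append("<BOS>");
            }
            else if (nTok == EosToken)
            {
                if (!bIgnoreEos)
                    sb.Append("<EOS>");
            }
            else
            {
                sb.Append(m_rgIdxToChar[nTok]);
            }
        }

        return sb.ToString();
    }
}
```

With the alphabet "abc", tokenizing "cab" with both flags set gives { 0, 4, 2, 3, 1 } in this sketch, and detokenizing that array with both ignore flags set returns "cab".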

◆ Tokenize() [2/2]

List<int> MyCaffe.layers.gpt.IVocabulary.Tokenize ( string str1, bool bMustExist = true )

Tokenize a single element (character or word) into its corresponding index tokens.

Parameters
str1: Specifies a single element (character or word) to tokenize.
bMustExist: Optionally specifies whether to throw an error if the item is not in the vocabulary (default = true).
Returns
A list of tokens corresponding to the input is returned.

Implemented in MyCaffe.layers.gpt.VocabularyCharacter, MyCaffe.layers.gpt.VocabularySentencePiece, and MyCaffe.layers.gpt.VocabularyWord.

Property Documentation

◆ BOS

char MyCaffe.layers.gpt.IVocabulary.BOS
get

Returns the special BOS character.

Definition at line 22 of file Interfaces.cs.

◆ Count

int MyCaffe.layers.gpt.IVocabulary.Count
get

Returns the size of the vocabulary.

Definition at line 18 of file Interfaces.cs.

◆ EOS

char MyCaffe.layers.gpt.IVocabulary.EOS
get

Returns the special EOS character.

Definition at line 26 of file Interfaces.cs.


The documentation for this interface was generated from the following file: