One-hot encodes a text into a list of word indexes of size n
.
View aliases
Compat aliases for migration
See Migration guide for more details.
tf.keras.preprocessing.text.one_hot(
input_text, n,
filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
lower=True, split=' '
)
This function receives as input a string of text and returns a list of encoded integers each corresponding to a word (or token) in the given input string.
Args | |
---|---|
input_text
|
Input text (string). |
n
|
int. Size of vocabulary. |
filters
|
list (or concatenation) of characters to filter out, such as
punctuation. Default:
'!"#$%&()*+,-./:;<=>?@[\]^_`{|}~\t\n ,
includes basic punctuation, tabs, and newlines.
|
lower
|
boolean. Whether to set the text to lowercase. |
split
|
str. Separator for word splitting. |
Returns | |
---|---|
List of integers in [1, n] . Each integer encodes a word
(unicity non-guaranteed).
|