tft.apply_vocabulary

Maps x to a vocabulary specified by the deferred tensor.

This function also writes domain statistics about the vocabulary min and max values. Note that the min and max are inclusive, and depend on the vocab size, num_oov_buckets and default_value.
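As a rough illustration of how those inclusive bounds could depend on the three inputs, here is a plain-Python sketch. The helper name `vocab_domain` is hypothetical, and the rule it encodes (OOV buckets extend the ID range upward; without them, default_value can widen either bound) is an assumption about the behavior, not a copy of the library's code.

```python
def vocab_domain(vocab_size, num_oov_buckets=0, default_value=-1):
    """Return the assumed inclusive (min, max) of the integer IDs produced."""
    if num_oov_buckets > 0:
        # Assumption: OOV tokens hash into buckets placed right after the
        # in-vocabulary IDs, so default_value is never emitted.
        return 0, vocab_size + num_oov_buckets - 1
    # Assumption: without OOV buckets, default_value may lie outside
    # [0, vocab_size - 1] and therefore widens the domain.
    return min(0, default_value), max(vocab_size - 1, default_value)


print(vocab_domain(vocab_size=5))                     # (-1, 4)
print(vocab_domain(vocab_size=5, num_oov_buckets=3))  # (0, 7)
print(vocab_domain(vocab_size=5, default_value=10))   # (0, 10)
```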

x A categorical Tensor, SparseTensor, or RaggedTensor of type tf.string or tf.int[8|16|32|64] to which the vocabulary transformation should be applied. The column names are those intended for the transformed tensors.
deferred_vocab_filename_tensor The deferred vocab filename tensor as returned by tft.vocabulary. The vocabulary must have been written without frequencies.
default_value The value assigned to out-of-vocabulary values. It is ignored when num_oov_buckets is greater than zero.
num_oov_buckets If greater than zero, any lookup of an out-of-vocabulary token returns a bucket ID based on the token's hash. Otherwise the token is assigned default_value.
lookup_fn Optional lookup function. If specified, it should take a tensor and a deferred vocab filename as input and return a lookup op along with the table size. By default, apply_vocabulary constructs a StaticHashTable for the table lookup.
file_format (Optional) A str. The format of the given vocabulary. Accepted formats are: 'tfrecord_gzip', 'text'. The default value is 'text'.
name (Optional) A name for this operation.
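The interaction between default_value and num_oov_buckets described above can be sketched in plain Python. This is only a model of the semantics: the function name `apply_vocab_sketch` is hypothetical, `zlib.crc32` stands in for whatever hash the real lookup table uses, and placing OOV bucket IDs immediately after the vocabulary IDs is an assumption.

```python
import zlib


def apply_vocab_sketch(tokens, vocab, default_value=-1, num_oov_buckets=0):
    """Map each token to an integer ID, mimicking the documented OOV rules."""
    # In-vocabulary tokens get consecutive IDs starting from zero.
    index = {token: i for i, token in enumerate(vocab)}
    ids = []
    for token in tokens:
        if token in index:
            ids.append(index[token])
        elif num_oov_buckets > 0:
            # Stand-in hash (assumption); the bucket offset follows the vocab.
            bucket = zlib.crc32(token.encode("utf-8")) % num_oov_buckets
            ids.append(len(vocab) + bucket)
        else:
            ids.append(default_value)
    return ids


vocab = ["a", "b", "c"]
print(apply_vocab_sketch(["a", "c", "zzz"], vocab))                     # [0, 2, -1]
print(apply_vocab_sketch(["a", "c", "zzz"], vocab, num_oov_buckets=1))  # [0, 2, 3]
```

With more than one bucket, the exact ID of an unseen token depends on the hash, but in this sketch it always falls between `len(vocab)` and `len(vocab) + num_oov_buckets - 1`.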

A Tensor, SparseTensor, or RaggedTensor where each string value is mapped to an integer. Each unique string value that appears in the vocabulary is mapped to a different integer, and the integers are consecutive starting from zero. Any string value not in the vocabulary is assigned default_value, or a hash-based bucket ID if num_oov_buckets is greater than zero.