tf.keras.layers.experimental.preprocessing.IntegerLookup

Maps integers from a vocabulary to integer indices.

This layer translates a set of arbitrary integers into an integer output via a table-based lookup, with optional out-of-vocabulary handling.

If desired, the user can call this layer's adapt() method on a dataset. adapt() analyzes the dataset, determines the frequency of each integer value, and creates a vocabulary from those values. The vocabulary can have unlimited size or be capped, depending on the configuration options for this layer; if the input contains more unique values than the maximum vocabulary size, the most frequent values are used to create the vocabulary.
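The frequency-based selection described above can be sketched in plain Python. This is only an illustration, not the layer's implementation: build_vocabulary and its max_terms parameter are hypothetical names, and the cap here counts learned terms only, whereas the layer's own cap may also count reserved entries such as the mask and OOV values.

```python
from collections import Counter


def build_vocabulary(values, max_terms=None):
    """Return the distinct integers in `values`, most frequent first.

    `max_terms` is a hypothetical cap on the number of learned terms;
    None means the vocabulary is not capped.
    """
    counts = Counter(values)
    # most_common(None) returns every term; most_common(k) keeps the top k.
    return [term for term, _ in counts.most_common(max_terms)]


samples = [7, 7, 7, 3, 3, 9, 5, 5, 5, 5]
uncapped = build_vocabulary(samples)
capped = build_vocabulary(samples, max_terms=2)
print(uncapped)  # [5, 7, 3, 9]
print(capped)    # [5, 7]
```

With the cap set to 2, the less frequent values 3 and 9 fall outside the vocabulary and would be treated as out-of-vocabulary at lookup time.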

Examples:

Creating a lookup layer with a known vocabulary

This example creates a lookup layer with a pre-existing vocabulary.

vocab = [12, 36, 1138, 42]
data = tf.constant([[12, 1138, 42], [42, 1000, 36]])
layer = IntegerLookup(vocabulary=vocab)
layer(data)
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[2, 4, 5],
       [5, 1, 3]])>
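The output above is consistent with a table in which index 0 is reserved for the mask value 0, index 1 for the OOV value -1, and the supplied vocabulary follows in order. The plain-Python sketch below reproduces that mapping under those assumptions, with a single OOV slot; build_table and lookup are hypothetical helper names, not part of the Keras API.

```python
def build_table(vocab, mask_value=0, oov_value=-1):
    """Map each value to its index: mask first, OOV second, then vocab."""
    full_vocab = [mask_value, oov_value] + list(vocab)
    return {value: index for index, value in enumerate(full_vocab)}


def lookup(table, rows, oov_index=1):
    """Replace every value with its table index; unknown values get oov_index."""
    return [[table.get(value, oov_index) for value in row] for row in rows]


table = build_table([12, 36, 1138, 42])
result = lookup(table, [[12, 1138, 42], [42, 1000, 36]])
print(result)  # [[2, 4, 5], [5, 1, 3]]
```

The value 1000 is not in the vocabulary, so it maps to the OOV index 1.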

Creating a lookup layer with an adapted vocabulary

This example creates a lookup layer and generates the vocabulary by analyzing the dataset.

data = tf.constant([[12, 1138, 42], [42, 1000, 36]])
layer = IntegerLookup()
layer.adapt(data)
layer.get_vocabulary()
[0, -1, 42, 1138, 1000, 36, 12]

Note how the mask value 0 and the OOV value -1 have been added to the vocabulary. The remaining values are sorted by frequency (42, which has 2 occurrences, is first), and values with equal frequency are then listed in descending numerical order.
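The vocabulary shown above can be reproduced in plain Python by sorting on frequency first and on descending value second. This is a sketch of the ordering only, assuming the mask and OOV values are simply prepended; adapted_vocabulary is a hypothetical name, not a Keras function.

```python
from collections import Counter


def adapted_vocabulary(rows, mask_value=0, oov_value=-1):
    """Order values by descending frequency, then by descending value."""
    counts = Counter(value for row in rows for value in row)
    ordered = sorted(counts, key=lambda value: (-counts[value], -value))
    # Reserved entries come first, followed by the learned terms.
    return [mask_value, oov_value] + ordered


vocab = adapted_vocabulary([[12, 1138, 42], [42, 1000, 36]])
print(vocab)  # [0, -1, 42, 1138, 1000, 36, 12]
```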

data = tf.constant([[12, 1138, 42], [42, 1000, 36]])
layer = IntegerLookup()
layer.adapt(data)
layer(data)
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=
array([[6, 3, 2],
       [2, 4, 5]])>

Lookups with multiple OOV tokens

This example demonstrates how to use a lookup layer with multiple OOV tokens. When a layer is created with more than one OOV token, any OOV values are hashed into the num