tf.keras.layers.CategoryEncoding

View source on GitHub

Category encoding layer.

Inherits From: Layer, Module

View aliases

Main aliases

tf.keras.layers.experimental.preprocessing.CategoryEncoding

Compat aliases for migration

See Migration guide for more details.

tf.compat.v1.keras.layers.CategoryEncoding, tf.compat.v1.keras.layers.experimental.preprocessing.CategoryEncoding

tf.keras.layers.CategoryEncoding(
    num_tokens=None, output_mode='multi_hot', sparse=False, **kwargs
)

This layer provides options for condensing data into a categorical encoding when the total number of tokens are known in advance. It accepts integer values as inputs, and it outputs a dense representation of those inputs. For integer inputs where the total number of tokens is not known, use instead tf.keras.layers.IntegerLookup.

Examples:

One-hot encoding data

layer = tf.keras.layers.CategoryEncoding(
          num_tokens=4, output_mode="one_hot")
layer([3, 2, 0, 1])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
  array([[0., 0., 0., 1.],
         [0., 0., 1., 0.],
         [1., 0., 0., 0.],
         [0., 1., 0., 0.]], dtype=float32)>

Multi-hot encoding data

layer = tf.keras.layers.CategoryEncoding(
          num_tokens=4, output_mode="multi_hot")
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
  array([[1., 1., 0., 0.],
         [1., 0., 0., 0.],
         [0., 1., 1., 0.],
         [0., 1., 0., 1.]], dtype=float32)>

Using weighted inputs in "count" mode

layer = tf.keras.layers.CategoryEncoding(
          num_tokens=4, output_mode="count")
count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float64, numpy=
  array([[0.1, 0.2, 0. , 0. ],
         [0.2, 0. , 0. , 0. ],
         [0. , 0.2, 0.3, 0. ],
         [0. , 0.2, 0. , 0.4]])>

Args
`num_tokens`	The total number of tokens the layer should support. All inputs to the layer must integers in the range `0 <= value < num_tokens`, or an error will be thrown.
`output_mode`	Specification for the output of the layer. Defaults to `"multi_hot"`. Values can be `"one_hot"`, `"multi_hot"` or `"count"`, configuring the layer as follows: `"one_hot"`: Encodes each individual element in the input into an array of `num_tokens` size, containing a 1 at the element index. If the last dimension is size 1, will encode on that dimension. If the last dimension is not size 1, will append a new dimension for the encoded output. `"multi_hot"`: Encodes each sample in the input into a single array of `num_tokens` size, containing a 1 for each vocabulary term present in the sample. Treats the last dimension as the sample dimension, if input shape is `(..., sample_length)`, output shape will be `(..., num_tokens)`. `"count"`: Like `"multi_hot"`, but the int array contains a count of the number of times the token at that index appeared in the sample. For all output modes, currently only output up to rank 2 is supported.
`sparse`	Boolean. If true, returns a `SparseTensor` instead of a dense `Tensor`. Defaults to `False`.

Args

num_tokens The total number of tokens the layer should support. All inputs to the layer must integers in the range 0 <= value < num_tokens, or an error will be thrown.

output_mode

Specification for the output of the layer. Defaults to "multi_hot". Values can be "one_hot", "multi_hot" or "count", configuring the layer as follows:

"one_hot": Encodes each individual element in the input into an array of num_tokens size, containing a 1 at the element index. If the last dimension is size 1, will encode on that dimension. If the last dimension is not size 1, will append a new dimension for the encoded output.
"multi_hot": Encodes each sample in the input into a single array of num_tokens size, containing a 1 for each vocabulary term present in the sample. Treats the last dimension as the sample dimension, if input shape is (..., sample_length), output shape will be (..., num_tokens).
"count": Like "multi_hot", but the int array contains a count of the number of times the token at that index appeared in the sample. For all output modes, currently only output up to rank 2 is supported.

sparse Boolean. If true, returns a SparseTensor instead of a dense Tensor. Defaults to False.

Call arguments:

inputs: A 1D or 2D tensor of integer inputs.
count_weights: A tensor in the same shape as inputs indicating the weight for each sample value when summing up in count mode. Not used in "multi_hot" or "one_hot" modes.

tf.keras.layers.CategoryEncoding Stay organized with collections Save and categorize content based on your preferences.

View aliases

Examples:

Args

Call arguments:

tf.keras.layers.CategoryEncoding