tf.keras.layers.CategoryEncoding
A preprocessing layer which encodes integer features.
Inherits From: Layer, Module
tf.keras.layers.CategoryEncoding(
num_tokens=None, output_mode='multi_hot', sparse=False, **kwargs
)
This layer provides options for condensing data into a categorical encoding
when the total number of tokens is known in advance. It accepts integer
values as inputs and outputs a dense or sparse representation of those
inputs. For integer inputs where the total number of tokens is not known,
use tf.keras.layers.IntegerLookup instead.
For an overview and full list of preprocessing layers, see the preprocessing
guide.
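As a rough illustration of the case where the token range is not known in advance, the sketch below lets tf.keras.layers.IntegerLookup learn a vocabulary from data and emit a one-hot encoding directly. The sample IDs and variable names are made up for the example, and it assumes the layer's default of a single out-of-vocabulary slot at index 0.

```python
import numpy as np
import tensorflow as tf

# Hypothetical raw IDs whose full range is not known ahead of time.
train_ids = np.array([1001, 2002, 3003, 1001])

# IntegerLookup builds its vocabulary from the data passed to adapt().
# With the defaults assumed here, index 0 is reserved for unseen values.
lookup = tf.keras.layers.IntegerLookup(output_mode="one_hot")
lookup.adapt(train_ids)

# 9999 was never seen during adapt(), so it maps to the OOV slot.
encoded = np.asarray(lookup(np.array([1001, 3003, 9999])))
print(lookup.vocabulary_size())
print(encoded)
```

CategoryEncoding skips the vocabulary step entirely, which is why it requires inputs that already lie in the range 0 <= value < num_tokens.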
Examples:
One-hot encoding data
import tensorflow as tf
layer = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode="one_hot")
layer([3, 2, 0, 1])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[0., 0., 0., 1.],
[0., 0., 1., 0.],
[1., 0., 0., 0.],
[0., 1., 0., 0.]], dtype=float32)>
Multi-hot encoding data
layer = tf.keras.layers.CategoryEncoding(
num_tokens=4, output_mode="multi_hot")
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[1., 1., 0., 0.],
[1., 0., 0., 0.],
[0., 1., 1., 0.],
[0., 1., 0., 1.]], dtype=float32)>
Using weighted inputs in "count" mode
import numpy as np
layer = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode="count")
count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[0.1, 0.2, 0. , 0. ],
[0.2, 0. , 0. , 0. ],
[0. , 0.2, 0.3, 0. ],
[0. , 0.2, 0. , 0.4]], dtype=float32)>
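To make the arithmetic behind the weighted output explicit, here is a plain NumPy sketch that sums each weight into the column named by its token. The helper name weighted_count is hypothetical and this is not the layer's actual implementation; it only reproduces the numbers shown above.

```python
import numpy as np

def weighted_count(tokens, weights, num_tokens):
    """Sum weights[i, j] into out[i, tokens[i, j]] for every sample i."""
    tokens = np.asarray(tokens)
    weights = np.asarray(weights, dtype=np.float32)
    out = np.zeros((tokens.shape[0], num_tokens), dtype=np.float32)
    # Column vector of row indices, broadcast against the token indices.
    rows = np.arange(tokens.shape[0])[:, None]
    # np.add.at accumulates correctly when a token repeats within a row.
    np.add.at(out, (rows, tokens), weights)
    return out

result = weighted_count(
    [[0, 1], [0, 0], [1, 2], [3, 1]],
    [[.1, .2], [.1, .1], [.2, .3], [.4, .2]],
    num_tokens=4,
)
print(result)
```

The second sample contains token 0 twice, so its two weights of 0.1 add up to the 0.2 seen in the second output row.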
Args
num_tokens: The total number of tokens the layer should support. All
inputs to the layer must be integers in the range 0 <= value <
num_tokens, or an error will be thrown.
output_mode: Specification for the output of the layer.
Values can be "one_hot", "multi_hot" or "count", configuring the layer
as follows:
- "one_hot": Encodes each individual element in the input into an
array of num_tokens size, containing a 1 at the element index. If
the last dimension is size 1, will encode on that dimension. If the
last dimension is not size 1, will append a new dimension for the
encoded output.
- "multi_hot": Encodes each sample in the input into a single array
of num_tokens size, containing a 1 for each vocabulary term
present in the sample. Treats the last dimension as the sample
dimension; if the input shape is (..., sample_length), the output shape
will be (..., num_tokens).
- "count": Like "multi_hot", but the array contains a count of
the number of times the token at that index appeared in the sample.
For all output modes, currently only output up to rank 2 is supported.
Defaults to "multi_hot".
sparse: Boolean. If true, returns a SparseTensor instead of a dense
Tensor. Defaults to False.
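The shape rules and the sparse option described in the arguments above can be checked with a short sketch. The variable names are illustrative only, and the inputs reuse the sample data from the examples on this page.

```python
import numpy as np
import tensorflow as tf

ids_1d = np.array([3, 2, 0, 1])
ids_2d = np.array([[0, 1], [0, 0], [1, 2], [3, 1]])

# "one_hot": a (4,) input gains a new last dimension, while a (4, 1)
# input is encoded on its existing size-1 dimension. Both give (4, 4).
one_hot = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode="one_hot")
from_flat = np.asarray(one_hot(ids_1d))
from_column = np.asarray(one_hot(ids_1d.reshape(-1, 1)))

# "count": repeated tokens within a sample are tallied, not clipped to 1.
count = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode="count")
counts = np.asarray(count(ids_2d))

# sparse=True: same values as the dense multi-hot result, returned
# as a SparseTensor.
sparse_layer = tf.keras.layers.CategoryEncoding(
    num_tokens=4, output_mode="multi_hot", sparse=True)
sparse_out = sparse_layer(ids_2d)
dense_again = tf.sparse.to_dense(sparse_out).numpy()

print(from_flat.shape, from_column.shape)
print(counts)
print(type(sparse_out).__name__)
```

A sparse output can be worthwhile when num_tokens is large and each sample contains only a few tokens, since most entries of the dense result would be zero.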
Call arguments
inputs: A 1D or 2D tensor of integer inputs.
count_weights: A tensor in the same shape as inputs indicating the
weight for each sample value when summing up in "count" mode. Not used
in "multi_hot" or "one_hot" modes.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Some content is licensed under the numpy license.
Last updated 2024-01-23 UTC.