tf.keras.layers.AdditiveAttention
Additive attention layer, a.k.a. Bahdanau-style attention.
Inherits From: `Layer`, `Module`

View aliases

Compat aliases for migration: see the Migration guide for more details.

`tf.compat.v1.keras.layers.AdditiveAttention`
tf.keras.layers.AdditiveAttention(
    use_scale=True, **kwargs
)
Inputs are a `query` tensor of shape `[batch_size, Tq, dim]`, a `value` tensor of shape `[batch_size, Tv, dim]`, and a `key` tensor of shape `[batch_size, Tv, dim]`. The calculation follows these steps:

1. Reshape `query` and `key` into shapes `[batch_size, Tq, 1, dim]` and `[batch_size, 1, Tv, dim]` respectively.
2. Calculate scores with shape `[batch_size, Tq, Tv]` as a non-linear sum: `scores = tf.reduce_sum(tf.tanh(query + key), axis=-1)`.
3. Use scores to calculate a distribution with shape `[batch_size, Tq, Tv]`: `distribution = tf.nn.softmax(scores)`.
4. Use `distribution` to create a linear combination of `value` with shape `[batch_size, Tq, dim]`: `return tf.matmul(distribution, value)`.
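These steps can be reproduced with plain TensorFlow ops. The sketch below is illustrative only: it uses random inputs with arbitrary shapes and omits the learned scale vector that `use_scale=True` adds to the scores.

import tensorflow as tf

batch_size, Tq, Tv, dim = 2, 3, 4, 8
query = tf.random.normal([batch_size, Tq, dim])
value = tf.random.normal([batch_size, Tv, dim])
key = value  # The most common case: key defaults to value.

# 1. Reshape query and key so they broadcast against each other.
q = tf.expand_dims(query, axis=2)  # [batch_size, Tq, 1, dim]
k = tf.expand_dims(key, axis=1)    # [batch_size, 1, Tv, dim]

# 2. Non-linear additive scores of shape [batch_size, Tq, Tv].
scores = tf.reduce_sum(tf.tanh(q + k), axis=-1)

# 3. Attention distribution over the value positions.
distribution = tf.nn.softmax(scores)

# 4. Linear combination of value, shape [batch_size, Tq, dim].
output = tf.matmul(distribution, value)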
Args

`use_scale`: If `True`, will create a variable to scale the attention scores.
`dropout`: Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to `0.0`.
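For illustration, a layer might be constructed with both arguments; the values below are arbitrary choices, not recommended defaults.

attention_layer = tf.keras.layers.AdditiveAttention(
    use_scale=True,  # Learn a scale vector applied to the scores.
    dropout=0.1)     # Drop 10% of the attention scores during training.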
Call Args

`inputs`: List of the following tensors:
- `query`: Query `Tensor` of shape `[batch_size, Tq, dim]`.
- `value`: Value `Tensor` of shape `[batch_size, Tv, dim]`.
- `key`: Optional key `Tensor` of shape `[batch_size, Tv, dim]`. If not given, will use `value` for both `key` and `value`, which is the most common case.
`mask`: List of the following tensors:
- `query_mask`: A boolean mask `Tensor` of shape `[batch_size, Tq]`. If given, the output will be zero at the positions where `mask==False`.
- `value_mask`: A boolean mask `Tensor` of shape `[batch_size, Tv]`. If given, will apply the mask such that values at positions where `mask==False` do not contribute to the result.
`training`: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout).
`return_attention_scores`: bool, if `True`, returns the attention scores (after masking and softmax) as an additional output argument.
`use_causal_mask`: Boolean. Set to `True` for decoder self-attention. Adds a mask such that position `i` cannot attend to positions `j > i`. This prevents the flow of information from the future towards the past. Defaults to `False`.

Output

Attention outputs of shape `[batch_size, Tq, dim]`.
[Optional] Attention scores after masking and softmax with shape `[batch_size, Tq, Tv]`.
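To illustrate the call arguments, the sketch below builds a layer and calls it with masks and `return_attention_scores=True`; the shapes and mask values are arbitrary.

layer = tf.keras.layers.AdditiveAttention(dropout=0.1)

query = tf.random.normal([2, 3, 8])  # [batch_size, Tq, dim]
value = tf.random.normal([2, 4, 8])  # [batch_size, Tv, dim]
query_mask = tf.constant([[True, True, False],
                          [True, True, True]])           # [batch_size, Tq]
value_mask = tf.constant([[True, True, True, False],
                          [True, True, False, False]])   # [batch_size, Tv]

# key is omitted, so value is used for both key and value.
outputs, scores = layer(
    [query, value],
    mask=[query_mask, value_mask],
    return_attention_scores=True,
    training=False)  # Inference mode: no dropout applied.

print(outputs.shape)  # (2, 3, 8)  -> [batch_size, Tq, dim]
print(scores.shape)   # (2, 3, 4)  -> [batch_size, Tq, Tv]

# For decoder self-attention (query attending to itself), pass
# use_causal_mask=True so position i cannot attend to positions j > i.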
The meaning of `query`, `value`, and `key` depends on the application. In the case of text similarity, for example, `query` is the sequence embeddings of the first piece of text and `value` is the sequence embeddings of the second piece of text. `key` is usually the same tensor as `value`.

Here is a code example for using `AdditiveAttention` in a CNN+Attention network:
import tensorflow as tf

# Placeholder vocabulary size and embedding dimension (illustrative values).
max_tokens = 1000
dimension = 100

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')

# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(max_tokens, dimension)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)

# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
    filters=100,
    kernel_size=4,
    # Use 'same' padding so outputs have the same shape as inputs.
    padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)

# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.AdditiveAttention()(
    [query_seq_encoding, value_seq_encoding])

# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
    query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
    query_value_attention_seq)

# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])

# Add DNN layers, and create Model.
# ...
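One possible way to finish the example is sketched below; the `Dense` sizes, the sigmoid output, and the compile settings are illustrative choices rather than part of the original example.

hidden = tf.keras.layers.Dense(64, activation='relu')(input_layer)
output = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)

model = tf.keras.Model(inputs=[query_input, value_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()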