Additive attention layer, a.k.a. Bahdanau-style attention.
tf.keras.layers.AdditiveAttention(
    use_scale=True, **kwargs
)
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor
of shape [batch_size, Tv, dim] and a key tensor of shape
[batch_size, Tv, dim]. The calculation follows these steps (a worked sketch
appears after the list):
- Reshape query and key into shapes [batch_size, Tq, 1, dim] and
  [batch_size, 1, Tv, dim] respectively.
- Calculate scores with shape [batch_size, Tq, Tv] as a non-linear sum:
  scores = tf.reduce_sum(tf.tanh(query + key), axis=-1).
- Use scores to calculate a distribution with shape [batch_size, Tq, Tv]:
  distribution = tf.nn.softmax(scores).
- Use distribution to create a linear combination of value with shape
  [batch_size, Tq, dim]: return tf.matmul(distribution, value).
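The following minimal sketch mirrors those four steps with plain TensorFlow ops. The batch size, sequence lengths, and feature dimension are illustrative, and the learned scale added by use_scale=True is omitted:
import tensorflow as tf

# Illustrative sizes: batch_size=2, Tq=3, Tv=4, dim=5.
query = tf.random.normal([2, 3, 5])
value = tf.random.normal([2, 4, 5])
key = value  # key is usually the same tensor as value

# Reshape query and key to [batch_size, Tq, 1, dim] and [batch_size, 1, Tv, dim].
q = tf.expand_dims(query, axis=2)
k = tf.expand_dims(key, axis=1)

# Scores of shape [batch_size, Tq, Tv] as a non-linear sum.
scores = tf.reduce_sum(tf.tanh(q + k), axis=-1)

# Attention distribution of shape [batch_size, Tq, Tv].
distribution = tf.nn.softmax(scores)

# Linear combination of value with shape [batch_size, Tq, dim].
output = tf.matmul(distribution, value)
print(output.shape)  # (2, 3, 5)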
Args | |
|---|---|
use_scale | If True, will create a variable to scale the attention scores. |
dropout | Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0. |
Output | |
|---|---|
Attention outputs of shape [batch_size, Tq, dim]. [Optional] Attention scores after masking and softmax with shape [batch_size, Tq, Tv]. |
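Here is a minimal usage sketch of these arguments and outputs. The sizes are illustrative, and return_attention_scores (used to obtain the optional scores output) is assumed to be available as in recent TF 2.x releases rather than documented on this page:
import tensorflow as tf

layer = tf.keras.layers.AdditiveAttention(use_scale=True, dropout=0.1)

query = tf.random.normal([2, 3, 8])  # [batch_size, Tq, dim]
value = tf.random.normal([2, 4, 8])  # [batch_size, Tv, dim]

# Attention outputs of shape [batch_size, Tq, dim] plus the optional
# attention scores of shape [batch_size, Tq, Tv].
outputs, scores = layer([query, value], return_attention_scores=True)
print(outputs.shape, scores.shape)  # (2, 3, 8) (2, 3, 4)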
The meaning of query, value and key depends on the application. In the
case of text similarity, for example, query is the sequence embeddings of
the first piece of text and value is the sequence embeddings of the second
piece of text. key is usually the same tensor as value.
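For instance, a bare-bones text-similarity setup (the vocabulary size and embedding dimension below are hypothetical) could embed both pieces of text and pass the second text's embeddings as both value and key:
import tensorflow as tf

vocab_size, embed_dim = 1000, 64  # hypothetical values

embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)

first_text = tf.keras.Input(shape=(None,), dtype='int32')   # query source
second_text = tf.keras.Input(shape=(None,), dtype='int32')  # value/key source

query = embedding(first_text)   # [batch_size, Tq, embed_dim]
value = embedding(second_text)  # [batch_size, Tv, embed_dim]
key = value                     # key is usually the same tensor as value

# Attention output of shape [batch_size, Tq, embed_dim].
attended = tf.keras.layers.AdditiveAttention()([query, value, key])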
Here is a code example for using AdditiveAttention in a CNN+Attention
network:
import tensorflow as tf

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')
# Embedding lookup (the vocabulary size and dimension are illustrative).
max_tokens = 1000
dimension = 100
token_embedding = tf.keras.layers.Embedding(max_tokens, dimension)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)
# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
filters=100,
kernel_size=4,
# Use 'same' padding so outputs have the same shape as inputs.
padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)
# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.AdditiveAttention()(
[query_seq_encoding, value_seq_encoding])
# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
query_value_attention_seq)
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
[query_encoding, query_value_attention])
# Add DNN layers, and create Model.
# ...
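One possible way to finish the example (the dense head below is illustrative and not part of the original snippet) is to stack a small classifier on the concatenated encoding and wrap the graph in a Model:
# Illustrative DNN head and model assembly (continues the snippet above).
hidden = tf.keras.layers.Dense(64, activation='relu')(input_layer)
output = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs=[query_input, value_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()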