Dot-product attention layer, a.k.a. Luong-style attention.
Inherits From: Layer, Operation
tf.keras.layers.Attention(
    use_scale=False,
    score_mode='dot',
    dropout=0.0,
    seed=None,
    **kwargs
)
Inputs are a list with 2 or 3 elements:
- A query tensor of shape (batch_size, Tq, dim).
- A value tensor of shape (batch_size, Tv, dim).
- An optional key tensor of shape (batch_size, Tv, dim). If not given,
  value will be used as the key.
The calculation follows these steps:
- Calculate attention scores using query and key, with shape
  (batch_size, Tq, Tv).
- Use the scores to compute a softmax distribution with shape
  (batch_size, Tq, Tv).
- Use the softmax distribution to create a linear combination of value,
  with shape (batch_size, Tq, dim).
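A minimal sketch of these steps with the default constructor arguments; the
shapes (batch_size=4, Tq=2, Tv=3, dim=8) and variable names are illustrative,
not part of the API:

import tensorflow as tf

# Illustrative shapes: batch_size=4, Tq=2 query steps, Tv=3 value steps, dim=8.
query = tf.random.normal((4, 2, 8))
value = tf.random.normal((4, 3, 8))

attention = tf.keras.layers.Attention()

# Scores of shape (4, 2, 3) -> softmax over Tv -> weighted sum of value.
output = attention([query, value])
print(output.shape)  # (4, 2, 8)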
Args:
    use_scale: If True, will create a scalar variable to scale the
        attention scores.
    dropout: Float between 0 and 1. Fraction of the units to drop for
        the attention scores. Defaults to 0.0.
    seed: A Python integer to use as random seed in case of dropout.
    score_mode: Function to use to compute attention scores, one of
        {"dot", "concat"}. "dot" refers to the dot product between the
        query and key vectors. "concat" refers to the hyperbolic tangent
        of the concatenation of the query and key vectors.
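The score_mode argument only changes how the scores in the first step above
are computed; a short sketch of constructing each variant (the comments
paraphrase the descriptions above, and the keyword values are arbitrary):

import tensorflow as tf

# "dot": scores are the (optionally scaled) dot product of query and key.
dot_attention = tf.keras.layers.Attention(use_scale=True, score_mode="dot")

# "concat": additive scoring based on the hyperbolic tangent of the
# combined query and key vectors.
concat_attention = tf.keras.layers.Attention(score_mode="concat", dropout=0.1, seed=42)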
Call Args:
    inputs: List of the following tensors:
        - query: Query tensor of shape (batch_size, Tq, dim).
        - value: Value tensor of shape (batch_size, Tv, dim).
        - key: Optional key tensor of shape (batch_size, Tv, dim). If
          not given, will use value for both key and value, which is
          the most common case.
    mask: List of the following tensors:
        - query_mask: A boolean mask tensor of shape (batch_size, Tq).
          If given, the output will be zero at the positions where
          mask==False.
        - value_mask: A boolean mask tensor of shape (batch_size, Tv).
          If given, will apply the mask such that values at positions
          where mask==False do not contribute to the result.
    return_attention_scores: bool, if True, returns the attention scores
        (after masking and softmax) as an additional output argument.
    training: Python boolean indicating whether the layer should behave
        in training mode (adding dropout) or in inference mode (no
        dropout).
    use_causal_mask: Boolean. Set to True for decoder self-attention.
        Adds a mask such that position i cannot attend to positions
        j > i. This prevents the flow of information from the future
        towards the past. Defaults to False.
Output:
    Attention outputs of shape (batch_size, Tq, dim).
    (Optional) Attention scores after masking and softmax, with shape
    (batch_size, Tq, Tv).
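A sketch of a call that exercises the arguments above: padding masks, a
causal mask, and the optional attention-score output. The sequence lengths
passed to tf.sequence_mask are made up for illustration:

import tensorflow as tf

query = tf.random.normal((4, 8, 32))
value = tf.random.normal((4, 8, 32))  # self-attention: Tq == Tv

# Illustrative padding masks: True for real positions, False for padding.
query_mask = tf.sequence_mask([6, 6, 8, 8], maxlen=8)  # (4, 8)
value_mask = tf.sequence_mask([6, 6, 8, 8], maxlen=8)  # (4, 8)

attention = tf.keras.layers.Attention(dropout=0.1)
output, scores = attention(
    [query, value],
    mask=[query_mask, value_mask],
    return_attention_scores=True,
    use_causal_mask=True,  # decoder-style: position i cannot attend to j > i
    training=False,        # inference mode, so dropout is not applied
)
print(output.shape)  # (4, 8, 32) -> (batch_size, Tq, dim)
print(scores.shape)  # (4, 8, 8)  -> (batch_size, Tq, Tv)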
Attributes:
    input: Retrieves the input tensor(s) of a symbolic operation.
        Only returns the tensor(s) corresponding to the first time
        the operation was called.
    output: Retrieves the output tensor(s) of a layer.
        Only returns the tensor(s) corresponding to the first time
        the operation was called.
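For instance, after the layer has been called once on symbolic
(functional-API) inputs, both attributes refer to the tensors of that first
call; a small sketch with illustrative shapes:

import tensorflow as tf

query_in = tf.keras.Input(shape=(2, 8))
value_in = tf.keras.Input(shape=(3, 8))

attention = tf.keras.layers.Attention()
out = attention([query_in, value_in])

print(attention.input)   # the symbolic input tensors from the first call
print(attention.output)  # the symbolic output tensor, shape (None, 2, 8)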
Methods
from_config
@classmethod
from_config(
    config
)
Creates a layer from its config.
This method is the reverse of get_config,
capable of instantiating the same layer from the config
dictionary. It does not handle layer connectivity
(handled by Network), nor weights (handled by set_weights).
Args:
    config: A Python dictionary, typically the output of get_config.

Returns:
    A layer instance.
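A short sketch of the round trip with get_config (which this method
reverses); the constructor arguments are arbitrary:

import tensorflow as tf

layer = tf.keras.layers.Attention(use_scale=True, score_mode="concat", dropout=0.2)
config = layer.get_config()

# Rebuilds an identically configured (but unbuilt and unweighted) layer.
restored = tf.keras.layers.Attention.from_config(config)
print(restored.get_config()["score_mode"])  # 'concat'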
symbolic_call
symbolic_call(
    *args, **kwargs
)