Dot-product attention layer, a.k.a. Luong-style attention.
Inherits From: Layer, Operation
tf.keras.layers.Attention(
    use_scale=False,
    score_mode='dot',
    dropout=0.0,
    seed=None,
    **kwargs
)
Inputs are a list with 2 or 3 elements:
- A query tensor of shape (batch_size, Tq, dim).
- A value tensor of shape (batch_size, Tv, dim).
- An optional key tensor of shape (batch_size, Tv, dim). If none is supplied, value will be used as the key.
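For example, a minimal call (the shapes below are arbitrary, chosen only for illustration):

import tensorflow as tf

# Illustrative shapes: batch_size=4, Tq=2, Tv=3, dim=8.
query = tf.random.normal((4, 2, 8))
value = tf.random.normal((4, 3, 8))

attention = tf.keras.layers.Attention()
# With two inputs, value is also used as the key.
output = attention([query, value])
print(output.shape)  # (4, 2, 8)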
The calculation follows these steps (see the sketch after the list):
- Calculate attention scores using query and key, with shape (batch_size, Tq, Tv).
- Use the scores to calculate a softmax distribution with shape (batch_size, Tq, Tv).
- Use the softmax distribution to create a linear combination of value, with shape (batch_size, Tq, dim).
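The same three steps in plain TensorFlow ops (a sketch that ignores scaling, masking, and dropout; not the layer's actual implementation):

def dot_product_attention(query, key, value):
    # Step 1: attention scores, shape (batch_size, Tq, Tv).
    scores = tf.matmul(query, key, transpose_b=True)
    # Step 2: softmax distribution over the Tv axis, shape (batch_size, Tq, Tv).
    distribution = tf.nn.softmax(scores, axis=-1)
    # Step 3: linear combination of value, shape (batch_size, Tq, dim).
    return tf.matmul(distribution, value)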
Args
- use_scale: If True, will create a scalar variable to scale the attention scores.
- dropout: Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
- seed: A Python integer to use as a random seed in case of dropout.
- score_mode: Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.
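For instance, configuring the scaled "concat" variant (all arguments are the documented ones above; values are illustrative):

attention = tf.keras.layers.Attention(
    use_scale=True,       # learn a scalar to scale the scores
    score_mode="concat",  # tanh-of-concatenation scoring
    dropout=0.1,          # drop 10% of attention-score units during training
    seed=42,              # make that dropout reproducible
)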
Call Args
- inputs: List of the following tensors:
  - query: Query tensor of shape (batch_size, Tq, dim).
  - value: Value tensor of shape (batch_size, Tv, dim).
  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
- mask: List of the following tensors:
  - query_mask: A boolean mask tensor of shape (batch_size, Tq). If given, the output will be zero at the positions where mask==False.
  - value_mask: A boolean mask tensor of shape (batch_size, Tv). If given, will apply the mask such that values at positions where mask==False do not contribute to the result.
- return_attention_scores: bool, if True, returns the attention scores (after masking and softmax) as an additional output argument.
- training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout).
- use_causal_mask: Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. Defaults to False.
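A combined example of these call arguments (shapes and mask values are illustrative):

query = tf.random.normal((4, 2, 8))   # (batch_size, Tq, dim)
value = tf.random.normal((4, 3, 8))   # (batch_size, Tv, dim)
query_mask = tf.constant([[True, True]] * 4)         # (batch_size, Tq)
value_mask = tf.constant([[True, True, False]] * 4)  # (batch_size, Tv)

attention = tf.keras.layers.Attention()
output, scores = attention(
    [query, value],
    mask=[query_mask, value_mask],
    return_attention_scores=True,
)
print(output.shape)  # (4, 2, 8)
print(scores.shape)  # (4, 2, 3)

# Decoder self-attention: each position attends only to itself and earlier positions.
self_output = attention([query, query], use_causal_mask=True)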
Output
- Attention outputs of shape (batch_size, Tq, dim).
- (Optional) Attention scores after masking and softmax, with shape (batch_size, Tq, Tv).
Attributes
- input: Retrieves the input tensor(s) of a symbolic operation. Only returns the tensor(s) corresponding to the first time the operation was called.
- output: Retrieves the output tensor(s) of a layer. Only returns the tensor(s) corresponding to the first time the operation was called.
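Both attributes are populated once the layer has been called symbolically, e.g. in a Functional model (a sketch; the shapes here are arbitrary):

query_input = tf.keras.Input(shape=(None, 8))
value_input = tf.keras.Input(shape=(None, 8))
attention = tf.keras.layers.Attention()
outputs = attention([query_input, value_input])
print(attention.output.shape)  # (None, None, 8)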
Methods
from_config
@classmethod
from_config(
    config
)
Creates a layer from its config.
This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).
Args
- config: A Python dictionary, typically the output of get_config.

Returns
- A layer instance.
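A typical roundtrip (a short sketch; from_config recreates the configuration only, not weights or connectivity):

attention = tf.keras.layers.Attention(use_scale=True, dropout=0.1)
config = attention.get_config()
# Rebuild an identically configured, but fresh, layer.
clone = tf.keras.layers.Attention.from_config(config)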
symbolic_call
symbolic_call(
    *args, **kwargs
)