tf.contrib.legacy_seq2seq.attention_decoder
RNN decoder with attention for the sequence-to-sequence model.
tf.contrib.legacy_seq2seq.attention_decoder(
decoder_inputs, initial_state, attention_states, cell, output_size=None,
num_heads=1, loop_function=None, dtype=None, scope=None,
initial_state_attention=False
)
In this context "attention" means that, during decoding, the RNN can look up
information in the additional tensor attention_states, and it does this by
focusing on a few entries from the tensor. This model has proven to yield
especially good results in a number of sequence-to-sequence tasks. This
implementation is based on http://arxiv.org/abs/1412.7449 (see below for
details). It is recommended for complex sequence-to-sequence tasks.
Args |
decoder_inputs
|
A list of 2D Tensors [batch_size x input_size].
|
initial_state
|
2D Tensor [batch_size x cell.state_size].
|
attention_states
|
3D Tensor [batch_size x attn_length x attn_size].
|
cell
|
tf.compat.v1.nn.rnn_cell.RNNCell defining the cell function and size.
|
output_size
|
Size of the output vectors; if None, we use cell.output_size.
|
num_heads
|
Number of attention heads that read from attention_states.
|
loop_function
|
If not None, this function will be applied to i-th output in
order to generate i+1-th input, and decoder_inputs will be ignored, except
for the first element ("GO" symbol). This can be used for decoding,
but also for training to emulate http://arxiv.org/abs/1506.03099
Signature -- loop_function(prev, i) = next * prev is a 2D Tensor of
shape [batch_size x output_size], * i is an integer, the step number
(when advanced control is needed), * next is a 2D Tensor of shape
[batch_size x input_size].
|
dtype
|
The dtype to use for the RNN initial state (default: tf.float32).
|
scope
|
VariableScope for the created subgraph; default: "attention_decoder".
|
initial_state_attention
|
If False (default), initial attentions are zero. If
True, initialize the attentions from the initial state and attention
states -- useful when we wish to resume decoding from a previously stored
decoder state and attention states.
|
Returns |
A tuple of the form (outputs, state), where:
outputs: A list of the same length as decoder_inputs of 2D Tensors of
shape [batch_size x output_size]. These represent the generated outputs.
Output i is computed from input i (which is either the i-th element
of decoder_inputs or loop_function(output {i-1}, i)) as follows.
First, we run the cell on a combination of the input and previous
attention masks:
cell_output, new_state = cell(linear(input, prev_attn), prev_state).
Then, we calculate new attention masks:
new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
and then we calculate the output:
output = linear(cell_output, new_attn).
state: The state of each decoder cell the final time-step.
It is a 2D Tensor of shape [batch_size x cell.state_size].
|
Raises |
ValueError
|
when num_heads is not positive, there are no inputs, shapes
of attention_states are not set, or input size cannot be inferred
from the input.
|
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2020-10-01 UTC.
[null,null,["Last updated 2020-10-01 UTC."],[],[]]