tf.contrib.seq2seq.LuongAttention

Implements Luong-style (multiplicative) attention scoring.

This attention has two forms. The first is standard Luong attention, as described in:

Minh-Thang Luong, Hieu Pham, Christopher D. Manning. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015.

The second is the scaled form inspired partly by the normalized form of Bahdanau attention.

To enable the second form, construct the object with parameter scale=True.
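The difference between the two forms can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the score is a dot product between the query and each key, and that the scaled form multiplies that score by a single scalar, here called `g`. The names `luong_score` and `g` are hypothetical and not part of the TensorFlow API.

```python
def luong_score(query, keys, g=None):
    """Dot-product score of one query against each key.

    query: list of floats with length num_units.
    keys:  list of max_time key vectors, each with length num_units.
    g:     optional scalar multiplier (assumed stand-in for the scale term).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if g is not None:
        # Scaled form: every score is multiplied by the same scalar.
        scores = [g * s for s in scores]
    return scores


query = [1.0, 2.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

unscaled = luong_score(query, keys)       # [1.0, 2.0, 3.0]
scaled = luong_score(query, keys, g=0.5)  # [0.5, 1.0, 1.5]
```

Scaling changes the magnitude of the scores but not their ordering, so it mainly affects how peaked the resulting probabilities are.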

Args
num_units The depth of the attention mechanism.
memory The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
scale Python boolean. Whether to scale the energy term.
probability_fn (optional) A callable that converts the score to probabilities. The default is tf.nn.softmax. Other options include tf.contrib.seq2seq.hardmax and tf.contrib.sparsemax.sparsemax. Its signature should be: probabilities = probability_fn(score).
score_mask_value (optional) The value assigned to masked score positions before the score is passed to probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
dtype The data type for the memory layer of the attention mechanism.
custom_key_value_fn (optional) A custom function for computing keys and values.
name Name to use when creating ops.
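The interaction between memory_sequence_length, score_mask_value and probability_fn can be illustrated with a small pure-Python sketch. It assumes that positions at or beyond the sequence length receive score_mask_value and that the masked scores then go through a softmax. The helpers `mask_scores` and `softmax` are hypothetical stand-ins, not TensorFlow functions.

```python
import math


def mask_scores(scores, sequence_length, score_mask_value=-math.inf):
    """Replace scores at positions >= sequence_length with the mask value."""
    return [
        s if t < sequence_length else score_mask_value
        for t, s in enumerate(scores)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 2.0, 5.0, 7.0]                    # max_time = 4
masked = mask_scores(scores, sequence_length=2)  # last two positions are padding
alignments = softmax(masked)                     # [0.5, 0.5, 0.0, 0.0]
```

With a mask value of -inf, padded positions get exactly zero probability, no matter how large their raw scores were.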

Attributes

alignments_size

batch_size

keys

memory_layer

query_layer

state_size

values

Methods

initial_alignments

Creates the initial alignment values for the AttentionWrapper class.

This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention).

The default behavior is to return a tensor of all zeros.

Args
batch_size int32 scalar, the batch_size.
dtype The dtype.

Returns
A dtype tensor shaped [batch_size, alignments_size] (alignments_size is the values' max_time).

initial_state

Creates the initial state values for the AttentionWrapper class.

This is important for AttentionMechanisms that use the previous alignment to calculate the alignment at the next time step (e.g. monotonic attention).

The default behavior is to return the same output as initial_alignments.

Args
batch_size int32 scalar, the batch_size.
dtype The dtype.

Returns
A structure of all-zero tensors with shapes as described by state_size.

__call__

Score the query based on the keys and values.

Args
query Tensor of dtype matching self.values and shape [batch_size, query_depth].
state Tensor of dtype matching self.values and shape [batch_size, alignments_size] (alignments_size is memory's max_time).

Returns
alignments Tensor of dtype matching self.values and shape [batch_size, alignments_size] (alignments_size is memory's max_time).
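The shape contract of the call can be sketched with nested Python lists standing in for tensors. This is an illustration under stated assumptions rather than the library code: it assumes an unscaled dot-product score followed by a softmax, ignores the state argument, and requires query_depth to equal the key depth so that the dot product is defined. The function name `attention_call` is hypothetical.

```python
import math


def attention_call(query, keys):
    """Return alignments shaped [batch_size, max_time].

    query: [batch_size, query_depth] nested list.
    keys:  [batch_size, max_time, num_units] nested list.
    """
    alignments = []
    for q, example_keys in zip(query, keys):
        for key in example_keys:
            if len(key) != len(q):
                # Assumption of this sketch: depths must match for a dot product.
                raise ValueError("query depth must equal the key depth")
        scores = [sum(a * b for a, b in zip(q, key)) for key in example_keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alignments.append([e / total for e in exps])
    return alignments


query = [[1.0, 0.0], [0.0, 0.0]]                 # batch_size=2, query_depth=2
keys = [
    [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]],        # max_time=3, num_units=2
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
]
alignments = attention_call(query, keys)
```

Each row of the result has one entry per memory time step and sums to 1, which matches the documented [batch_size, alignments_size] output shape.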