tf.contrib.rnn.CoupledInputForgetGateLSTMCell

Long short-term memory unit (LSTM) recurrent network cell.

Inherits From: RNNCell

The default non-peephole implementation is based on:

https://pdfs.semanticscholar.org/1154/0131eae85b2e11d53df7f1360eeb6476e7f4.pdf

Felix Gers, Jurgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." IET, 850-855, 1999.

The peephole implementation is based on:

https://research.google.com/pubs/archive/43905.pdf

Hasim Sak, Andrew Senior, and Francoise Beaufays. "Long short-term memory recurrent neural network architectures for large scale acoustic modeling." INTERSPEECH, 2014.

The coupling of input and forget gate is based on:

http://arxiv.org/pdf/1503.04069.pdf

Greff et al. "LSTM: A Search Space Odyssey"
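
The coupling described in Greff et al. ties the input gate to the forget gate, so the cell needs only one of the two. The following is a minimal pure-Python sketch of that idea for a single time step, not the library's implementation. The function name `coupled_cell_update`, the list-based inputs, the omission of peephole terms, and the fixed `tanh` candidate activation are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function for a single float."""
    return 1.0 / (1.0 + math.exp(-x))


def coupled_cell_update(c_prev, f_pre, j_pre, forget_bias=1.0):
    """Elementwise cell-state update with a coupled input/forget gate.

    c_prev, f_pre and j_pre are equal-length lists of floats: the previous
    cell state, the forget-gate pre-activation and the candidate
    pre-activation. Hypothetical helper, for illustration only.
    """
    c_new = []
    for c, f, j in zip(c_prev, f_pre, j_pre):
        f_gate = sigmoid(f + forget_bias)  # forget gate
        i_gate = 1.0 - f_gate              # input gate tied to the forget gate
        c_new.append(f_gate * c + i_gate * math.tanh(j))
    return c_new
```

Because the two gates sum to one, each unit of the new cell state is a convex combination of the old state and the new candidate: a unit that forgets little also admits little new input.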

The class supports optional peephole connections and an optional projection layer. The layer normalization implementation is based on:

https://arxiv.org/abs/1607.06450

"Layer Normalization" Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

and is applied before the internal nonlinearities.
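
As a rough illustration of "applied before the internal nonlinearities", the sketch below normalizes a vector of gate pre-activations and only then applies the sigmoid. It is written in plain Python under simplifying assumptions: `gain` and `shift` are fixed scalars here (in the class, `norm_gain` and `norm_shift` are documented as initial values), and the `epsilon` constant and both function names are hypothetical.

```python
import math


def layer_norm(values, gain=1.0, shift=0.0, epsilon=1e-12):
    """Normalize a list of floats to zero mean and unit variance,
    then scale by gain and add shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(variance + epsilon)
    return [gain * (v - mean) * scale + shift for v in values]


def normalized_gate(pre_activations, gain=1.0, shift=0.0):
    """Apply layer normalization first, then the sigmoid nonlinearity."""
    normed = layer_norm(pre_activations, gain, shift)
    return [1.0 / (1.0 + math.exp(-v)) for v in normed]
```

Normalizing the pre-activations keeps the inputs to the sigmoid in a comparable range across units, regardless of the raw scale of the weighted sums.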

Args
num_units int, The number of units in the LSTM cell.
use_peepholes bool, set True to enable diagonal/peephole connections.
initializer (optional) The initializer to use for the weight and projection matrices.
num_proj (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
proj_clip (optional) A float value. If num_proj > 0 and proj_clip is provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
num_unit_shards How to split the weight matrix. If >1, the weight matrix is stored across num_unit_shards.
num_proj_shards How to split the projection matrix. If >1, the projection matrix is stored across num_proj_shards.
forget_bias Bias added to the forget gate. It is initialized to 1 by default in order to reduce the amount of forgetting at the beginning of training.
state_is_tuple If True, accepted and returned states are 2-tuples of the c_state and m_state. By default (False), they are concatenated along the column axis. This default behavior will soon be deprecated.
activation Activation function of the inner states.
reuse (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
layer_norm If True, layer normalization will be applied.
norm_gain float, The layer normalization gain initial value. If layer_norm has been set to False, this argument will be ignored.
norm_shift float, The layer normalization shift initial value. If layer_norm has been set to False, this argument will be ignored.
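
The interaction between `num_proj` and `proj_clip` can be sketched as a matrix product followed by an elementwise clamp. This is a plain-Python illustration under stated assumptions, not the class's code: the function name `project_and_clip`, the list-of-rows matrix layout, and the single-example (unbatched) input are all hypothetical.

```python
def project_and_clip(m, proj_matrix, proj_clip=None):
    """Project a hidden vector and optionally clip the result.

    m is a list of num_units floats; proj_matrix is a list of num_units
    rows, each with num_proj columns. Hypothetical helper, for
    illustration only.
    """
    num_proj = len(proj_matrix[0])
    projected = [
        sum(m[i] * proj_matrix[i][k] for i in range(len(m)))
        for k in range(num_proj)
    ]
    if proj_clip is not None:
        # Clamp each projected value to [-proj_clip, proj_clip].
        projected = [max(-proj_clip, min(proj_clip, p)) for p in projected]
    return projected
```

For example, projecting `[1.0, 2.0]` through `[[1.0, 0.0], [1.0, -3.0]]` gives `[3.0, -6.0]`, which a `proj_clip` of `2.0` clamps to `[2.0, -2.0]`.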

Attributes
graph DEPRECATED FUNCTION
output_size Integer or TensorShape: size of outputs produced by this cell.
scope_name
state_size size(s) of state(s) used by this cell. It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

Methods

get_initial_state

zero_state

Return zero-filled state tensor(s).

Args
batch_size int, float, or unit (scalar) Tensor representing the batch size.
dtype the data type to use for the state.

Returns
If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.
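
The two return cases above can be mimicked with nested Python lists to show how the result mirrors the structure of `state_size`. This is a sketch only: it handles plain ints and nested lists or tuples of ints, does not cover `TensorShape`, and returns lists rather than tensors. The standalone function below is a hypothetical stand-in, not the method itself.

```python
def zero_state(batch_size, state_size):
    """Return zero-filled nested lists mirroring the structure of state_size.

    An int yields a batch_size x state_size list of zeros; a list or tuple
    yields a container of the same type with one such entry per element.
    """
    if isinstance(state_size, (list, tuple)):
        return type(state_size)(zero_state(batch_size, s) for s in state_size)
    return [[0.0] * state_size for _ in range(batch_size)]
```

With `batch_size=2`, a `state_size` of `3` yields a single 2 x 3 block of zeros, while a `state_size` of `(3, 4)` yields a 2-tuple holding a 2 x 3 block and a 2 x 4 block.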