tf.contrib.rnn.IndyLSTMCell

Class IndyLSTMCell

Basic IndyLSTM recurrent network cell.

Inherits From: LayerRNNCell

Based on IndRNNs (https://arxiv.org/abs/1803.04831) and similar to BasicLSTMCell, yet with the \(U_f\), \(U_i\), \(U_o\) and \(U_c\) matrices in the regular LSTM equations replaced by diagonal matrices, i.e. a Hadamard product with a single vector:

\(f_t = \sigma_g\left(W_f x_t + u_f \circ h_{t-1} + b_f\right)\)
\(i_t = \sigma_g\left(W_i x_t + u_i \circ h_{t-1} + b_i\right)\)
\(o_t = \sigma_g\left(W_o x_t + u_o \circ h_{t-1} + b_o\right)\)
\(c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c\left(W_c x_t + u_c \circ h_{t-1} + b_c\right)\)

where \(\circ\) denotes the Hadamard (elementwise) product. This means that each IndyLSTM node sees only its own states \(h\) and \(c\), as opposed to seeing all states in the same layer.
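
The equations above can be sketched as a single time step in plain Python. This is an illustrative sketch, not the TensorFlow implementation: the function name `indy_lstm_step`, the dict-of-gates parameter layout, and the single-example (no batch) shapes are all assumptions made for readability. The output equation \(h_t = o_t \circ \sigma_h(c_t)\) is not listed above and is assumed here from the standard LSTM formulation, with tanh used for both \(\sigma_c\) and \(\sigma_h\).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def indy_lstm_step(x_t, h_prev, c_prev, W, u, b, forget_bias=1.0):
    """One IndyLSTM step for a single example (hypothetical helper).

    W, u, b are dicts keyed by gate name "f", "i", "o", "c":
      W[gate] is a [num_units][input_size] input weight matrix,
      u[gate] is a length-num_units vector (the diagonal of U_gate),
      b[gate] is a length-num_units bias.
    """
    n = len(h_prev)

    def pre(gate):
        wx = matvec(W[gate], x_t)
        # u * h_prev is elementwise: unit k only ever sees h_prev[k].
        return [wx[k] + u[gate][k] * h_prev[k] + b[gate][k] for k in range(n)]

    f = [sigmoid(z + forget_bias) for z in pre("f")]
    i = [sigmoid(z) for z in pre("i")]
    o = [sigmoid(z) for z in pre("o")]
    cand = [math.tanh(z) for z in pre("c")]

    c_t = [f[k] * c_prev[k] + i[k] * cand[k] for k in range(n)]
    # Assumption: standard LSTM output equation, not shown in the text above.
    h_t = [o[k] * math.tanh(c_t[k]) for k in range(n)]
    return h_t, c_t
```

Because the recurrent term is a per-unit multiplication, changing the previous state of one unit leaves every other unit's new state unchanged. This also means each gate needs only `num_units` recurrent parameters rather than `num_units * num_units`.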

We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting at the beginning of training.
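
To make the effect of forget_bias concrete, the sketch below evaluates the forget gate at a pre-activation of zero, which is roughly where a freshly initialized gate sits. The helper name `forget_gate` is hypothetical, and modelling the early-training pre-activation as exactly zero is a simplifying assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(pre_activation, forget_bias=1.0):
    """Forget gate value with forget_bias added to the pre-activation."""
    return sigmoid(pre_activation + forget_bias)

no_bias = forget_gate(0.0, forget_bias=0.0)    # 0.5: half of c is dropped per step
with_bias = forget_gate(0.0, forget_bias=1.0)  # about 0.73: more of c is kept

# Fraction of the original cell state left after 10 steps at these gate values.
kept_no_bias = no_bias ** 10      # roughly 0.001
kept_with_bias = with_bias ** 10  # roughly 0.04
```

With the default bias the cell state decays far more slowly at initialization, so early gradients can reach further back in time.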

It does not allow cell clipping or a projection layer, and it does not use peephole connections: it is the basic baseline.

For a detailed analysis of IndyLSTMs, see https://arxiv.org/abs/1903.08023.

__init__

__init__(
    num_units,
    forget_bias=1.0,
    activation=None,
    reuse=None,
    kernel_initializer=None,
    bias_initializer=None,
    name=None,
    dtype=None
)

Initialize the IndyLSTM cell.

Args:

  • num_units: int, The number of units in the LSTM cell.
  • forget_bias: float, The bias added to forget gates (see above). Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
  • activation: Activation function of the inner states. Default: tanh.
  • reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
  • kernel_initializer: (optional) The initializer to use for the weight matrix applied to the inputs.
  • bias_initializer: (optional) The initializer to use for the bias.
  • name: String, the name of the layer. Layers with the same name will share weights, but to avoid mistakes we require reuse=True in such cases.
  • dtype: Default dtype of the layer (default of None means use the type of the first input). Required when build is called before call.

Properties

graph

DEPRECATED FUNCTION

output_size

scope_name

state_size

Methods

get_initial_state

get_initial_state(
    inputs=None,
    batch_size=None,
    dtype=None
)

zero_state

zero_state(
    batch_size,
    dtype
)

Return zero-filled state tensor(s).

Args:

  • batch_size: int, float, or unit Tensor representing the batch size.
  • dtype: the data type to use for the state.

Returns:

If state_size is an int or TensorShape, then the return value is an N-D tensor of shape [batch_size, state_size] filled with zeros.

If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.
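
The return rule above can be sketched without TensorFlow using nested Python lists in place of tensors. This is only an illustration of the shape logic: `StateTuple` is a hypothetical stand-in for the library's LSTM state tuple, the TensorShape case is omitted, and treating the cell's state_size as a (c, h) pair of num_units each is assumed from the usual LSTM convention.

```python
from collections import namedtuple

# Hypothetical stand-in for an LSTM (c, h) state tuple, for illustration only.
StateTuple = namedtuple("StateTuple", ["c", "h"])

def zero_state(state_size, batch_size):
    """Return zero-filled nested lists mirroring the structure of state_size."""
    if isinstance(state_size, int):
        # A [batch_size, state_size] block of zeros.
        return [[0.0] * state_size for _ in range(batch_size)]
    zeros = [zero_state(s, batch_size) for s in state_size]
    if hasattr(state_size, "_fields"):
        # namedtuples are rebuilt from positional fields.
        return type(state_size)(*zeros)
    # Plain lists and tuples are rebuilt from an iterable.
    return type(state_size)(zeros)

# A cell with 3 units and a batch of 2 examples.
state = zero_state(StateTuple(c=3, h=3), batch_size=2)
```

Here `state.c` and `state.h` each have shape [2, 3], matching the "[batch_size, s] for each s in state_size" rule.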