tf.contrib.rnn.GLSTMCell
View source on GitHub: https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/rnn/python/ops/rnn_cell.py#L2248-L2465
Group LSTM cell (G-LSTM).
Inherits From: RNNCell

    tf.contrib.rnn.GLSTMCell(
        num_units, initializer=None, num_proj=None, number_of_groups=1,
        forget_bias=1.0, activation=tf.math.tanh, reuse=None
    )

The implementation is based on:
https://arxiv.org/abs/1703.10722
O. Kuchaiev and B. Ginsburg,
"Factorization Tricks for LSTM Networks", ICLR 2017 workshop.
In brief, a G-LSTM cell consists of one LSTM sub-cell per group, where each
sub-cell operates on an evenly-sized sub-vector of the input and produces an
evenly-sized sub-vector of the output. For example, a G-LSTM cell with 128
units and 4 groups consists of 4 LSTM sub-cells with 32 units each. If that
G-LSTM cell is fed a 200-dim input, then each sub-cell receives a 50-dim part
of the input and produces a 32-dim part of the output.
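
A minimal usage sketch of the numbers above, assuming TensorFlow 1.15 (where this cell lives in tf.contrib); the placeholder shapes are illustrative only:

    import tensorflow as tf  # assumes TensorFlow 1.15

    # G-LSTM cell with 128 units split across 4 groups, as in the example.
    cell = tf.contrib.rnn.GLSTMCell(num_units=128, number_of_groups=4)

    # Hypothetical input: batch of 8 sequences, 10 steps, 200 features.
    inputs = tf.placeholder(tf.float32, shape=[8, 10, 200])

    # Each of the 4 sub-cells sees a 50-dim slice of each input step and
    # emits a 32-dim slice of the 128-dim output.
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    # outputs has shape [8, 10, 128].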
Args
num_units
int, the number of units in the G-LSTM cell.
initializer
(optional) The initializer to use for the weight and
projection matrices.
num_proj
(optional) int, the output dimensionality for the projection
matrices. If None, no projection is performed; see the sketch after
this argument list.
number_of_groups
(optional) int, number of groups to use.
If number_of_groups is 1, the cell is equivalent to a standard LSTM cell.
forget_bias
Biases of the forget gate are initialized by default to 1
in order to reduce the scale of forgetting at the beginning of
training.
activation
Activation function of the inner states.
reuse
(optional) Python boolean describing whether to reuse variables
in an existing scope. If not True, and the existing scope already
has the given variables, an error is raised.
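
As a configuration sketch of these arguments (the sizes are arbitrary, chosen so that both num_units and num_proj are divisible by number_of_groups):

    proj_cell = tf.contrib.rnn.GLSTMCell(
        num_units=256,            # 256 internal units
        num_proj=128,             # outputs projected down to 128 dims
        number_of_groups=4,       # 256 % 4 == 0 and 128 % 4 == 0
        forget_bias=1.0,          # default: softens forgetting early in training
        activation=tf.math.tanh)  # default activation of the inner states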
Raises
ValueError
If num_units or num_proj is not divisible by
number_of_groups.
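
For example, a configuration that violates the divisibility requirement (a sketch; the exact error message depends on the TensorFlow version):

    # 128 units cannot be split evenly into 3 groups (128 % 3 != 0),
    # so constructing this cell raises ValueError.
    bad_cell = tf.contrib.rnn.GLSTMCell(num_units=128, number_of_groups=3)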
Attributes
graph
DEPRECATED FUNCTION
Warning: this property is deprecated and will be removed in a future
version. Instructions for updating: stop using this property, because
tf.layers layers no longer track their graph.
output_size
Integer or TensorShape: size of outputs produced by this cell.
scope_name
state_size
size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers
or TensorShapes.
Methods
get_initial_state

    get_initial_state(
        inputs=None, batch_size=None, dtype=None
    )

zero_state

    zero_state(
        batch_size, dtype
    )

Return zero-filled state tensor(s).
Args
batch_size
int, float, or unit Tensor representing the batch size.
dtype
the data type to use for the state.
Returns
If state_size is an int or TensorShape, then the return value is an
N-D tensor of shape [batch_size, state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is
a nested list or tuple (of the same structure) of 2-D tensors with
the shapes [batch_size, s] for each s in state_size.
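
A short sketch with the 128-unit cell from the earlier example; its state is assumed to be an LSTMStateTuple (the nested-tuple case above), so both parts come back zero-filled:

    # With num_proj=None, both the cell state c and the hidden state h
    # are zero tensors of shape [batch_size, 128].
    init_state = cell.zero_state(batch_size=8, dtype=tf.float32)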
[null,null,["Last updated 2020-10-01 UTC."],[],[],null,["# tf.contrib.rnn.GLSTMCell\n\n\u003cbr /\u003e\n\n|------------------------------------------------------------------------------------------------------------------------------------------|\n| [View source on GitHub](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/contrib/rnn/python/ops/rnn_cell.py#L2248-L2465) |\n\nGroup LSTM cell (G-LSTM).\n\nInherits From: [`RNNCell`](../../../tf/nn/rnn_cell/RNNCell) \n\n tf.contrib.rnn.GLSTMCell(\n num_units, initializer=None, num_proj=None, number_of_groups=1, forget_bias=1.0,\n activation=tf.math.tanh, reuse=None\n )\n\nThe implementation is based on:\n\n\u003chttps://arxiv.org/abs/1703.10722\u003e\n\nO. Kuchaiev and B. Ginsburg\n\"Factorization Tricks for LSTM Networks\", ICLR 2017 workshop.\n\nIn brief, a G-LSTM cell consists of one LSTM sub-cell per group, where each\nsub-cell operates on an evenly-sized sub-vector of the input and produces an\nevenly-sized sub-vector of the output. For example, a G-LSTM cell with 128\nunits and 4 groups consists of 4 LSTMs sub-cells with 32 units each. If that\nG-LSTM cell is fed a 200-dim input, then each sub-cell receives a 50-dim part\nof the input and produces a 32-dim part of the output.\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ---- ||\n|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `num_units` | int, The number of units in the G-LSTM cell |\n| `initializer` | (optional) The initializer to use for the weight and projection matrices. |\n| `num_proj` | (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed. |\n| `number_of_groups` | (optional) int, number of groups to use. If `number_of_groups` is 1, then it should be equivalent to LSTM cell |\n| `forget_bias` | Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training. |\n| `activation` | Activation function of the inner states. |\n| `reuse` | (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Raises ------ ||\n|--------------|----------------------------------------------------------------------|\n| `ValueError` | If `num_units` or `num_proj` is not divisible by `number_of_groups`. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Attributes ---------- ||\n|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `graph` | DEPRECATED FUNCTION \u003cbr /\u003e | **Warning:** THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Stop using this property because tf.layers layers no longer track their graph. |\n| `output_size` | Integer or TensorShape: size of outputs produced by this cell. |\n| `scope_name` | \u003cbr /\u003e |\n| `state_size` | size(s) of state(s) used by this cell. \u003cbr /\u003e It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes. 
|\n\n\u003cbr /\u003e\n\nMethods\n-------\n\n### `get_initial_state`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/ops/rnn_cell_impl.py#L281-L309) \n\n get_initial_state(\n inputs=None, batch_size=None, dtype=None\n )\n\n### `zero_state`\n\n[View source](https://github.com/tensorflow/tensorflow/blob/v1.15.0/tensorflow/python/ops/rnn_cell_impl.py#L311-L340) \n\n zero_state(\n batch_size, dtype\n )\n\nReturn zero-filled state tensor(s).\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Args ||\n|--------------|---------------------------------------------------------|\n| `batch_size` | int, float, or unit Tensor representing the batch size. |\n| `dtype` | the data type to use for the state. |\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n| Returns ||\n|---|---|\n| If `state_size` is an int or TensorShape, then the return value is a `N-D` tensor of shape `[batch_size, state_size]` filled with zeros. \u003cbr /\u003e If `state_size` is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of `2-D` tensors with the shapes `[batch_size, s]` for each s in `state_size`. ||\n\n\u003cbr /\u003e"]]