Long Short-Term Memory layer - Hochreiter 1997.
Inherits From: RNN, Layer, Operation
tf.keras.layers.LSTM(
units,
activation='tanh',
recurrent_activation='sigmoid',
use_bias=True,
kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
bias_initializer='zeros',
unit_forget_bias=True,
kernel_regularizer=None,
recurrent_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
recurrent_constraint=None,
bias_constraint=None,
dropout=0.0,
recurrent_dropout=0.0,
seed=None,
return_sequences=False,
return_state=False,
go_backwards=False,
stateful=False,
unroll=False,
use_cudnn='auto',
**kwargs
)
Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or backend-native) to maximize the performance. If a GPU is available and all the arguments to the layer meet the requirement of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation when using the TensorFlow backend. The requirements to use the cuDNN implementation are:
- activation == tanh
- recurrent_activation == sigmoid
- dropout == 0 and recurrent_dropout == 0
- unroll is False
- use_bias is True
- Inputs, if using masking, are strictly right-padded.
- Eager execution is enabled in the outermost context.
For example:
import numpy as np
import keras

inputs = np.random.random((32, 10, 8))
lstm = keras.layers.LSTM(4)
output = lstm(inputs)
output.shape  # (32, 4)

# Return the full output sequence together with the final states.
lstm = keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
whole_seq_output.shape  # (32, 10, 4)
final_memory_state.shape  # (32, 4)
final_carry_state.shape  # (32, 4)
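The cuDNN requirements listed above can be restated as a small checker. This is only an illustrative sketch: `cudnn_blockers` and `is_right_padded` are hypothetical helpers, not part of Keras, and they cover only the argument-level conditions plus one reading of "strictly right-padded" (valid steps first, padding last). Whether a GPU is present and whether eager execution is enabled are runtime conditions that are not checked here.

```python
import numpy as np


def cudnn_blockers(activation="tanh", recurrent_activation="sigmoid",
                   dropout=0.0, recurrent_dropout=0.0,
                   unroll=False, use_bias=True):
    """Return the listed requirements that the given arguments violate.

    Hypothetical helper: it mirrors the documented list, not Keras internals.
    """
    blockers = []
    if activation != "tanh":
        blockers.append("activation must be tanh")
    if recurrent_activation != "sigmoid":
        blockers.append("recurrent_activation must be sigmoid")
    if dropout != 0:
        blockers.append("dropout must be 0")
    if recurrent_dropout != 0:
        blockers.append("recurrent_dropout must be 0")
    if unroll:
        blockers.append("unroll must be False")
    if not use_bias:
        blockers.append("use_bias must be True")
    return blockers


def is_right_padded(mask):
    """True if each row of a (batch, timesteps) mask never goes from False to True."""
    mask = np.asarray(mask, dtype=bool)
    # A right-padded row is non-increasing along the time axis.
    return bool(np.all(mask[:, :-1] >= mask[:, 1:]))


default_blockers = cudnn_blockers()
dropout_blockers = cudnn_blockers(recurrent_dropout=0.2, unroll=True)
right_padded = is_right_padded([[1, 1, 1, 0], [1, 1, 0, 0]])
left_padded = is_right_padded([[0, 0, 1, 1], [1, 1, 1, 1]])
```

With the default arguments nothing blocks the cuDNN path, while setting `recurrent_dropout=0.2` and `unroll=True` reports two violated requirements.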
Args | |
---|---|
units | Positive integer, dimensionality of the output space. |
activation | Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x). |
recurrent_activation | Activation function to use for the recurrent step. Default: sigmoid (sigmoid). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x). |
use_bias | Boolean (default True), whether the layer should use a bias vector. |
kernel_initializer | Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: "glorot_uniform". |
recurrent_initializer | Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: "orthogonal". |
bias_initializer | Initializer for the bias vector. Default: "zeros". |
unit_forget_bias | Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al. |
kernel_regularizer | Regularizer function applied to the kernel weights matrix. Default: None. |
recurrent_regularizer | Regularizer function applied to the recurrent_kernel weights matrix. Default: None. |
bias_regularizer | Regularizer function applied to the bias vector. Default: None. |
activity_regularizer | Regularizer function applied to the output of the layer (its "activation"). Default: None. |
kernel_constraint | Constraint function applied to the kernel weights matrix. Default: None. |
recurrent_constraint | Constraint function applied to the recurrent_kernel weights matrix. Default: None. |
bias_constraint | Constraint function applied to the bias vector. Default: None. |
dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0. |
recurrent_dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0. |
seed | Random seed for dropout. |
return_sequences | Boolean. Whether to return the full output sequence (True) or only the last output in the sequence (False). Default: False. |
return_state | Boolean. Whether to return the last state in addition to the output. Default: False. |
go_backwards | Boolean (default: False). If True, process the input sequence backwards and return the reversed sequence. |
stateful | Boolean (default: False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch. |
unroll | Boolean (default: False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences. |
use_cudnn | Whether to use a cuDNN-backed implementation. "auto" will attempt to use cuDNN when feasible, and will fall back to the default implementation if not. |
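To make the roles of these arguments concrete, here is a NumPy sketch of an LSTM forward pass. It is a simplified reference under stated assumptions, not the layer's actual implementation: `lstm_forward` is a hypothetical function, the gate order (input, forget, cell candidate, output) in the fused weight matrices is assumed, and dropout, masking, and regularization are omitted. It shows where activation and recurrent_activation are applied, what unit_forget_bias changes at initialization, how return_sequences and go_backwards affect the output, and what kind of state a stateful layer carries forward.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(inputs, kernel, recurrent_kernel, bias,
                 initial_state=None, return_sequences=False,
                 go_backwards=False):
    """Simplified LSTM forward pass; gate order (i, f, g, o) is an assumption."""
    batch, timesteps, _ = inputs.shape
    units = recurrent_kernel.shape[0]
    if initial_state is None:
        h = np.zeros((batch, units))  # memory state
        c = np.zeros((batch, units))  # carry state
    else:
        h, c = initial_state
    steps = range(timesteps - 1, -1, -1) if go_backwards else range(timesteps)
    outputs = []
    for t in steps:
        # kernel transforms the inputs, recurrent_kernel the previous output.
        z = inputs[:, t, :] @ kernel + h @ recurrent_kernel + bias
        i, f, g, o = np.split(z, 4, axis=-1)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # recurrent_activation
        g = np.tanh(g)                                # activation
        c = f * c + i * g
        h = o * np.tanh(c)                            # activation
        outputs.append(h)
    out = np.stack(outputs, axis=1) if return_sequences else h
    return out, h, c


rng = np.random.default_rng(0)
batch, timesteps, features, units = 32, 10, 8, 4
inputs = rng.random((batch, timesteps, features))
kernel = rng.normal(scale=0.1, size=(features, 4 * units))
recurrent_kernel = rng.normal(scale=0.1, size=(units, 4 * units))
bias = np.zeros(4 * units)
bias[units:2 * units] = 1.0  # unit_forget_bias=True: forget-gate bias starts at 1

last, h, c = lstm_forward(inputs, kernel, recurrent_kernel, bias)
seq, _, _ = lstm_forward(inputs, kernel, recurrent_kernel, bias,
                         return_sequences=True)

# Feeding the final state of one chunk into the next reproduces the
# result of processing the whole sequence at once.
_, h1, c1 = lstm_forward(inputs[:, :5], kernel, recurrent_kernel, bias)
resumed, _, _ = lstm_forward(inputs[:, 5:], kernel, recurrent_kernel, bias,
                             initial_state=(h1, c1))
```

The last step illustrates the idea behind stateful=True: the final state computed for sample i in one batch becomes the initial state for sample i in the next, so a long sequence can be fed in consecutive chunks.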
Methods
from_config
@classmethod
from_config(config)
Creates a layer from its config.
This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).
Args | |
---|---|
config | A Python dictionary, typically the output of get_config. |
Returns | |
---|---|
A layer instance. |
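A minimal round trip through get_config and from_config might look like the following. It assumes Keras 3 is installed and importable as `keras`; the argument values are arbitrary. As noted above, the recreated layer shares the configuration but not the weights.

```python
import keras

# Build a layer with a few non-default arguments.
layer = keras.layers.LSTM(4, return_sequences=True, dropout=0.1)

# Serialize the constructor arguments to a plain Python dictionary.
config = layer.get_config()

# Recreate an equivalent, not yet built, layer from that dictionary.
clone = keras.layers.LSTM.from_config(config)
clone_config = clone.get_config()
```

To copy weights as well, both layers would first have to be built on inputs of the same shape, after which `clone.set_weights(layer.get_weights())` could be used.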
get_initial_state
get_initial_state(
batch_size
)
inner_loop
inner_loop(
sequences, initial_state, mask, training=False
)
reset_state
reset_state()
reset_states
reset_states()
symbolic_call
symbolic_call(
*args, **kwargs
)