tf.keras.layers.LayerNormalization

Layer normalization layer (Ba et al., 2016).

Inherits From: Layer, Operation

tf.keras.layers.LayerNormalization(
    axis=-1,
    epsilon=0.001,
    center=True,
    scale=True,
    rms_scaling=False,
    beta_initializer='zeros',
    gamma_initializer='ones',
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    **kwargs
)

Used in the tutorials:

Image captioning with visual attention
Neural machine translation with a Transformer and Keras
Neural machine translation with attention

Normalizes the activations of the previous layer for each example in a batch independently, rather than across the batch as Batch Normalization does. That is, it applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.
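To make the contrast with Batch Normalization concrete, here is a minimal pure-Python sketch (not Keras code) of which values each scheme averages over. The toy `batch` and the variable names are illustrative assumptions.

```python
from statistics import fmean

# Toy batch: 2 examples (rows) with 3 features each (columns).
batch = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
]

# Layer normalization: one mean per example, taken over that example's features.
per_example_means = [fmean(example) for example in batch]

# Batch normalization: one mean per feature, taken over the examples in the batch.
per_feature_means = [fmean(feature) for feature in zip(*batch)]

print(per_example_means)  # [2.0, 20.0]
print(per_feature_means)  # [5.5, 11.0, 16.5]
```

Because the layer-normalization statistics never mix values from different examples, the result for one example does not depend on what else is in the batch.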

If scale is enabled, the layer multiplies the normalized outputs by a trainable variable gamma, broadcast against the outputs; if center is enabled, it adds a trainable variable beta, broadcast the same way. By default gamma is initialized to a ones tensor and beta to a zeros tensor, so that centering and scaling are no-ops before training has begun.

So, with scaling and centering enabled the normalization equations are as follows:

Let the intermediate activations for a mini-batch be the inputs.

For each sample x_i in inputs with k features, we compute the mean and variance of the sample:

mean_i = sum(x_i[j] for j in range(k)) / k
var_i = sum((x_i[j] - mean_i) ** 2 for j in range(k)) / k

and then compute the normalized value x_i_normalized, adding a small constant epsilon to the variance for numerical stability:

x_i_normalized = (x_i - mean_i) / sqrt(var_i + epsilon)

And finally x_i_normalized is linearly transformed by gamma and beta, which are learned parameters:

output_i = x_i_normalized * gamma + beta
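The equations above can be sketched in plain Python for a single sample. This is an illustrative reference, not the Keras implementation; the function name `layer_norm` and the list-based representation are assumptions made for the example.

```python
import math


def layer_norm(x, gamma, beta, epsilon=1e-3):
    """Normalize one sample `x` (a list of k floats), then scale and shift it."""
    k = len(x)
    mean = sum(x) / k
    var = sum((v - mean) ** 2 for v in x) / k  # population variance
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


x = [1.0, 2.0, 3.0, 4.0]
# gamma = ones and beta = zeros mirror the default initializers.
out = layer_norm(x, gamma=[1.0] * 4, beta=[0.0] * 4)
out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(out_mean, out_var)  # mean close to 0; variance slightly below 1 because of epsilon
```

With gamma set to ones and beta set to zeros, the output is just the normalized sample, which is why the affine step is a no-op before training.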

gamma and beta will span the axes of inputs specified in axis, and this part of the inputs' shape must be fully defined.

For example:

import keras

layer = keras.layers.LayerNormalization(axis=[1, 2, 3])
layer.build([5, 20, 30, 40])
# beta and gamma span the normalized axes 1, 2 and 3.
print(layer.beta.shape)   # (20, 30, 40)
print(layer.gamma.shape)  # (20, 30, 40)
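The shape rule in the example above can be written as a small helper. `param_shape` is a hypothetical function for illustration, not part of Keras; it only mirrors the rule stated here, namely that gamma and beta take the sizes of the normalized axes and that those sizes must be known.

```python
def param_shape(input_shape, axis):
    """Hypothetical helper: shape of gamma/beta for a given input shape and axis."""
    axes = [axis] if isinstance(axis, int) else list(axis)
    if any(input_shape[a] is None for a in axes):
        raise ValueError("Normalized axes must have a fully defined size.")
    return tuple(input_shape[a] for a in axes)


print(param_shape([5, 20, 30, 40], [1, 2, 3]))  # (20, 30, 40)
print(param_shape([None, 128, 64], -1))         # (64,)
```

An unknown batch dimension (`None`) is fine as long as it is not one of the normalized axes.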

Note that other implementations of layer normalization may choose to define gamma and beta over a separate set of axes from the axes being normalized across. For example, Group Normalization (Wu et al. 2018) with a single group (number of groups set to 1) corresponds to a Layer Normalization that normalizes across height, width, and channel and has gamma and beta span only the channel dimension. So, this Layer Normalization implementation will not match a Group Normalization layer with the number of groups set to 1.
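The mismatch comes from the shape of the learned parameters, not from the statistics. A short sketch with a hypothetical channels-last input (the shape and variable names are assumptions for illustration) compares the two:

```python
import math

# Hypothetical channels-last input: batch of 5, height 20, width 30, 40 channels.
input_shape = (5, 20, 30, 40)
height, width, channels = input_shape[1:]

# This layer with axis=[1, 2, 3]: gamma and beta span every normalized axis.
layer_norm_param_shape = (height, width, channels)

# The Group Normalization configuration described above: per-channel gamma and beta.
group_norm_param_shape = (channels,)

print(math.prod(layer_norm_param_shape))  # 24000 values in gamma (and in beta)
print(math.prod(group_norm_param_shape))  # 40 values in gamma (and in beta)
```

Both variants normalize over the same values, but once gamma and beta move away from their initial values the outputs can differ.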

Args
`axis`	Integer or List/Tuple. The axis or axes to normalize across. Typically, this is the features axis/axes. The left-out axes are typically the batch axis/axes. `-1` is the last dimension in the input. Defaults to `-1`.
`epsilon`	Small float added to variance to avoid dividing by zero. Defaults to 1e-3.
`center`	If True, add offset of `beta` to normalized tensor. If False, `beta` is ignored. Defaults to `True`.
`scale`	If True, multiply by `gamma`. If False, `gamma` is not used. When the next layer is linear (this also applies to e.g. `nn.relu`), this can be disabled, since the next layer can perform the scaling. Defaults to `True`.
`rms_scaling`	If True, `center` and `scale` are ignored, and the inputs are scaled by `gamma` and the inverse square root of the square of all inputs. This is an approximate and faster approach that avoids ever computing the mean of the input. Defaults to `False`.
`beta_initializer`	Initializer for the beta weight. Defaults to zeros.
`gamma_initializer`	Initializer for the gamma weight. Defaults to ones.
`beta_regularizer`	Optional regularizer for the beta weight. None by default.
`gamma_regularizer`	Optional regularizer for the gamma weight. None by default.
`beta_constraint`	Optional constraint for the beta weight. None by default.
`gamma_constraint`	Optional constraint for the gamma weight. None by default.
`**kwargs`	Base layer keyword arguments (e.g. `name` and `dtype`).
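The `rms_scaling` description suggests root-mean-square scaling. The sketch below shows the textbook form of that idea in plain Python. It is an assumption about the intended formula, not a copy of the library code, which may compute the scale differently. `rms_scale` is a hypothetical helper, and where epsilon is added is also an assumption.

```python
import math


def rms_scale(x, gamma, epsilon=1e-3):
    """Scale `x` by gamma / sqrt(mean(x**2) + epsilon); no mean is subtracted."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + epsilon)
    return [v * inv_rms * g for v, g in zip(x, gamma)]


# With epsilon=0 and gamma=ones, the output has a mean square of 1.
out = rms_scale([3.0, 4.0], gamma=[1.0, 1.0], epsilon=0.0)
print(sum(v * v for v in out) / len(out))
```

Unlike the full normalization equations, this variant has no beta term and does not recenter the inputs.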

Reference:

Ba et al., 2016.

Attributes
`input`	Retrieves the input tensor(s) of a symbolic operation. Only returns the tensor(s) corresponding to the first time the operation was called.
`output`	Retrieves the output tensor(s) of a layer. Only returns the tensor(s) corresponding to the first time the operation was called.

Methods

`from_config`

@classmethod
from_config(
    config
)

Creates a layer from its config.

This method is the reverse of get_config, capable of instantiating the same layer from the config dictionary. It does not handle layer connectivity (handled by Network), nor weights (handled by set_weights).

Args
`config`	A Python dictionary, typically the output of get_config.

Returns
A layer instance.
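The get_config / from_config round trip can be illustrated without Keras. `TinyLayerNorm` below is a hypothetical stand-in class, not the real layer. It assumes the common pattern in which from_config passes the dictionary back to the constructor.

```python
class TinyLayerNorm:
    """Hypothetical stand-in for a layer; only stores constructor arguments."""

    def __init__(self, axis=-1, epsilon=1e-3, center=True, scale=True):
        self.axis = axis
        self.epsilon = epsilon
        self.center = center
        self.scale = scale

    def get_config(self):
        # Everything needed to rebuild the object, as plain Python values.
        return {
            "axis": self.axis,
            "epsilon": self.epsilon,
            "center": self.center,
            "scale": self.scale,
        }

    @classmethod
    def from_config(cls, config):
        # Reverse of get_config: pass the dictionary back to the constructor.
        return cls(**config)


original = TinyLayerNorm(epsilon=1e-5, center=False)
restored = TinyLayerNorm.from_config(original.get_config())
print(restored.get_config() == original.get_config())  # True
```

With the real layer, the analogous call would be `LayerNormalization.from_config(layer.get_config())`. As noted above, the config carries no weights, so those have to be copied separately with set_weights.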

`symbolic_call`