Gated linear feedforward layer.
tfm.nlp.layers.GatedFeedforward(
intermediate_size,
intermediate_activation,
dropout,
use_gate=True,
apply_output_layer_norm=True,
num_blocks=1,
dropout_position='before_residual',
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs
)
This layer follows the paper "GLU Variants Improve Transformer"
(https://arxiv.org/abs/2002.05202). In addition, it allows stacking
multiple feedforward blocks and specifying the position of the dropout layer.
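
A minimal usage sketch (the tensorflow_models import comes from the
tf-models-official package; the shapes below are illustrative). Because each
block ends in a residual connection, the output keeps the input's hidden size:

import tensorflow as tf
import tensorflow_models as tfm

layer = tfm.nlp.layers.GatedFeedforward(
    intermediate_size=3072,
    intermediate_activation='gelu',
    dropout=0.1)
x = tf.random.uniform((2, 16, 768))  # [batch_size, seq_length, hidden_size]
y = layer(x)                         # same shape as x: (2, 16, 768)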
Args

  intermediate_size: Size of the intermediate layer.
  intermediate_activation: Activation for the intermediate layer.
  dropout: Dropout probability for the output dropout.
  use_gate: Whether to use gated linear units. If True, assuming GELU as the
    activation and omitting bias, will apply
    GEGLU(x, W, V, W_2) = (GELU(xW) * xV)W_2; if False, will follow the
    "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) paper and
    apply FFN(x, W_1, W_2) = GELU(xW_1)W_2. See the sketch following this
    list.
  apply_output_layer_norm: Whether to apply layer normalization to the output.
  num_blocks: The number of feedforward blocks to stack. Each block contains
    a (gated) linear layer and a fully connected layer, followed by dropout,
    layer norm, and a residual connection.
  dropout_position: Where to apply the dropout. The value can be either
    before_residual or after_residual. If before_residual, will apply
    layer_output = layer_norm(dropout(layer_output) + layer_input); if
    after_residual, will apply
    layer_output = dropout(layer_norm(layer_output + layer_input)). See the
    sketch following this list.
  kernel_initializer: Initializer for dense layer kernels.
  bias_initializer: Initializer for dense layer biases.
  kernel_regularizer: Regularizer for dense layer kernels.
  bias_regularizer: Regularizer for dense layer biases.
  activity_regularizer: Regularizer for dense layer activity.
  kernel_constraint: Constraint for dense layer kernels.
  bias_constraint: Constraint for dense layer biases.
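
To make the use_gate formulas concrete, here is a sketch of both variants in
plain TensorFlow ops (the weight names mirror the formulas above; biases are
omitted, and the tensor shapes are illustrative):

import tensorflow as tf

def ffn_geglu(x, w, v, w2):
  # use_gate=True: GEGLU(x, W, V, W_2) = (GELU(xW) * xV)W_2
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w)) * tf.matmul(x, v), w2)

def ffn_plain(x, w1, w2):
  # use_gate=False: FFN(x, W_1, W_2) = GELU(xW_1)W_2
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w1)), w2)

hidden, intermediate = 8, 32
x = tf.random.normal((4, hidden))  # 4 tokens of width `hidden`
w = tf.random.normal((hidden, intermediate))
v = tf.random.normal((hidden, intermediate))
w2 = tf.random.normal((intermediate, hidden))
assert ffn_geglu(x, w, v, w2).shape == (4, hidden)
assert ffn_plain(x, w, w2).shape == (4, hidden)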
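
Similarly, a sketch of how dropout_position changes the tail of each block
(layer_norm and dropout here are stand-ins for the layer's internal
sublayers, not its actual attributes):

import tensorflow as tf

layer_norm = tf.keras.layers.LayerNormalization()
dropout = tf.keras.layers.Dropout(rate=0.1)

def finish_block(layer_output, layer_input, dropout_position, training=True):
  if dropout_position == 'before_residual':
    # layer_norm(dropout(layer_output) + layer_input)
    return layer_norm(dropout(layer_output, training=training) + layer_input)
  # 'after_residual': dropout(layer_norm(layer_output + layer_input))
  return dropout(layer_norm(layer_output + layer_input), training=training)

x = tf.random.normal((2, 4, 8))
ffn_out = tf.random.normal((2, 4, 8))
print(finish_block(ffn_out, x, 'before_residual').shape)  # (2, 4, 8)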
Methods

call

call(
    inputs
)

This is where the layer's logic lives.

The call() method may not create state (except in its first invocation,
wrapping the creation of variables or other resources in tf.init_scope()).
It is recommended to create state in __init__(), or in the build() method
that is called automatically before call() executes the first time.
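
As a generic illustration of that contract (a toy layer, not part of tfm):
state is created in build(), and call() only applies it:

import tensorflow as tf

class Affine(tf.keras.layers.Layer):

  def build(self, input_shape):
    # Create state here, before the first call() executes.
    self.kernel = self.add_weight(
        name='kernel', shape=(input_shape[-1], input_shape[-1]))
    super().build(input_shape)

  def call(self, inputs):
    # call() creates no state; it only uses what build() made.
    return tf.matmul(inputs, self.kernel)

y = Affine()(tf.ones((2, 3)))  # build() runs automatically before call()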
Args

  inputs: Input tensor, or dict/list/tuple of input tensors. The first
    positional inputs argument is subject to special rules:
    - inputs must be explicitly passed. A layer cannot have zero arguments,
      and inputs cannot be provided via the default value of a keyword
      argument.
    - NumPy array or Python scalar values in inputs get cast as tensors.
    - Keras mask metadata is only collected from inputs.
    - Layers are built (build(input_shape) method) using shape info from
      inputs only.
    - input_spec compatibility is only checked against inputs.
    - Mixed precision input casting is only applied to inputs. If a layer
      has tensor arguments in *args or **kwargs, their casting behavior in
      mixed precision should be handled manually.
    - The SavedModel input specification is generated using inputs only.
    - Integration with various ecosystem packages like TFMOT, TFLite, TF.js,
      etc. is only supported for inputs and not for tensors in positional
      and keyword arguments.
  *args: Additional positional arguments. May contain tensors, although this
    is not recommended, for the reasons above.
  **kwargs: Additional keyword arguments. May contain tensors, although this
    is not recommended, for the reasons above. The following optional
    keyword arguments are reserved:
    - training: Boolean scalar tensor or Python boolean indicating whether
      the call is meant for training or inference.
    - mask: Boolean input mask. If the layer's call() method takes a mask
      argument, its default value will be set to the mask generated for
      inputs by the previous layer (if inputs did come from a layer that
      generated a corresponding mask, i.e. if it came from a Keras layer
      with masking support).

Returns

  A tensor or list/tuple of tensors.
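
For GatedFeedforward, the reserved training argument is what toggles its
internal dropout; a short sketch (values are illustrative, and Keras
propagates training down to the nested dropout sublayers):

import tensorflow as tf
import tensorflow_models as tfm

layer = tfm.nlp.layers.GatedFeedforward(
    intermediate_size=32, intermediate_activation='gelu', dropout=0.5)
x = tf.random.uniform((1, 2, 8))

train_out = layer(x, training=True)   # dropout active
infer_out = layer(x, training=False)  # dropout disabled, deterministic output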