FNet encoder network.
tfm.nlp.networks.FNet(
    vocab_size: int,
    hidden_size: int = 768,
    num_layers: int = 12,
    mixing_mechanism: tfm.nlp.layers.MixingMechanism = tfm.nlp.layers.MixingMechanism.FOURIER,
    use_fft: bool = False,
    attention_layers: Sequence[int] = (),
    num_attention_heads: int = 12,
    max_sequence_length: int = 512,
    type_vocab_size: int = 16,
    inner_dim: int = 3072,
    inner_activation: _Activation = _approx_gelu,
    output_dropout: float = 0.1,
    attention_dropout: float = 0.1,
    initializer: _Initializer = tf.keras.initializers.TruncatedNormal(stddev=0.02),
    output_range: Optional[int] = None,
    embedding_width: Optional[int] = None,
    embedding_layer: Optional[tf.keras.layers.Layer] = None,
    norm_first: bool = False,
    with_dense_inputs: bool = False,
    **kwargs
)
Based on "FNet: Mixing Tokens with Fourier Transforms". FNet is an efficient Transformer-like encoder network that replaces self-attention sublayers with Fourier sublayers.
This implementation defaults to the canonical FNet Base model, but the network also supports more general mixing models (e.g. 'Linear', 'HNet') and hybrid models (e.g. 'FNet-Hybrid') that use both mixing and self-attention layers. The input length is fixed to 'max_sequence_length'.
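As a quick usage sketch (not part of the official reference; the vocabulary size and hybrid layer placement below are illustrative assumptions), the network can be built with its defaults or as an FNet-Hybrid that keeps self-attention in the final layers:

import tensorflow as tf
import tensorflow_models as tfm

# Canonical FNet Base: every layer uses Fourier mixing (the defaults above).
fnet_encoder = tfm.nlp.networks.FNet(
    vocab_size=30522,  # illustrative vocabulary size
    hidden_size=768,
    num_layers=12,
    max_sequence_length=512,
)

# FNet-Hybrid sketch: Fourier mixing everywhere except the last two layers,
# which use standard self-attention (placing attention late follows the
# rule of thumb given for attention_layers below).
hybrid_encoder = tfm.nlp.networks.FNet(
    vocab_size=30522,
    num_layers=12,
    attention_layers=(10, 11),
    num_attention_heads=12,
)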
Args | |
---|---|
vocab_size | The size of the token vocabulary.
hidden_size | The size of the transformer hidden layers.
num_layers | The number of transformer layers.
mixing_mechanism | Type of mixing mechanism used in place of self-attention layers. Defaults to FNet ('Fourier') mixing.
use_fft | Only used for spectral mixing mechanisms. Determines whether to use the Fast Fourier Transform (True) or the Discrete Fourier Transform (DFT) matrix (False; default) to compute the Fourier Transform. See layers.FourierTransformLayer or layers.HartleyTransformLayer for advice.
attention_layers | Specifies which layers, if any, should be attention layers in the encoder. The remaining layers, [0, num_layers) setminus attention_layers, will use the specified mixing_mechanism. If using attention layers, a good rule of thumb is to place them in the final few layers.
num_attention_heads | The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads.
max_sequence_length | The only sequence length that this encoder can consume. This determines the variable shape for positional embeddings and the size of the mixing matrices.
type_vocab_size | The number of types that the 'type_ids' input can take.
inner_dim | The output dimension of the first Dense layer in a two-layer feedforward network for each transformer.
inner_activation | The activation for the first Dense layer in a two-layer feedforward network for each transformer.
output_dropout | Dropout probability for the post-attention and output dropout.
attention_dropout | The dropout rate to use for the attention layers within the transformer layers.
initializer | The initializer to use for all weights in this encoder.
output_range | The sequence output range, [0, output_range), obtained by slicing the target sequence of the last transformer layer. None means the entire target sequence attends to the source sequence, which yields the full output.
embedding_width | The width of the word embeddings. If the embedding width is not equal to the hidden size, the embedding parameters will be factorized into two matrices of shape ['vocab_size', 'embedding_width'] and ['embedding_width', 'hidden_size'].
embedding_layer | An optional Layer instance which will be called to generate embeddings for the input word IDs.
norm_first | Whether to normalize inputs to attention and intermediate dense layers. If set to False, the output of the attention and intermediate dense layers is normalized instead.
with_dense_inputs | Whether to accept dense embeddings as the input.
Attributes | |
---|---|
pooler_layer | The pooler dense layer after the transformer layers.
transformer_layers | List of Transformer layers in the encoder.
Methods
call
call(
inputs
)
This is where the layer's logic lives. The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.
Args | |
---|---|
inputs | Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules.
*args | Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs | Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved: training (Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference) and mask (Boolean input mask; if the layer's call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer, provided the input came from a layer that generated a corresponding mask, i.e. a Keras layer with masking support).
Returns | |
---|---|
A tensor or list/tuple of tensors.
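A minimal call sketch, assuming the standard Model Garden encoder input dict keys ('input_word_ids', 'input_mask', 'input_type_ids') and output keys ('sequence_output', 'pooled_output'); check these names against the installed version:

import tensorflow as tf
import tensorflow_models as tfm

seq_len = 16  # must equal max_sequence_length; the input length is fixed
encoder = tfm.nlp.networks.FNet(
    vocab_size=100, hidden_size=64, num_layers=2,
    num_attention_heads=4, inner_dim=128, max_sequence_length=seq_len)

outputs = encoder(dict(
    input_word_ids=tf.zeros((2, seq_len), dtype=tf.int32),
    input_mask=tf.ones((2, seq_len), dtype=tf.int32),
    input_type_ids=tf.zeros((2, seq_len), dtype=tf.int32),
))
# outputs['sequence_output']: [batch, seq_len, hidden_size]
# outputs['pooled_output']:   [batch, hidden_size]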
get_embedding_layer
get_embedding_layer()
get_embedding_table
get_embedding_table()
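Both accessors expose the encoder's word-embedding machinery. As a sketch using the encoder from the call example above (the reported table shape assumes no embedding_width factorization):

embedding_layer = encoder.get_embedding_layer()  # the word-embedding layer instance
embedding_table = encoder.get_embedding_table()  # lookup table, shape [vocab_size, hidden_size]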