Feed-forward layer with multiple experts.
```python
tfm.nlp.layers.FeedForwardExperts(
    num_experts: int,
    d_ff: int,
    *,
    inner_dropout: float = 0.0,
    output_dropout: float = 0.0,
    activation: Callable[[tf.Tensor], tf.Tensor] = tf.keras.activations.gelu,
    kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER,
    bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER,
    name: str = 'experts',
    **kwargs
)
```
Note that `call()` takes inputs of shape `[num_groups, num_experts, expert_capacity, hidden_dim]`, which differs from the usual `[batch_size, seq_len, hidden_dim]` used by the `FeedForward` layer.

The experts are independent `FeedForward` layers of the same shape: rather than a single kernel of shape `[hidden_dim, out_dim]`, the layer holds a stacked kernel of shape `[num_experts, hidden_dim, out_dim]`, so each expert applies its own weights to its own slice of the input.
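The stacked-kernel idea can be sketched in NumPy: a minimal illustration (not the layer's actual implementation) of how a batched `einsum` applies one independent `[hidden_dim, d_ff]` kernel per expert, with dimension names chosen here for clarity.

```python
import numpy as np

num_groups, num_experts, expert_capacity = 2, 4, 3
hidden_dim, d_ff = 8, 16

rng = np.random.default_rng(0)
# Inputs already routed to experts: one slice per expert along axis 1.
x = rng.standard_normal((num_groups, num_experts, expert_capacity, hidden_dim))

# Per-expert kernels: a stack of independent matrices, not a single shared one.
w_in = rng.standard_normal((num_experts, hidden_dim, d_ff))
w_out = rng.standard_normal((num_experts, d_ff, hidden_dim))

def gelu(t):
    # tanh approximation of GELU
    return 0.5 * t * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (t + 0.044715 * t**3)))

# Each expert e applies only its own kernel w_in[e], w_out[e] to its slice x[:, e].
h = np.einsum('gech,ehf->gecf', x, w_in)      # expand to d_ff per expert
y = np.einsum('gecf,efh->gech', gelu(h), w_out)  # project back to hidden_dim

assert y.shape == x.shape  # output keeps [num_groups, num_experts, expert_capacity, hidden_dim]
```

Because the expert axis `e` appears in both operands and the output of each `einsum`, no mixing happens across experts; that is exactly what makes them independent feed-forward layers.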
Methods
call
```python
call(
    inputs: tf.Tensor, *, training: Optional[bool] = None
) -> tf.Tensor
```
Applies layer to inputs.
| Args | |
|---|---|
| `inputs` | Inputs of shape `[num_groups, num_experts, expert_capacity, hidden_dim]`. |
| `training` | Only apply dropout during training. |

| Returns | |
|---|---|
| Transformed inputs with the same shape as `inputs`. | |