Transformer parameters.
tfm.nlp.models.T5TransformerParams(
num_layers: int,
d_model: int,
d_kv: int,
num_heads: int,
d_ff: int,
vocab_size: int,
target_vocab_size: Optional[int] = None,
dropout_rate: float = 0.0,
layer_norm_epsilon: float = 1e-06,
shared_embedding: bool = False,
vocab_embeddings_initializer: Optional[Initializer] = None,
relative_attention_num_buckets: int = 32,
relative_attention_max_distance: int = 128,
relative_embeddings_initializer: Optional[Initializer] = None,
weight_initializer: Optional[Initializer] = tfm.nlp.models.T5TransformerParams.weight_initializer,
bias_initializer: Optional[Initializer] = None,
rescale_query: bool = False,
bidirectional: bool = True,
ffn_activations: Sequence[str] = tfm.nlp.models.T5TransformerParams.ffn_activations,
logits_via_embedding: bool = True,
num_decoder_layers: Optional[int] = None,
one_hot_embedding: bool = True,
layer_sharing: bool = False,
use_shared_relative_position_bias: bool = True,
return_attention_scores: bool = False
)
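For orientation, here is a minimal sketch of constructing the parameter object; the small sizes are illustrative only, and the dataclasses.replace call assumes the class behaves as a standard dataclass (as the generated __eq__ method below suggests). The resulting object is typically passed as the config to the T5 model classes in this package.
import dataclasses
import tensorflow_models as tfm

# A small, illustrative configuration: the required fields plus dropout.
params = tfm.nlp.models.T5TransformerParams(
    num_layers=2,
    d_model=64,
    d_kv=16,
    num_heads=4,
    d_ff=256,
    vocab_size=32000,
    dropout_rate=0.1,
)

# Derive a variant without dropout for evaluation (dataclass assumption).
eval_params = dataclasses.replace(params, dropout_rate=0.0)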
Methods
weight_initializer
weight_initializer(
dtype=None, **kwargs
)
He normal initializer.
Also available via the shortcut function tf.keras.initializers.he_normal.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in), where fan_in is the number of input units in the weight tensor.
Examples:
# Standalone usage:
initializer = tf.keras.initializers.HeNormal()
values = initializer(shape=(2, 2))
# Usage in a Keras layer:
initializer = tf.keras.initializers.HeNormal()
layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)
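To connect the formula above to concrete numbers, a hedged sketch that estimates the sample standard deviation and compares it to sqrt(2 / fan_in); the two should agree only approximately, since the statistic is computed from a finite sample:
import tensorflow as tf

fan_in = 100
initializer = tf.keras.initializers.HeNormal(seed=0)
# For a rank-2 kernel of shape (fan_in, fan_out), fan_in is the first dimension.
values = initializer(shape=(fan_in, 256))

expected_stddev = (2.0 / fan_in) ** 0.5           # sqrt(2 / 100) ~= 0.141
observed_stddev = float(tf.math.reduce_std(values))
print(expected_stddev, observed_stddev)           # expected to be close, not equal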
Args | |
---|---|
`seed` | A Python integer. Used to make the behavior of the initializer deterministic. Note that a seeded initializer will not produce the same random values across multiple calls, but multiple initializers will produce the same sequence when constructed with the same seed value. |
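The seed behavior described above can be checked directly; a brief sketch, with the expected outcomes taken from the description rather than asserted:
import tensorflow as tf

# Two initializers constructed with the same seed yield the same sequence.
init_a = tf.keras.initializers.HeNormal(seed=42)
init_b = tf.keras.initializers.HeNormal(seed=42)
print(tf.reduce_all(tf.equal(init_a(shape=(2, 2)), init_b(shape=(2, 2)))))  # expect True

# A single seeded initializer does not repeat values across calls.
print(init_a(shape=(2, 2)))  # generally differs from its first call above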
__eq__
__eq__(
other
)
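Since the class appears to be a dataclass, __eq__ is the generated field-by-field comparison; a minimal sketch under that assumption:
import dataclasses
import tensorflow_models as tfm

# Two parameter objects with identical field values compare equal.
a = tfm.nlp.models.T5TransformerParams(
    num_layers=2, d_model=64, d_kv=16, num_heads=4, d_ff=256, vocab_size=100)
b = tfm.nlp.models.T5TransformerParams(
    num_layers=2, d_model=64, d_kv=16, num_heads=4, d_ff=256, vocab_size=100)
print(a == b)                                     # expect True: all fields match
print(a == dataclasses.replace(a, d_model=128))   # expect False: d_model differs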