# tfm.nlp.models.T5TransformerParams
Transformer parameters.
    tfm.nlp.models.T5TransformerParams(
        num_layers: int,
        d_model: int,
        d_kv: int,
        num_heads: int,
        d_ff: int,
        vocab_size: int,
        target_vocab_size: Optional[int] = None,
        dropout_rate: float = 0.0,
        layer_norm_epsilon: float = 1e-06,
        shared_embedding: bool = False,
        vocab_embeddings_initializer: Optional[Initializer] = None,
        relative_attention_num_buckets: int = 32,
        relative_attention_max_distance: int = 128,
        relative_embeddings_initializer: Optional[Initializer] = None,
        weight_initializer: Optional[Initializer] = tfm.nlp.models.T5TransformerParams.weight_initializer,
        bias_initializer: Optional[Initializer] = None,
        rescale_query: bool = False,
        bidirectional: bool = True,
        ffn_activations: Sequence[str] = tfm.nlp.models.T5TransformerParams.ffn_activations,
        logits_via_embedding: bool = True,
        num_decoder_layers: Optional[int] = None,
        one_hot_embedding: bool = True,
        layer_sharing: bool = False,
        use_shared_relative_position_bias: bool = True,
        return_attention_scores: bool = False
    )
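For illustration, a configuration resembling a small T5 model could be constructed as follows. This is a minimal sketch; the hyperparameter values are illustrative placeholders, not an official preset.

    import tensorflow_models as tfm

    # Illustrative hyperparameters, roughly T5-small sized.
    params = tfm.nlp.models.T5TransformerParams(
        num_layers=6,      # encoder depth
        d_model=512,       # model (hidden) dimension
        d_kv=64,           # per-head key/value dimension
        num_heads=8,       # attention heads
        d_ff=2048,         # feed-forward inner dimension
        vocab_size=32128,  # vocabulary size
        dropout_rate=0.1,
    )

All other fields fall back to the defaults listed under Class Variables below.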
## Attributes

| Name | Description |
|---|---|
| `num_layers` | Dataclass field |
| `d_model` | Dataclass field |
| `d_kv` | Dataclass field |
| `num_heads` | Dataclass field |
| `d_ff` | Dataclass field |
| `vocab_size` | Dataclass field |
| `target_vocab_size` | Dataclass field |
| `dropout_rate` | Dataclass field |
| `layer_norm_epsilon` | Dataclass field |
| `shared_embedding` | Dataclass field |
| `vocab_embeddings_initializer` | Dataclass field |
| `relative_attention_num_buckets` | Dataclass field |
| `relative_attention_max_distance` | Dataclass field |
| `relative_embeddings_initializer` | Dataclass field |
| `weight_initializer` | Dataclass field |
| `bias_initializer` | Dataclass field |
| `rescale_query` | Dataclass field |
| `bidirectional` | Dataclass field |
| `ffn_activations` | Dataclass field |
| `logits_via_embedding` | Dataclass field |
| `num_decoder_layers` | Dataclass field |
| `one_hot_embedding` | Dataclass field |
| `layer_sharing` | Dataclass field |
| `use_shared_relative_position_bias` | Dataclass field |
| `return_attention_scores` | Dataclass field |
## Methods

### `weight_initializer`

    weight_initializer(
        dtype=None, **kwargs
    )
He normal initializer.

Also available via the shortcut function
[`tf.keras.initializers.he_normal`](https://www.tensorflow.org/api_docs/python/tf/keras/initializers/HeNormal).

It draws samples from a truncated normal distribution centered on 0 with
`stddev = sqrt(2 / fan_in)`, where `fan_in` is the number of input units in
the weight tensor.
#### Examples:

    # Standalone usage:
    initializer = tf.keras.initializers.HeNormal()
    values = initializer(shape=(2, 2))

    # Usage in a Keras layer:
    initializer = tf.keras.initializers.HeNormal()
    layer = tf.keras.layers.Dense(3, kernel_initializer=initializer)
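The formula above can also be checked empirically. This is an added sanity check, not part of the upstream examples; the tensor shape is arbitrary.

    import numpy as np
    import tensorflow as tf

    # For a 2-D weight of shape (512, 256), fan_in is 512, so
    # stddev = sqrt(2 / 512) ~= 0.0625.
    initializer = tf.keras.initializers.HeNormal(seed=0)
    values = initializer(shape=(512, 256))
    print(float(np.std(values)))  # close to 0.0625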
#### Args

| Name | Description |
|---|---|
| `seed` | A Python integer. Used to make the behavior of the initializer deterministic. Note that a seeded initializer will not produce the same random values across multiple calls, but multiple initializers will produce the same sequence when constructed with the same seed value. |

#### References

- [He et al., 2015](https://arxiv.org/abs/1502.01852)
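A short sketch of the seed semantics described above (the seed value is arbitrary):

    import tensorflow as tf

    init_a = tf.keras.initializers.HeNormal(seed=42)
    init_b = tf.keras.initializers.HeNormal(seed=42)

    # Same seed, same position in the sequence: identical values.
    a1, b1 = init_a(shape=(2, 2)), init_b(shape=(2, 2))
    assert (a1.numpy() == b1.numpy()).all()

    # A second call on the same initializer draws the next values in the
    # sequence, so they differ from the first call.
    a2 = init_a(shape=(2, 2))
    assert not (a1.numpy() == a2.numpy()).all()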
### `__eq__`

    __eq__(
        other
    )
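Since `T5TransformerParams` is a dataclass, the generated `__eq__` compares instances field by field. A small illustration with arbitrary toy values:

    import tensorflow_models as tfm

    a = tfm.nlp.models.T5TransformerParams(
        num_layers=2, d_model=64, d_kv=8, num_heads=4, d_ff=128, vocab_size=100)
    b = tfm.nlp.models.T5TransformerParams(
        num_layers=2, d_model=64, d_kv=8, num_heads=4, d_ff=128, vocab_size=100)
    assert a == b  # equal field values compare equal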
## Class Variables

| Name | Value |
|---|---|
| `bias_initializer` | `None` |
| `bidirectional` | `True` |
| `dropout_rate` | `0.0` |
| `ffn_activations` | `('relu',)` |
| `layer_norm_epsilon` | `1e-06` |
| `layer_sharing` | `False` |
| `logits_via_embedding` | `True` |
| `num_decoder_layers` | `None` |
| `one_hot_embedding` | `True` |
| `relative_attention_max_distance` | `128` |
| `relative_attention_num_buckets` | `32` |
| `relative_embeddings_initializer` | `None` |
| `rescale_query` | `False` |
| `return_attention_scores` | `False` |
| `shared_embedding` | `False` |
| `target_vocab_size` | `None` |
| `use_shared_relative_position_bias` | `True` |
| `vocab_embeddings_initializer` | `None` |
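Because the class is a dataclass, a variant configuration can be derived from an existing one without mutating it. A minimal sketch, reusing the `params` instance from the construction example above with arbitrary overrides:

    import dataclasses

    # Copy `params`, overriding only the named fields.
    deeper = dataclasses.replace(params, num_layers=12, num_decoder_layers=12)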