tf.keras.utils.warmstart_embedding_matrix
Warm start embedding matrix with changing vocab.
View aliases

Compat aliases for migration: see the Migration guide (https://www.tensorflow.org/guide/migrate) for more details.

`tf.compat.v1.keras.utils.warmstart_embedding_matrix`
tf.keras.utils.warmstart_embedding_matrix(
    base_vocabulary,
    new_vocabulary,
    base_embeddings,
    new_embeddings_initializer='uniform'
)
This util can be used to warm-start the embedding layer matrix when the vocabulary changes between a previously saved checkpoint and the current model. A vocabulary change could mean that the new vocabulary is a different size, that the terms have been reshuffled, or that new terms have been added to the old vocabulary. If the vocabulary size changes, the size of the embedding layer matrix changes as well. This util remaps the old vocabulary embeddings into the new embedding layer matrix.
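Conceptually, the remapping copies the base row for every term that appears in both vocabularies and falls back to the initializer for terms the base vocabulary has never seen. Here is a minimal NumPy sketch of that idea, with a hypothetical helper name; the real util operates on tensors and this is not its actual implementation:

import numpy as np

def remap_embeddings_sketch(base_vocab, new_vocab, base_embeddings, seed=None):
    """Illustrative remap: copy rows for shared terms, initialize new ones."""
    rng = np.random.default_rng(seed)
    dim = base_embeddings.shape[1]
    base_index = {term: i for i, term in enumerate(base_vocab)}
    # Unseen terms start from a uniform init (Keras "uniform" draws in +/-0.05).
    new_matrix = rng.uniform(-0.05, 0.05, size=(len(new_vocab), dim))
    for row, term in enumerate(new_vocab):
        if term in base_index:
            # Seen term: carry its old embedding over to its new position.
            new_matrix[row] = base_embeddings[base_index[term]]
    return new_matrix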
Example:

Here is an example that demonstrates how to use the `warmstart_embedding_matrix` util.
>>> import keras
>>> import numpy as np
>>> import tensorflow as tf
>>> vocab_base = tf.convert_to_tensor(["unk", "a", "b", "c"])
>>> vocab_new = tf.convert_to_tensor(
...     ["unk", "unk", "a", "b", "c", "d", "e"])
>>> vectorized_vocab_base = np.random.rand(vocab_base.shape[0], 3)
>>> vectorized_vocab_new = np.random.rand(vocab_new.shape[0], 3)
>>> warmstarted_embedding_matrix = tf.keras.utils.warmstart_embedding_matrix(
...     base_vocabulary=vocab_base,
...     new_vocabulary=vocab_new,
...     base_embeddings=vectorized_vocab_base,
...     new_embeddings_initializer=keras.initializers.Constant(
...         vectorized_vocab_new))
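As a quick sanity check, rows for terms present in both vocabularies should be carried over from the base matrix; "a" sits at index 1 in `vocab_base` and index 2 in `vocab_new`:

>>> # "a" moved from base index 1 to new index 2, so its row is copied.
>>> np.allclose(warmstarted_embedding_matrix[2], vectorized_vocab_base[1])
True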
Here is an example that demonstrates how to get the vocabulary and embedding weights from existing layers, use the `warmstart_embedding_matrix` util to remap the layer embeddings, and continue with model training.
# Get the old and new vocabularies by using layer.get_vocabulary(),
# assuming, for example, that a TextVectorization layer is used.
base_vocabulary = old_text_vectorization_layer.get_vocabulary()
new_vocabulary = new_text_vectorization_layer.get_vocabulary()
# Get the previous embedding layer weights.
embedding_weights_base = model.get_layer('embedding').get_weights()[0]
warmstarted_embedding = keras.utils.warmstart_embedding_matrix(
    base_vocabulary,
    new_vocabulary,
    base_embeddings=embedding_weights_base,
    new_embeddings_initializer="uniform")
updated_embedding_variable = tf.Variable(warmstarted_embedding)
# Update the embedding layer weights and continue with model training.
model.layers[1].embeddings = updated_embedding_variable
model.fit(...)
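Note that assigning to `model.layers[1].embeddings` assumes the layer has already been built for the new vocabulary size. One alternative sketch, assuming a standard `keras.layers.Embedding` and a hypothetical layer name, is to rebuild the layer and seed it with the warm-started matrix through a constant initializer, mirroring the first example:

# Alternative sketch: rebuild the embedding layer for the new vocabulary
# size and seed it with the warm-started matrix (layer name is hypothetical).
new_embedding_layer = keras.layers.Embedding(
    input_dim=len(new_vocabulary),
    output_dim=warmstarted_embedding.shape[-1],
    embeddings_initializer=keras.initializers.Constant(warmstarted_embedding),
    name="embedding_warmstarted")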
Args

base_vocabulary: The list of vocabulary terms that the preexisting
    embedding matrix `base_embeddings` represents. It can be a 1D
    array/tensor, a tuple/list of vocabulary terms (strings), or a path
    to a vocabulary text file. If passing a file path, the file should
    contain one line per term in the vocabulary.
new_vocabulary: The list of vocabulary terms for the new vocabulary
    (same format as `base_vocabulary`).
base_embeddings: NumPy array or tensor representing the preexisting
    embedding matrix.
new_embeddings_initializer: Initializer for the embedding vectors of
    previously unseen terms added to the new embedding matrix (see
    `keras.initializers`). Defaults to "uniform". To seed the new matrix
    with precomputed values, pass them through a "constant" initializer,
    as in the first example above.

Returns

A `tf.Tensor` containing the remapped embedding layer matrix.