tf.tpu.experimental.embedding.TPUEmbedding

The TPUEmbedding mid-level API.

This class can be used to support training large embeddings on TPU. When creating an instance of this class, you must specify the complete set of tables and features you expect to look up in those tables. See the documentation of tf.tpu.experimental.embedding.TableConfig and tf.tpu.experimental.embedding.FeatureConfig for more details on the complete set of options. We will cover the basic usage here.

table_config_one = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)
table_config_two = tf.tpu.experimental.embedding.TableConfig(
    vocabulary_size=...,
    dim=...)
feature_config = {
    'feature_one': tf.tpu.experimental.embedding.FeatureConfig(
        table=table_config_one),
    'feature_two': tf.tpu.experimental.embedding.FeatureConfig(
        table=table_config_one),
    'feature_three': tf.tpu.experimental.embedding.FeatureConfig(
        table=table_config_two)}

There are two modes under which the TPUEmbedding class can be used, depending on whether the class was created under a TPUStrategy scope or not.

Under TPUStrategy, we allow access to the methods enqueue, dequeue and apply_gradients. We will show examples below of how to use these to train and evaluate your model. On CPU, we only allow access to the embedding_tables property, which gives access to the embedding tables so that you can use them to run model evaluation/prediction on CPU.

First, let's look at the TPUStrategy mode. Initial setup looks like:

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
  embedding = tf.tpu.experimental.embedding.TPUEmbedding(
      feature_config=feature_config,
      optimizer=tf.tpu.experimental.embedding.SGD(0.1))

When creating a distributed dataset that is to be passed to the enqueue operation, a special input option must be specified:

distributed_dataset = (
    strategy.distribute_datasets_from_function(
        dataset_fn=...,
        options=tf.distribute.InputOptions(
            experimental_fetch_to_device=False)))
dataset_iterator = iter(distributed_dataset)

To use this API on TPU you should use a custom training loop. Below is an example of a training and evaluation step:

@tf.function
def training_step(dataset_iterator, num_steps):
  def tpu_step(tpu_features):
    with tf.GradientTape() as tape:
      activations = embedding.dequeue()
      tape.watch(activations)
      model_output = model(activations)
      loss = ...  # some function of labels and model_output

    embedding_gradients = tape.gradient(loss, activations)
    embedding.apply_gradients(embedding_gradients)
    # Insert your model gradient and optimizer application here

  for _ in tf.range(num_steps):
    embedding_features, tpu_features = next(dataset_iterator)
    embedding.enqueue(embedding_features, training=True)
    strategy.run(tpu_step, args=(tpu_features, ))

@tf.function
def evaluation_step(dataset_iterator, num_steps):
  def tpu_step(tpu_features):
    activations = embedding.dequeue()
    model_output = model(activations)
    # Insert your evaluation code here.

  for _ in tf.range(num_steps):
    embedding_features, tpu_features = next(dataset_iterator)
    embedding.enqueue(embedding_features, training=False)
    strategy.run(tpu_step, args=(tpu_features, ))

In the above examples, we assume that the user has a dataset which returns a tuple whose first element matches the structure of what was passed as the feature_config argument to the object initializer. We also use tf.range so that the loop is converted to a tf.while_loop, which improves performance.
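
For illustration, here is a minimal sketch of what one element of such a dataset looks like, assuming hypothetical id tensors ids_one, ids_two and ids_three for the embedding features and a dense_features tensor for the rest of the model:

element = ({'feature_one': ids_one,
            'feature_two': ids_two,
            'feature_three': ids_three},
           dense_features)

The first item of the tuple is enqueued and the second is passed to strategy.run.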

When checkpointing your model, you should include your tf.tpu.experimental.embedding.TPUEmbedding object in the checkpoint. It is a trackable object and saving it will save the embedding tables and their optimizer slot variables:

checkpoint = tf.train.Checkpoint(model=model, embedding=embedding)
checkpoint.save(...)

On CPU, only the embedding_tables property is usable. This will allow you to restore a checkpoint to the object and have access to the table variables:

model = model_fn(...)
embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    optimizer=tf.tpu.experimental.embedding.SGD(0.1))
checkpoint = tf.train.Checkpoint(model=model, embedding=embedding)
checkpoint.restore(...)

tables = embedding.embedding_tables

You can now use these tables in functions like tf.nn.embedding_lookup to perform your embedding lookups and pass the results to your model.
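
For example, a minimal sketch of a CPU-side lookup, assuming a hypothetical feature_one_ids tensor of ids to look up in the first table:

table_one = tables[table_config_one]
feature_one_activations = tf.nn.embedding_lookup(table_one, feature_one_ids)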

Args
feature_config A nested structure of tf.tpu.experimental.embedding.FeatureConfig configs.
optimizer An instance of one of tf.tpu.experimental.embedding.SGD, tf.tpu.experimental.embedding.Adagrad or tf.tpu.experimental.embedding.Adam. When not created under TPUStrategy, this may be set to None to avoid creating the optimizer slot variables, which is useful for reducing memory consumption when exporting the model for serving, where slot variables aren't needed (see the sketch after this list).
pipeline_execution_with_tensor_core If True, the TPU embedding computations will overlap with the TensorCore computations (and hence will be one step old). Set to True for improved performance.
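
As a minimal sketch of the serving case mentioned above, an object created outside TPUStrategy with optimizer=None can restore the tables from a training checkpoint without creating slot variables:

serving_embedding = tf.tpu.experimental.embedding.TPUEmbedding(
    feature_config=feature_config,
    optimizer=None)
checkpoint = tf.train.Checkpoint(embedding=serving_embedding)
checkpoint.restore(...)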

Raises
ValueError If optimizer is not one of tf.tpu.experimental.embedding.(SGD, Adam or Adagrad) or None when created under a TPUStrategy.

Attributes
embedding_tables Returns a dict of embedding tables, keyed by TableConfig.

This property only works when the TPUEmbedding object is created under a non-TPU strategy. It is intended to be used for CPU-based lookup when creating a serving checkpoint.

Methods

apply_gradients


Applies the gradient update to the embedding tables.

If a gradient of None is passed in for any position of the nested structure, then a gradient update with a zero gradient is applied for that feature. For optimizers like SGD or Adagrad, this is the same as applying no update at all. For lazy Adam and other sparsely applied optimizers with decay, ensure you understand the effect of applying a zero gradient.
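
For example, a minimal sketch that skips the update for one feature by passing None in its position, assuming hypothetical gradient tensors grad_one and grad_three:

embedding.apply_gradients({
    'feature_one': grad_one,
    'feature_two': None,  # a zero gradient is applied for this feature
    'feature_three': grad_three})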

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
  embedding = tf.tpu.experimental.embedding.TPUEmbedding(...)

distributed_dataset = (
    strategy.distribute_datasets_from_function(
        dataset_fn=...,
        options=tf.distribute.InputOptions(
            experimental_fetch_to_device=False)))
dataset_iterator = iter(distributed_dataset)

@tf.function
def training_step():
  def tpu_step(tpu_features):
    with tf.GradientTape() as tape:
      activations = embedding.dequeue()
      tape.watch(activations)

      loss = ... #  some computation involving activations

    embedding_gradients = tape.gradient(loss, activations)
    embedding.apply_gradients(embedding_gradients)

  embedding_features, tpu_features = next(dataset_iterator)
  embedding.enqueue(embedding_features, training=True)
  strategy.run(tpu_step, args=(tpu_features, ))

training_step()

Args
gradients A nested structure of gradients, with structure matching the feature_config passed to this object.
name A name for the underlying op.

Raises
RuntimeError If called when the object wasn't created under a TPUStrategy, or if not built (either by manually calling build or calling enqueue).
ValueError If a non-tf.Tensor non-None gradient is passed in, or a tf.Tensor of the incorrect shape is passed in. Also if the size of any sequence in gradients does not match the corresponding sequence in feature_config.
TypeError If the type of any sequence in gradients does not match the corresponding sequence in feature_config.

build


Creates the underlying variables and initializes the TPU for embeddings.

This method creates the underlying variables (including slot variables). If created under a TPUStrategy, this will also initialize the TPU for embeddings.

This method will be called automatically by enqueue, which will try to determine your batch size automatically. If this fails, you must manually call this method before calling enqueue.
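
If automatic detection fails, you can compute the per replica batch size from the global batch size and build explicitly; a minimal sketch, assuming global_batch_size is defined:

per_replica_batch_size = global_batch_size // strategy.num_replicas_in_sync
embedding.build(per_replica_batch_size)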

Args
per_replica_batch_size The per replica batch size that you intend to use. Note that this is fixed and the same batch size must be used for both training and evaluation. If you want to calculate this from the global batch size, you can use the num_replicas_in_sync property of your strategy object. May be set to None if not created under a TPUStrategy.

Raises
ValueError If per_replica_batch_size is None and object was created in a TPUStrategy scope.

dequeue


Get the embedding results.

Returns a nested structure of tf.Tensor objects, matching the structure of the feature_config argument to the TPUEmbedding class. The output shape of the tensors is (batch_size, dim), where batch_size is the per core batch size and dim is the dimension of the corresponding TableConfig. If the feature's corresponding FeatureConfig has max_sequence_length greater than 0, the output will be a sequence of shape (batch_size, max_sequence_length, dim) instead.
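
For example, a minimal sketch of a feature configured to produce sequence output rather than a pooled vector:

sequence_feature = tf.tpu.experimental.embedding.FeatureConfig(
    table=table_config_one,
    max_sequence_length=3)
# Activations dequeued for this feature have shape
# (batch_size, 3, dim) instead of (batch_size, dim).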

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
  embedding = tf.tpu.experimental.embedding.TPUEmbedding(...)

distributed_dataset = (
    strategy.distribute_datasets_from_function(
        dataset_fn=...,
        options=tf.distribute.InputOptions(
            experimental_fetch_to_device=False)))
dataset_iterator = iter(distributed_dataset)

@tf.function
def training_step():
  def tpu_step(tpu_features):
    with tf.GradientTape() as tape:
      activations = embedding.dequeue()
      tape.watch(activations)

      loss = ... #  some computation involving activations

    embedding_gradients = tape.gradient(loss, activations)
    embedding.apply_gradients(embedding_gradients)

  embedding_features, tpu_features = next(dataset_iterator)
  embedding.enqueue(embedding_features, training=True)
  strategy.run(tpu_step, args=(tpu_features, ))

training_step()

Args
name A name for the underlying op.

Returns
A nested structure of tensors, with the same structure as feature_config passed to this instance of the TPUEmbedding object.

Raises
RuntimeError If called when the object wasn't created under a TPUStrategy, or if not built (either by manually calling build or calling enqueue).

enqueue


Enqueues id tensors for embedding lookup.

This function enqueues a structure of features to be looked up in the embedding tables. We expect that the batch size of each of the tensors in features matches the per core batch size. This will automatically happen if your input dataset is batched to the global batch size and you use tf.distribute.TPUStrategy's experimental_distribute_dataset, or if you use distribute_datasets_from_function and batch to the per core batch size computed by the context passed to your input function.
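
For the distribute_datasets_from_function case, a minimal sketch of an input function that batches to the per core batch size, assuming a hypothetical make_dataset() and a defined global_batch_size:

def dataset_fn(ctx):
  per_core_batch_size = ctx.get_per_replica_batch_size(global_batch_size)
  return make_dataset().batch(per_core_batch_size, drop_remainder=True)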

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
  embedding = tf.tpu.experimental.embedding.TPUEmbedding(...)

distributed_dataset = (
    strategy.distribute_datasets_from_function(
        dataset_fn=...,
        options=tf.distribute.InputOptions(
            experimental_fetch_to_device=False)))
dataset_iterator = iter(distributed_dataset)

@tf.function
def training_step():
  def tpu_step(tpu_features):
    with tf.GradientTape() as tape:
      activations = embedding.dequeue()
      tape.watch(activations)

      loss = ... #  some computation involving activations

    embedding_gradients = tape.gradient(loss, activations)
    embedding.apply_gradients(embedding_gradients)

  embedding_features, tpu_features = next(dataset_iterator)
  embedding.enqueue(embedding_features, training=True)
  strategy.run(tpu_step, args=(tpu_features,))

training_step()

For finer-grained control, in the above example the line

  embedding.enqueue(embedding_features, training=True)

may be replaced with

  per_core_embedding_features = strategy.experimental_local_results(
      embedding_features)

  def per_core_enqueue(ctx):
    core_id = ctx.replica_id_in_sync_group
    device = strategy.extended.worker_devices[core_id]
    embedding.enqueue(per_core_embedding_features[core_id],
                      device=device)

  strategy.experimental_distribute_values_from_function(
      per_core_enqueue)

Args
features A nested structure of tf.Tensors, tf.SparseTensors or tf.RaggedTensors, with the same structure as feature_config. Inputs will be downcast to tf.int32. Only one type out of tf.SparseTensor or tf.RaggedTensor is supported per call.
weights If not None, a nested structure of tf.Tensors, tf.SparseTensors or tf.RaggedTensors, matching the above, except that the tensors should be of float type (and they will be downcast to tf.float32). For tf.SparseTensors we assume the indices are the same for the parallel entries from features, and similarly for tf.RaggedTensors we assume the row_splits are the same (see the sketch after this list).
training Defaults to True. If False, enqueue the batch as an inference batch (forward pass only). Do not call apply_gradients when this is False as this may lead to a deadlock.
name A name for the underlying op.
device The device name (e.g. '/task:0/device:TPU:2') where this batch should be enqueued. This should be set if and only if features is not a tf.distribute.DistributedValues and enqueue is not being called inside a TPU context (e.g. inside TPUStrategy.run).
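
For example, per-id weights for a tf.SparseTensor feature reuse the ids' indices; a minimal sketch with illustrative values:

ids = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 0], [1, 1]],
    values=[3, 1, 4],
    dense_shape=[2, 2])
wts = tf.sparse.SparseTensor(
    indices=ids.indices,      # same indices as the parallel ids entry
    values=[1.0, 0.5, 0.5],   # float weights, downcast to tf.float32
    dense_shape=ids.dense_shape)

Nested structures of such pairs are then passed as the features and weights arguments of enqueue.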

Raises
ValueError When called inside a strategy.run call and input is not directly taken from the args of the strategy.run call. Also if the size of any sequence in features does not match the corresponding sequence in feature_config; similarly for weights, if not None. Also if the batch sizes of the tensors in features are unequal or differ from a previous call.
RuntimeError When called inside a strategy.run call and inside XLA control flow. Also if the batch size is unable to be determined and build was not called.
TypeError If the type of any sequence in features does not match the corresponding sequence in feature_config. Similarly for weights, if not None.