

An embedding network supporting packed sequences and position ids.

This network implements an embedding layer similar to the one described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (https://arxiv.org/abs/1810.04805). On top of that, it additionally supports (1) packing multiple sequences into one sequence and (2) accepting an additional "position_ids" input.

vocab_size The size of the token vocabulary.
type_vocab_size The size of the type vocabulary.
embedding_width Width of token embeddings.
hidden_size The output size for this encoder.
max_seq_length The maximum sequence length for this encoder.
initializer The initializer for the embedding portion of this encoder.
dropout_rate The dropout rate to apply before the encoding layers.
pack_multiple_sequences If True, multiple sequences can be packed into one sequence for training and inference; the packed sub-sequences do not affect each other.
use_position_id Whether to expect position_ids as an input to the network. If False, the position_ids will be inferred: (1) when pack_multiple_sequences is False, the position ids are assumed to be 0, 1, 2, ..., seq_length - 1; (2) when pack_multiple_sequences is True, there may be multiple sub-sequences, and the position ids of each sub-sequence restart from 0, 1, 2, ...
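The position-id inference for packed sequences described in case (2) can be sketched in plain NumPy. The sketch below assumes each packed sub-sequence begins with a known start-of-sequence token (the `CLS_ID` constant is a hypothetical stand-in, e.g. BERT's `[CLS]` id); the actual boundary detection used by the library may differ.

```python
import numpy as np

CLS_ID = 101  # assumed start-of-sub-sequence token id (hypothetical)

def infer_packed_position_ids(token_ids):
    """Restart position ids at 0 whenever a new sub-sequence begins,
    where a sub-sequence is assumed to start at each CLS_ID token."""
    token_ids = np.asarray(token_ids)
    pos = np.zeros_like(token_ids)
    for b in range(token_ids.shape[0]):  # batch dimension
        p = 0
        for i in range(token_ids.shape[1]):  # sequence dimension
            if token_ids[b, i] == CLS_ID:
                p = 0  # new sub-sequence: restart positions at 0
            pos[b, i] = p
            p += 1
    return pos

# Two sub-sequences packed into one row of length 6:
print(infer_packed_position_ids([[101, 5, 6, 101, 7, 8]]))
# → [[0 1 2 0 1 2]]
```

With use_position_id=True, the network would instead take such position ids directly as an input rather than inferring them.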



Calls the model on new inputs and returns the outputs as tensors.

In this case, call() just reapplies all ops in the graph to the new inputs (i.e., builds a new computational graph from the provided inputs).

inputs Input tensor, or dict/list/tuple of input tensors.
training Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, see the Keras guide on masking.

A tensor if there is a single output, or a list of tensors if there is more than one output.
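To make the call semantics concrete, here is a minimal NumPy sketch of the BERT-style forward pass this layer performs: summing word, position, and type embeddings. All names and sizes here are illustrative assumptions, not the library's actual internals, and normalization/dropout are omitted for brevity.

```python
import numpy as np

# Hypothetical embedding tables matching the constructor arguments above.
vocab_size, type_vocab_size, max_seq_length, embedding_width = 100, 2, 16, 8
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(vocab_size, embedding_width))
pos_emb = rng.normal(size=(max_seq_length, embedding_width))
type_emb = rng.normal(size=(type_vocab_size, embedding_width))

def embed(token_ids, type_ids, position_ids=None):
    """Sum word, position, and type embeddings for a batch of sequences."""
    token_ids = np.asarray(token_ids)
    if position_ids is None:
        # Default case: position ids 0, 1, ..., seq_length - 1 per sequence.
        position_ids = np.broadcast_to(
            np.arange(token_ids.shape[1]), token_ids.shape)
    return (word_emb[token_ids]
            + pos_emb[np.asarray(position_ids)]
            + type_emb[np.asarray(type_ids)])

out = embed([[1, 2, 3]], [[0, 0, 0]])
print(out.shape)  # → (1, 3, 8): (batch, seq_length, embedding_width)
```

Passing an explicit position_ids argument corresponds to the use_position_id=True case, where the caller supplies positions instead of relying on the default 0..seq_length-1 range.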

