

A Keras functional API implementation for MobileBERT encoder.

Args:
word_vocab_size: Number of words in the vocabulary.
word_embed_size: Word embedding size.
type_vocab_size: Number of word types.
max_sequence_length: Maximum length of the input sequence.
num_blocks: Number of transformer blocks in the encoder model.
hidden_size: Hidden size for the transformer block.
num_attention_heads: Number of attention heads in the transformer block.
intermediate_size: The size of the "intermediate" (a.k.a. feed-forward) layer.
intermediate_act_fn: The non-linear activation function applied to the output of the intermediate/feed-forward layer.
hidden_dropout_prob: Dropout probability for the hidden layers.
attention_probs_dropout_prob: Dropout probability of the attention probabilities.
intra_bottleneck_size: Size of the bottleneck.
initializer_range: The stddev of the truncated_normal_initializer used to initialize all weight matrices.
use_bottleneck_attention: Whether to use attention inputs from the bottleneck transformation. If True, key_query_shared_bottleneck is ignored.
key_query_shared_bottleneck: Whether to share the linear transformation for keys and queries.
num_feedforward_networks: Number of stacked feed-forward networks.
normalization_type: The type of normalization; only no_norm and layer_norm are supported. no_norm represents the element-wise linear transformation for the student model, as suggested by the original MobileBERT paper; layer_norm is used for the teacher model.
classifier_activation: Whether to use the tanh activation for the final representation of the [CLS] token in fine-tuning.
input_mask_dtype: The dtype of the input_mask tensor, one of the input tensors of this encoder. Defaults to int32. If you want to use tf.lite quantization, which does not support the Cast op, set this argument to tf.float32 and feed the input_mask tensor with float32 values to avoid tf.cast in the computation.
**kwargs: Other keyword arguments.

Attributes:
pooler_layer: The pooler dense layer after the transformer layers.
transformer_layers: List of Transformer layers in the encoder.



Calls the model on new inputs and returns the outputs as tensors.

In this case call() just reapplies all ops in the graph to the new inputs (i.e., it builds a new computational graph from the provided inputs).

Args:
inputs: Input tensor, or dict/list/tuple of input tensors.
training: Boolean or boolean scalar tensor, indicating whether to run the network in training mode or inference mode.
mask: A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, see the Keras masking guide.

Returns:
A tensor if there is a single output, or a list of tensors if there is more than one output.
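A short sketch of how the training flag changes call() behavior, using a plain Keras model with a Dropout layer (not MobileBERT) so the effect is easy to see: dropout is a no-op at inference but zeroes and rescales activations in training mode.

```python
import tensorflow as tf

# A trivial functional model: identity input through 50% dropout.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dropout(0.5)(inputs)
model = tf.keras.Model(inputs, outputs)

x = tf.ones((1, 4))

# Inference mode: dropout does nothing, output equals input.
y_infer = model(x, training=False)

# Training mode: each element is either dropped to 0 or scaled by
# 1 / (1 - rate) = 2 to preserve the expected activation magnitude.
y_train = model(x, training=True)
```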

