class CrossShardOptimizer: An optimizer that averages gradients across TPU shards.
batch_parallel(...): Shards computation along the batch dimension for parallel execution.
bfloat16_scope(...): Scope class for bfloat16 variables so that the model uses a custom getter.
core(...): Returns the device name for a core in a replicated TPU computation.
cross_replica_sum(...): Sums the input tensor across replicas according to group_assignment.
initialize_system(...): Initializes a distributed TPU system for use with TensorFlow.
outside_compilation(...): Builds part of a computation outside any current TPU replicate scope.
replicate(...): Builds a graph operator that runs a replicated TPU computation.
rewrite(...): Rewrites computation for execution on a TPU system.
shard(...): Shards computation for parallel execution.
shutdown_system(...): Shuts down a running distributed TPU system.
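
The summaries for cross_replica_sum and CrossShardOptimizer describe a grouped sum and a cross-shard average. The following is a minimal pure-Python sketch of those semantics only; it does not call TensorFlow or run on a TPU. The names cross_replica_sum_sketch and cross_shard_mean_sketch are hypothetical, and treating a missing group_assignment as a single group containing every replica is an assumption of this sketch.

```python
# Illustrative sketch only: plain Python lists stand in for per-replica tensors.

def cross_replica_sum_sketch(replica_values, group_assignment=None):
    """Return, for each replica, the element-wise sum over its group.

    replica_values: list of equal-length lists, one per replica.
    group_assignment: list of lists of replica ids. None is treated as one
    group holding every replica (an assumption of this sketch).
    """
    num_replicas = len(replica_values)
    if group_assignment is None:
        group_assignment = [list(range(num_replicas))]
    result = [None] * num_replicas
    for group in group_assignment:
        # Element-wise sum of the values held by the replicas in this group.
        members = [replica_values[i] for i in group]
        total = [sum(column) for column in zip(*members)]
        # Every replica in the group receives the same summed value.
        for replica_id in group:
            result[replica_id] = list(total)
    return result


def cross_shard_mean_sketch(shard_gradients):
    """Average per-shard gradients: sum across shards, then divide by the shard count."""
    summed = cross_replica_sum_sketch(shard_gradients)
    count = len(shard_gradients)
    return [[g / count for g in row] for row in summed]


values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
grouped = cross_replica_sum_sketch(values, [[0, 1], [2, 3]])
averaged = cross_shard_mean_sketch([[2.0, 4.0], [4.0, 8.0]])
```

With four replicas split into the groups [0, 1] and [2, 3], replicas 0 and 1 both end up with [4.0, 6.0] and replicas 2 and 3 with [12.0, 14.0]. The averaging helper gives every shard [3.0, 6.0] for the two-shard input shown.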