tf.distribute.NcclAllReduce

NCCL all-reduce implementation of CrossDeviceOps.

Inherits From: CrossDeviceOps

It uses Nvidia NCCL for all-reduce. For the batch API, tensors will be repacked or aggregated for more efficient cross-device transportation.

For reduces that are not all-reduce, it falls back to tf.distribute.ReductionToOneDevice.

Here is how you can use NcclAllReduce in tf.distribute.MirroredStrategy:

  strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

num_packs a non-negative integer. The number of packs to split values into. If zero, no packing will be done.

ValueError if num_packs is negative.

Methods

batch_reduce

View source

Reduce values to destinations in batches.

See tf.distribute.StrategyExtended.batch_reduce_to. This can only be called in the cross-replica context.

Args
reduce_op a tf.distribute.ReduceOp specifying how values should be combined.
value_destination_pairs a sequence of (value, destinations) pairs. See tf.distribute.CrossDeviceOps.reduce for descriptions.
options a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details.

Returns
A list of tf.Tensor or tf.distribute.DistributedValues, one per pair in value_destination_pairs.

Raises
ValueError if value_destination_pairs is not an iterable of tuples of tf.distribute.DistributedValues and destinations.

broadcast

View source

Broadcast tensor to destinations.

This can only be called in the cross-replica context.

Args
tensor a tf.Tensor like object. The value to broadcast.
destinations a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor alike object, or a device string. It specifies the devices to broadcast to. Note that if it's a tf.Variable, the value is broadcasted to the devices of that variable, this method doesn't update the variable.

Returns
A tf.Tensor or tf.distribute.DistributedValues.

reduce

View source

Reduce per_replica_value to destinations.

See tf.distribute.StrategyExtended.reduce_to. This can only be called in the cross-replica context.

Args
reduce_op a tf.distribute.ReduceOp specifying how values should be combined.
per_replica_value a tf.distribute.DistributedValues, or a tf.Tensor like object.
destinations a tf.distribute.DistributedValues, a tf.Variable, a tf.Tensor alike object, or a device string. It specifies the devices to reduce to. To perform an all-reduce, pass the same to value and destinations. Note that if it's a tf.Variable, the value is reduced to the devices of that variable, and this method doesn't update the variable.
options a tf.distribute.experimental.CommunicationOptions. See tf.distribute.experimental.CommunicationOptions for details.

Returns
A tf.Tensor or tf.distribute.DistributedValues.

Raises
ValueError if per_replica_value can't be converted to a tf.distribute.DistributedValues or if destinations is not a string, tf.Variable or tf.distribute.DistributedValues.