tf.config.experimental_connect_to_cluster

Connects to the given cluster.

Makes the devices on the cluster available for use. Calling this function more than once works, but invalidates any tensor handles on the old remote devices.

If the given local job name is not present in the cluster specification, it is added automatically, using an unused port on localhost.
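The behavior described above can be pictured with a small plain-Python sketch. It is not TensorFlow's implementation: the cluster is modeled as an ordinary dict, and `pick_unused_port`, `with_local_job`, the job name `"client"`, and the worker addresses are all hypothetical names chosen for illustration.

```python
import socket


def pick_unused_port() -> int:
    """Ask the OS for a TCP port that is currently free on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def with_local_job(cluster: dict, job_name: str) -> dict:
    """Return a copy of `cluster` that is guaranteed to contain `job_name`.

    `cluster` maps job names to lists of "host:port" strings (an assumed,
    simplified stand-in for a cluster specification).
    """
    result = {job: list(tasks) for job, tasks in cluster.items()}
    if job_name not in result:
        # The local job is missing, so add it with one task on localhost.
        result[job_name] = [f"localhost:{pick_unused_port()}"]
    return result


# Placeholder addresses, for illustration only.
cluster = {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]}
augmented = with_local_job(cluster, "client")
```

In this sketch the original dict is left untouched, and a cluster that already lists the local job is returned unchanged.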

Device filters can be specified to isolate groups of remote tasks and avoid undesired accesses between workers. A worker that accesses resources or launches ops or functions on filtered-out remote devices gets an error (unknown device). For any remote task, if no device filter is present, all cluster devices are visible; if any device filter is specified, the task can only see devices matching at least one filter. Devices on the task itself are always visible. Device filters can be partially specified.

For example, for a cluster set up for parameter server training, the following device filters might be specified:

import tensorflow as tf

# num_workers, num_ps, and cluster_def (a ClusterSpec or ClusterResolver)
# are assumed to be defined by the caller.
cdf = tf.config.experimental.ClusterDeviceFilters()
# For any worker, only the devices on PS nodes and itself are visible
for i in range(num_workers):
  cdf.set_device_filters('worker', i, ['/job:ps'])
# Similarly for any ps, only the devices on workers and itself are visible
for i in range(num_ps):
  cdf.set_device_filters('ps', i, ['/job:worker'])

tf.config.experimental_connect_to_cluster(cluster_def,
                                          cluster_device_filters=cdf)
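To make the visibility rule concrete, here is a minimal plain-Python model of it, applied to the same parameter server filters. It is a sketch under simplifying assumptions, not TensorFlow's matching logic: device names are treated as plain strings and a filter matches on a name-component prefix. The helper names `matches` and `visible_devices`, the device list, and the `chief` task are hypothetical.

```python
def matches(device: str, device_filter: str) -> bool:
    """True if `device` equals the filter or lies below it in the name path."""
    prefix = device_filter.rstrip("/")
    return device == prefix or device.startswith(prefix + "/")


def visible_devices(task, all_devices, filters):
    """Return the devices that `task` can see.

    task:        (job_name, task_index) tuple.
    all_devices: list of full device name strings in the cluster.
    filters:     dict mapping (job_name, task_index) to a list of filters;
                 a task missing from the dict has no filter.
    """
    own = f"/job:{task[0]}/task:{task[1]}"
    task_filters = filters.get(task)
    visible = []
    for device in all_devices:
        if matches(device, own):
            visible.append(device)  # a task always sees its own devices
        elif task_filters is None:
            visible.append(device)  # no filter: every device is visible
        elif any(matches(device, f) for f in task_filters):
            visible.append(device)  # matches at least one filter
    return visible


devices = [
    "/job:worker/task:0/device:CPU:0",
    "/job:worker/task:1/device:CPU:0",
    "/job:ps/task:0/device:CPU:0",
]
filters = {
    ("worker", 0): ["/job:ps"],
    ("worker", 1): ["/job:ps"],
    ("ps", 0): ["/job:worker"],
}

worker0_view = visible_devices(("worker", 0), devices, filters)
ps0_view = visible_devices(("ps", 0), devices, filters)
chief_view = visible_devices(("chief", 0), devices, filters)
```

Under this model, worker 0 sees its own CPU and the PS device but not worker 1, the PS task sees both workers and itself, and the unfiltered `chief` task sees every device.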

cluster_spec_or_resolver A ClusterSpec or ClusterResolver describing the cluster.
job_name The name of the local job.
task_index The local task index.
protocol The communication protocol, such as "grpc". If unspecified, the default from python/platform/remote_utils.py is used.
make_master_device_default If True and a cluster resolver is passed, automatically enters the master task device scope, so that the master becomes the default device on which ops run. Has no effect if a cluster spec is passed. Raises an error if the caller is already inside a device scope.
cluster_device_filters An instance of tf.config.experimental.ClusterDeviceFilters that specifies device filters for the remote tasks in the cluster.