TensorFlow 1 version | View source on GitHub |
Connects to the given cluster.
tf.config.experimental_connect_to_cluster(
cluster_spec_or_resolver, job_name='localhost', task_index=0,
protocol=None, make_master_device_default=True, cluster_device_filters=None
)
Will make devices on the cluster available to use. Note that calling this more than once will work, but will invalidate any tensor handles on the old remote devices.
If the given local job name is not present in the cluster specification, it will be automatically added, using an unused port on the localhost.
Device filters can be specified to isolate groups of remote tasks to avoid undesired accesses between workers. Workers accessing resources or launching ops / functions on filtered remote devices will result in errors (unknown devices). For any remote task, if no device filter is present, all cluster devices will be visible; if any device filter is specified, it can only see devices matching at least one filter. Devices on the task itself are always visible. Device filters can be particially specified.
For example, for a cluster set up for parameter server training, the following device filters might be specified:
cdf = tf.config.experimental.ClusterDeviceFilters()
# For any worker, only the devices on PS nodes and itself are visible
for i in range(num_workers):
cdf.set_device_filters('worker', i, ['/job:ps'])
# Similarly for any ps, only the devices on workers and itself are visible
for i in range(num_ps):
cdf.set_device_filters('ps', i, ['/job:worker'])
tf.config.experimental_connect_to_cluster(cluster_def,
cluster_device_filters=cdf)
Args | |
---|---|
cluster_spec_or_resolver
|
A ClusterSpec or ClusterResolver describing
the cluster.
|
job_name
|
The name of the local job. |
task_index
|
The local task index. |
protocol
|
The communication protocol, such as "grpc" . If unspecified, will
use the default from python/platform/remote_utils.py .
|
make_master_device_default
|
If True and a cluster resolver is passed, will automatically enter the master task device scope, which indicates the master becomes the default device to run ops. It won't do anything if a cluster spec is passed. Will throw an error if the caller is currently already in some device scope. |
cluster_device_filters
|
an instance of
tf.train.experimental/ClusterDeviceFilters that specify device filters
to the remote tasks in cluster.
|