tf.data.experimental.service.from_dataset_id
Creates a dataset which reads data from the tf.data service.
tf.data.experimental.service.from_dataset_id(
processing_mode, service, dataset_id, element_spec=None, job_name=None,
consumer_index=None, num_consumers=None, max_outstanding_requests=None,
data_transfer_protocol=None, target_workers='AUTO'
)
This is useful when the dataset is registered by one process, then used in
another process. When the same process is both registering and reading from
the dataset, it is simpler to use tf.data.experimental.service.distribute
instead.
Before using from_dataset_id, the dataset must have been registered with the
tf.data service using tf.data.experimental.service.register_dataset.
register_dataset returns a dataset id for the registered dataset. That is the
dataset_id which should be passed to from_dataset_id.
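For example, a minimal sketch of the cross-process pattern (assuming tensorflow is imported as tf; the service address "grpc://localhost:5000" and the scalar int64 element spec below are illustrative placeholders, not values from this page):

# Process 1: register the dataset with a running dispatcher.
registered_id = tf.data.experimental.service.register_dataset(
    service="grpc://localhost:5000",
    dataset=tf.data.Dataset.range(100))
# Share registered_id with the consuming process (e.g. via a file or RPC).

# Process 2: read the registered dataset by its id.
consumer = tf.data.experimental.service.from_dataset_id(
    processing_mode="parallel_epochs",
    service="grpc://localhost:5000",
    dataset_id=registered_id,
    element_spec=tf.TensorSpec(shape=(), dtype=tf.int64))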
The element_spec argument indicates the tf.TypeSpecs for the elements produced
by the dataset. Currently element_spec must be explicitly specified, and must
match the dataset registered under dataset_id. element_spec defaults to None
so that in the future we can support automatically discovering the
element_spec by querying the tf.data service.
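For instance, tf.data.Dataset.range produces scalar int64 elements, so a consumer that has no access to the original dataset object could spell out the matching spec directly (a sketch; assumes tensorflow is imported as tf):

# On the registering side, inspect the spec that consumers need to match.
print(tf.data.Dataset.range(10).element_spec)
# TensorSpec(shape=(), dtype=tf.int64, name=None)

# On the consuming side, construct the same spec explicitly.
spec = tf.TensorSpec(shape=(), dtype=tf.int64)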
tf.data.experimental.service.distribute is a convenience method which combines
register_dataset and from_dataset_id into a dataset transformation. See the
documentation for tf.data.experimental.service.distribute for more detail
about how from_dataset_id works.
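For comparison, a rough sketch of the same read expressed with distribute, which registers and consumes in one step (the service address is a placeholder):

dataset = tf.data.Dataset.range(10)
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs",
        service="grpc://localhost:5000"))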
dispatcher = tf.data.experimental.service.DispatchServer()
dispatcher_address = dispatcher.target.split("://")[1]
worker = tf.data.experimental.service.WorkerServer(
tf.data.experimental.service.WorkerConfig(
dispatcher_address=dispatcher_address))
dataset = tf.data.Dataset.range(10)
dataset_id = tf.data.experimental.service.register_dataset(
dispatcher.target, dataset)
dataset = tf.data.experimental.service.from_dataset_id(
processing_mode="parallel_epochs",
service=dispatcher.target,
dataset_id=dataset_id,
element_spec=dataset.element_spec)
print(list(dataset.as_numpy_iterator()))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Args

processing_mode: A string specifying the policy for how data should be
processed by tf.data workers. Can be either "parallel_epochs" to have each
tf.data worker process a copy of the dataset, or "distributed_epoch" to split
a single iteration of the dataset across all the workers.

service: A string or a tuple indicating how to connect to the tf.data service.
If it's a string, it should be in the format [<protocol>://]<address>, where
<address> identifies the dispatcher address and <protocol> can optionally be
used to override the default protocol to use. If it's a tuple, it should be
(protocol, address).

dataset_id: The id of the dataset to read from. This id is returned by
register_dataset when the dataset is registered with the tf.data service.

element_spec: A nested structure of tf.TypeSpecs representing the type of
elements produced by the dataset. Use tf.data.Dataset.element_spec to see the
element spec for a given dataset.

job_name: (Optional.) The name of the job. This argument makes it possible for
multiple datasets to share the same job. The default behavior is that the
dataset creates anonymous, exclusively owned jobs.

consumer_index: (Optional.) The index of the consumer in the range from 0 to
num_consumers. Must be specified alongside num_consumers. When specified,
consumers will read from the job in a strict round-robin order, instead of the
default first-come-first-served order.

num_consumers: (Optional.) The number of consumers which will consume from the
job. Must be specified alongside consumer_index. When specified, consumers
will read from the job in a strict round-robin order, instead of the default
first-come-first-served order. When num_consumers is specified, the dataset
must have infinite cardinality to prevent a producer from running out of data
early and causing consumers to go out of sync.

max_outstanding_requests: (Optional.) A limit on how many elements may be
requested at the same time. You can use this option to control the amount of
memory used, since distribute won't use more than
element_size * max_outstanding_requests of memory.

data_transfer_protocol: (Optional.) The protocol to use for transferring data
with the tf.data service. By default, data is transferred using gRPC.

target_workers: (Optional.) Which workers to read from. If "AUTO", the tf.data
runtime decides which workers to read from. If "ANY", reads from any tf.data
service workers. If "LOCAL", only reads from local in-process tf.data service
workers. "AUTO" works well for most cases, while users can specify other
targets. For example, "LOCAL" helps avoid RPCs and data copy if every TF
worker colocates with a tf.data service worker. Defaults to "AUTO".
Returns

A tf.data.Dataset which reads from the tf.data service.
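As an illustration of the round-robin options above, a hedged sketch (the service address, job name, and consumer count are placeholder values; each consumer process would pass its own consumer_index):

# The registered dataset is repeated so its cardinality is infinite,
# as required when num_consumers is specified.
dataset = tf.data.Dataset.range(10).repeat()
dataset_id = tf.data.experimental.service.register_dataset(
    "grpc://localhost:5000", dataset)
# Consumer 0 of 2; the other consumer process would pass consumer_index=1.
consumer = tf.data.experimental.service.from_dataset_id(
    processing_mode="parallel_epochs",
    service="grpc://localhost:5000",
    dataset_id=dataset_id,
    element_spec=dataset.element_spec,
    job_name="shared_job",
    consumer_index=0,
    num_consumers=2)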