tfds.beam.ReadFromTFDS
Creates a beam pipeline yielding TFDS examples.
    tfds.beam.ReadFromTFDS(
        pipeline,
        builder: tfds.core.DatasetBuilder,
        split: str,
        workers_per_shard: int = 1,
        **as_dataset_kwargs
    )
### Used in the notebooks

Used in the tutorials:

- [GPT-2 fine-tuning and conversion](https://www.tensorflow.org/tfx/tutorials/tfx/gpt2_finetuning_and_conversion)
- [Recommending Movies: Recommender Models in TFX](https://www.tensorflow.org/tfx/tutorials/tfx/recommenders)
Each dataset shard will be processed in parallel.
#### Usage:

    builder = tfds.builder('my_dataset')

    _ = (
        pipeline
        | tfds.beam.ReadFromTFDS(builder, split='train')
        | beam.Map(tfds.as_numpy)
        | ...
    )
Use `tfds.as_numpy` to convert each example from `tf.Tensor` to NumPy.
The `split` argument can make use of subsplits, e.g. `'train[:100]'`, but only when
`batch_size=None` (in `as_dataset_kwargs`). Note: the order of the examples will be
different than with `tfds.load(split='train[:100]')`, but the same examples will be
used.
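To make the subsplit notation concrete, here is a toy parser for specs like `'train[:100]'`. It is an illustrative sketch only, not TFDS's actual resolution logic (TFDS resolves split specs internally, e.g. via `tfds.core.ReadInstruction`), and it ignores percent-based slices:

```python
import re


def parse_subsplit(spec):
    """Toy parser for TFDS-style subsplit specs like 'train[:100]'.

    Returns (split_name, start, stop). Illustrative sketch only;
    TFDS resolves these specs itself, and also supports forms
    (e.g. percents, 'train+test') that this toy ignores.
    """
    m = re.fullmatch(r"(\w+)(?:\[(\d*):(\d*)\])?", spec)
    if not m:
        raise ValueError(f"Unrecognized split spec: {spec!r}")
    name, start, stop = m.groups()
    return (
        name,
        int(start) if start else None,
        int(stop) if stop else None,
    )


print(parse_subsplit("train[:100]"))  # ('train', None, 100)
print(parse_subsplit("train"))        # ('train', None, None)
```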
| Args | |
|---|---|
| `pipeline` | Beam pipeline (automatically set). |
| `builder` | Dataset builder to load. |
| `split` | Split name to load (e.g. `'train+test'`, `'train'`). |
| `workers_per_shard` | Number of workers that should read a shard in parallel. The shard will be split into this many parts. Note that workers cannot skip to a specific row in a tfrecord file, so they need to read the file up until that point without using that data. |
| `**as_dataset_kwargs` | Arguments forwarded to `builder.as_dataset`. |
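The partitioning behavior described for `workers_per_shard` can be sketched in plain Python. This is a hypothetical helper (`shard_work_ranges` is not part of the TFDS API; the real logic lives inside TFDS's beam utilities), assuming the shard's row count is known:

```python
def shard_work_ranges(num_rows, workers_per_shard):
    """Split a shard's rows into contiguous [start, stop) ranges, one per worker.

    Illustrative sketch of the partitioning described above; not the
    actual TFDS implementation.
    """
    base, rem = divmod(num_rows, workers_per_shard)
    ranges, start = [], 0
    for i in range(workers_per_shard):
        # Spread any remainder rows over the first `rem` workers.
        stop = start + base + (1 if i < rem else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


# A worker assigned (start, stop) still reads the tfrecord file from the
# beginning and discards rows before `start`, since tfrecord files do not
# support seeking to an arbitrary row.
print(shard_work_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

This is why large `workers_per_shard` values trade extra (discarded) read work for parallelism.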
| Returns | |
|---|---|
| The `PCollection` containing the TFDS examples. | |
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-06-19 UTC.