tfds.beam.ReadFromTFDS

Creates a beam pipeline yielding TFDS examples.

Each dataset shard will be processed in parallel.

Usage:

import apache_beam as beam
import tensorflow_datasets as tfds

builder = tfds.builder('my_dataset')

# `pipeline` is an existing beam.Pipeline object.
_ = (
    pipeline
    | tfds.beam.ReadFromTFDS(builder, split='train')
    | beam.Map(tfds.as_numpy)
    | ...
)

Use tfds.as_numpy to convert each example from tf.Tensor to numpy.

The split argument can make use of subsplits, e.g. 'train[:100]', but only when batch_size=None (in as_dataset_kwargs). Note: the order of the examples will differ from tfds.load(split='train[:100]'), but the same examples will be returned.
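
For example, a subsplit can be read as follows. This is a minimal sketch: the dataset name is a placeholder, and batch_size=None is simply the builder.as_dataset default made explicit via as_dataset_kwargs.

builder = tfds.builder('mnist')  # placeholder dataset name
builder.download_and_prepare()

_ = (
    pipeline
    | tfds.beam.ReadFromTFDS(builder, split='train[:100]', batch_size=None)
    | beam.Map(tfds.as_numpy)  # dict of tf.Tensor -> dict of np.ndarray
)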

Args:
  pipeline: Beam pipeline (automatically set).
  builder: Dataset builder to load.
  split: Split name to load (e.g. 'train+test', 'train').
  workers_per_shard: Number of workers that should read a shard in parallel. The shard will be split into this many parts. Note that workers cannot skip to a specific row in a TFRecord file, so they need to read the file up to that point without using the data. (See the end-to-end sketch below.)
  **as_dataset_kwargs: Arguments forwarded to builder.as_dataset.

Returns:
  The PCollection containing the TFDS examples.
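
As an end-to-end illustration of workers_per_shard and the overall flow, here is a minimal, self-contained sketch. The dataset name, runner choice, worker count, and the final counting/printing steps are placeholder choices, not part of the API.

import apache_beam as beam
import tensorflow_datasets as tfds


def run():
  builder = tfds.builder('mnist')  # placeholder dataset name
  builder.download_and_prepare()

  with beam.Pipeline(runner='DirectRunner') as pipeline:
    _ = (
        pipeline
        | tfds.beam.ReadFromTFDS(
            builder,
            split='train',
            workers_per_shard=4,  # each shard is split into 4 parts
        )
        | beam.Map(tfds.as_numpy)
        | beam.combiners.Count.Globally()  # placeholder step: count examples
        | beam.Map(print)  # placeholder sink: print the total count
    )


if __name__ == '__main__':
  run()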