
tfds.beam.ReadFromTFDS

Creates a beam pipeline yielding TFDS examples.

tfds.beam.ReadFromTFDS(
    pipeline,
    builder: tfds.core.DatasetBuilder,
    split: str,
    workers_per_shard: int = 1,
    **as_dataset_kwargs
)

Used in the notebooks

Used in the tutorials: Recommending Movies: Recommender Models in TFX

Each dataset shard will be processed in parallel.
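As a rough mental model (not the TFDS implementation), reading shards in parallel resembles mapping a per-shard reader over the list of shards with a pool of workers. Every example is produced exactly once, but the order in which results arrive depends on scheduling. In the sketch below, Python's `concurrent.futures` stands in for Beam's workers; the in-memory `SHARDS` list and the `read_shard` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical dataset: each shard is a list of example IDs.
SHARDS = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8, 9],
]


def read_shard(shard):
    """Hypothetical per-shard reader: returns the shard's examples in file order."""
    return list(shard)


def read_all_in_parallel(shards, max_workers=3):
    """Read every shard concurrently and collect examples as shards finish."""
    examples = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(read_shard, s) for s in shards]
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            examples.extend(future.result())
    return examples


parallel = read_all_in_parallel(SHARDS)
sequential = [ex for shard in SHARDS for ex in shard]
# Same examples; only the order may differ between runs.
print(sorted(parallel) == sorted(sequential))  # True
```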

Usage:

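import apache_beam as beam
import tensorflow_datasets as tfds

# 'my_dataset' is a placeholder; the dataset must already be prepared
# (for example with builder.download_and_prepare()).
builder = tfds.builder('my_dataset')

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | tfds.beam.ReadFromTFDS(builder, split='train')
        | beam.Map(tfds.as_numpy)  # tf.Tensor -> NumPy
        | beam.Map(print)  # Replace with your own processing steps.
    )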

Use `tfds.as_numpy` to convert each example from `tf.Tensor` to NumPy.

The `split` argument can use subsplits, e.g. `'train[:100]'`, only when `batch_size=None` (in `as_dataset_kwargs`). Note: the order of the examples will differ from the order produced by `tfds.load(split='train[:100]')`, but the same examples will be used.
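Roughly speaking, a slice such as `'train[:100]'` is resolved into per-shard read instructions (how many records to skip and how many to take in each file), and each instruction can then be read independently. The sketch below is a simplified, hypothetical model of that idea: `ShardRead`, `shard_instructions` and the shard lengths are made up for illustration, and TFDS's own split handling covers more cases, such as percent slices and negative indices.

```python
from typing import List, NamedTuple


class ShardRead(NamedTuple):
    """Hypothetical per-shard instruction: skip `skip` records, then take `take`."""
    shard_index: int
    skip: int
    take: int


def shard_instructions(shard_lengths: List[int], start: int, stop: int) -> List[ShardRead]:
    """Map the absolute example range [start, stop) onto per-shard reads."""
    reads = []
    offset = 0  # absolute index of the first example in the current shard
    for i, length in enumerate(shard_lengths):
        shard_start, shard_end = offset, offset + length
        # Overlap between [start, stop) and this shard's [shard_start, shard_end).
        lo, hi = max(start, shard_start), min(stop, shard_end)
        if lo < hi:
            reads.append(ShardRead(i, skip=lo - shard_start, take=hi - lo))
        offset = shard_end
    return reads


# Hypothetical split with three shards of 40 examples each: 'train[:100]'
# covers the first two shards fully and 20 examples of the third.
reads = shard_instructions([40, 40, 40], start=0, stop=100)
print(reads)
```

Because each instruction is self-contained, a Beam pipeline is free to process them in any order, which is consistent with the note above: the selected examples are identical, only their order changes.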

Args
`pipeline`	Beam pipeline (set automatically when the transform is applied with `|`)
`builder`	Dataset builder to load
`split`	Split name to load (e.g. `train+test`, `train`)
`workers_per_shard`	Number of workers that read a shard in parallel. The shard is split into this many parts. Note that workers cannot skip to a specific row in a TFRecord file, so each worker must read the file up to its starting point and discard the records before it.
`**as_dataset_kwargs`	Arguments forwarded to `builder.as_dataset`.
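The `workers_per_shard` trade-off can be illustrated with a small simulation. Assuming a shard of `n` records is divided into `workers_per_shard` contiguous, near-equal parts (the exact partitioning TFDS uses is not shown here), each worker still has to scan the file sequentially from the beginning, discarding records until it reaches its own part. The helpers `split_shard` and `read_part` are hypothetical, and a Python list stands in for a TFRecord file.

```python
from typing import List, Tuple


def split_shard(num_records: int, workers_per_shard: int) -> List[Tuple[int, int]]:
    """Split [0, num_records) into `workers_per_shard` contiguous (start, stop) ranges."""
    base, extra = divmod(num_records, workers_per_shard)
    ranges, start = [], 0
    for w in range(workers_per_shard):
        # The first `extra` workers get one additional record.
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def read_part(records: List[str], start: int, stop: int) -> Tuple[List[str], int]:
    """Sequentially scan `records`, keeping only indices in [start, stop).

    Returns the kept records and how many records were read in total,
    mimicking a reader that cannot seek to a given row.
    """
    kept, scanned = [], 0
    for i, record in enumerate(records):
        if i >= stop:
            break  # no need to read past the end of this worker's part
        scanned += 1
        if i >= start:
            kept.append(record)
    return kept, scanned


shard = [f"record-{i}" for i in range(10)]
for start, stop in split_shard(len(shard), workers_per_shard=3):
    kept, scanned = read_part(shard, start, stop)
    print(f"[{start}, {stop}): kept {len(kept)}, scanned {scanned}")
# [0, 4): kept 4, scanned 4
# [4, 7): kept 3, scanned 7
# [7, 10): kept 3, scanned 10
```

In this simulation the three workers scan 21 records in total to deliver 10, which is the extra read cost the `workers_per_shard` entry describes; the benefit is that the parts are produced in parallel.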

Returns
The PCollection containing the TFDS examples.
