Gets a data source from the named dataset.
tfds.data_source(
    name: str,
    *,
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    data_dir: Union[None, str, os.PathLike] = None,
    download: bool = True,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None,
    builder_kwargs: Optional[Dict[str, Any]] = None,
    download_and_prepare_kwargs: Optional[Dict[str, Any]] = None,
    try_gcs: bool = False
) -> type_utils.ListOrTreeOrElem[Sequence[Any]]
`tfds.data_source` is a convenience method that:

- Fetches the `tfds.core.DatasetBuilder` by name: `builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)`
- Generates the data (when `download=True`): `builder.download_and_prepare(**download_and_prepare_kwargs)`
- Gets the data source: `ds = builder.as_data_source(split=split)`
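For example, a single call is roughly equivalent to running those three steps by hand (a minimal sketch; `'mnist'` is only an illustrative dataset name):

import tensorflow_datasets as tfds

# One call...
ds = tfds.data_source('mnist', split='train')

# ...is roughly equivalent to:
builder = tfds.builder('mnist')
builder.download_and_prepare()
ds = builder.as_data_source(split='train')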
You can consume data sources:
- In Python by iterating over them:
for example in ds['train']:
  print(example)
- With a `DataLoader` (e.g., with PyTorch), as shown in the sketch after this list.
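Because a data source is a `Sequence` (it supports `len()` and random-access indexing), it can be passed directly to a map-style PyTorch `DataLoader`. A minimal sketch, assuming `torch` is installed and using `'mnist'` as an illustrative dataset:

import tensorflow_datasets as tfds
from torch.utils.data import DataLoader

ds = tfds.data_source('mnist', split='train')

# Random access: data sources support len() and indexing.
print(len(ds))
print(ds[0])

# Shuffling and batching are delegated to the DataLoader.
loader = DataLoader(ds, batch_size=32, shuffle=True)
for batch in loader:
    break  # each batch collates the example dicts into tensors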
| Args | |
|---|---|
| name | `str`, the registered name of the `DatasetBuilder` (the snake case version of the class name). The config and version can also be specified in the name as follows: `'dataset_name[/config_name][:version]'`. For example, `'movielens/25m-ratings'` (for the latest version of `'25m-ratings'`), `'movielens:0.1.0'` (for the default config and version 0.1.0), or `'movielens/25m-ratings:0.1.0'`. Note that only the latest version can be generated, but old versions can be read if they are present on disk. For convenience, the `name` parameter can contain comma-separated keyword arguments for the builder. For example, `'foo_bar/a=True,b=3'` would use the `FooBar` dataset passing the keyword arguments `a=True` and `b=3` (for builders with configs, it would be `'foo_bar/zoo/a=True,b=3'` to use the `'zoo'` config and pass the builder the keyword arguments `a=True` and `b=3`). See the sketch after this table for usage examples. |
| split | Which split of the data to load (e.g. `'train'`, `'test'`, `['train', 'test']`, `'train[80%:]'`, ...). See our split API guide. If `None`, will return all splits in a `Dict[Split, Sequence]`. |
| data_dir | Directory to read/write data. Defaults to the value of the environment variable `TFDS_DATA_DIR`, if set; otherwise falls back to `'~/tensorflow_datasets'`. |
| download | `bool` (optional), whether to call `tfds.core.DatasetBuilder.download_and_prepare` before calling `tfds.core.DatasetBuilder.as_data_source`. If `False`, data is expected to be in `data_dir`. If `True` and the data is already in `data_dir`, `download_and_prepare` is a no-op. |
| decoders | Nested dict of `Decoder` objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the decoding guide for more info. |
| builder_kwargs | `dict` (optional), keyword arguments to be passed to the `tfds.core.DatasetBuilder` constructor. `data_dir` will be passed through by default. |
| download_and_prepare_kwargs | `dict` (optional), keyword arguments passed to `tfds.core.DatasetBuilder.download_and_prepare` if `download=True`. Allows controlling where to download and extract the cached data. If not set, `cache_dir` and `manual_dir` will automatically be deduced from `data_dir`. |
| try_gcs | `bool`, if `True`, `tfds.load` will see if the dataset exists on the public GCS bucket before building it locally. This is equivalent to passing `data_dir='gs://tfds-data/datasets'`. Warning: `try_gcs` is different from `builder_kwargs.download_config.try_download_gcs`. `try_gcs` (default: `False`) overrides `data_dir` to be the public GCS bucket. `try_download_gcs` (default: `True`) allows downloading from GCS while keeping a `data_dir` different from the public GCS bucket. So, to fully bypass GCS, use `try_gcs=False` and `download_and_prepare_kwargs={'download_config': tfds.core.download.DownloadConfig(try_download_gcs=False)}`. |
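The sketch below pulls several of these arguments together; the dataset names, versions, and paths are illustrative and assume the corresponding data is available:

import tensorflow_datasets as tfds

# Config and version encoded in `name`:
ds = tfds.data_source('movielens/25m-ratings')  # latest version of the '25m-ratings' config
ds = tfds.data_source('movielens:0.1.0')        # default config, version 0.1.0

# split=None returns every split; download=False expects prepared data in data_dir:
sources = tfds.data_source(
    'mnist',
    split=None,
    data_dir='~/tensorflow_datasets',
    download=False,
)

# Skip image decoding to work with the raw encoded bytes:
raw = tfds.data_source(
    'mnist',
    split='train',
    decoders={'image': tfds.decode.SkipDecoding()},
)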
| Returns | |
|---|---|
| `Sequence` if `split` is given, `dict<key: tfds.Split, value: Sequence>` otherwise. |
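For example (a minimal sketch, again with `'mnist'` as an illustrative name):

import tensorflow_datasets as tfds

train = tfds.data_source('mnist', split='train')  # a single Sequence
sources = tfds.data_source('mnist')               # a dict keyed by split, e.g. {'train': ..., 'test': ...}
assert len(sources['train']) == len(train)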