Loads the named dataset into a `tf.data.Dataset`.
```python
tfds.load(
    name: str,
    *,
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    data_dir: Union[None, str, os.PathLike] = None,
    batch_size: Optional[int] = None,
    shuffle_files: bool = False,
    download: bool = True,
    as_supervised: bool = False,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None,
    read_config: Optional[read_config_lib.ReadConfig] = None,
    with_info: bool = False,
    builder_kwargs: Optional[Dict[str, Any]] = None,
    download_and_prepare_kwargs: Optional[Dict[str, Any]] = None,
    as_dataset_kwargs: Optional[Dict[str, Any]] = None,
    try_gcs: bool = False
)
```
`tfds.load` is a convenience method that:

- Fetches the `tfds.core.DatasetBuilder` by name: `builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)`
- Generates the data (when `download=True`): `builder.download_and_prepare(**download_and_prepare_kwargs)`
- Loads the `tf.data.Dataset` object: `ds = builder.as_dataset(split=split, as_supervised=as_supervised, shuffle_files=shuffle_files, read_config=read_config, decoders=decoders, **as_dataset_kwargs)`
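Put together, a minimal sketch of the equivalent manual calls (here `'mnist'` is only an illustrative dataset name):

```python
import tensorflow_datasets as tfds

# Equivalent to: ds = tfds.load('mnist', split='train')
builder = tfds.builder('mnist')          # 1. Fetch the DatasetBuilder by name.
builder.download_and_prepare()           # 2. Generate the data (skipped if already on disk).
ds = builder.as_dataset(split='train')   # 3. Load the tf.data.Dataset object.
```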
See: https://www.tensorflow.org/datasets/overview#load_a_dataset for more examples.
If you'd like NumPy arrays instead of `tf.data.Dataset`s or `tf.Tensor`s,
you can pass the return value to `tfds.as_numpy`.
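For example, a minimal load-and-convert sketch (again assuming `'mnist'` as an illustrative dataset name, with its `image`/`label` features):

```python
import tensorflow_datasets as tfds

# Load the train split as a tf.data.Dataset of feature dictionaries.
ds = tfds.load('mnist', split='train', shuffle_files=True)

# tfds.as_numpy converts the pipeline to a generator of NumPy arrays.
for example in tfds.as_numpy(ds.take(2)):
    print(example['image'].shape, example['label'])
```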
| Args | |
|---|---|
| name | `str`, the registered name of the `DatasetBuilder` (the snake case version of the class name). The config and version can also be specified in the name as follows: `'dataset_name[/config_name][:version]'`. For example, `'movielens/25m-ratings'` (for the latest version of `'25m-ratings'`), `'movielens:0.1.0'` (for the default config and version 0.1.0), or `'movielens/25m-ratings:0.1.0'`. Note that only the latest version can be generated, but old versions can be read if they are present on disk. For convenience, the `name` parameter can contain comma-separated keyword arguments for the builder. For example, `'foo_bar/a=True,b=3'` would use the `FooBar` dataset passing the keyword arguments `a=True` and `b=3` (for builders with configs, it would be `'foo_bar/zoo/a=True,b=3'` to use the `'zoo'` config and pass to the builder keyword arguments `a=True` and `b=3`). |
| split | Which split of the data to load (e.g. `'train'`, `'test'`, `['train', 'test']`, `'train[80%:]'`, ...). See our split API guide. If `None`, will return all splits in a `Dict[Split, tf.data.Dataset]`. |
| data_dir | directory to read/write data. Defaults to the value of the environment variable `TFDS_DATA_DIR`, if set, otherwise falls back to `'~/tensorflow_datasets'`. |
| batch_size | `int`, if set, add a batch dimension to examples. Note that variable length features will be 0-padded. If `batch_size=-1`, will return the full dataset as `tf.Tensor`s. |
| shuffle_files | `bool`, whether to shuffle the input files. Defaults to `False`. |
| download | `bool` (optional), whether to call `tfds.core.DatasetBuilder.download_and_prepare` before calling `tfds.core.DatasetBuilder.as_dataset`. If `False`, data is expected to be in `data_dir`. If `True` and the data is already in `data_dir`, `download_and_prepare` is a no-op. |
| as_supervised | `bool`, if `True`, the returned `tf.data.Dataset` will have a 2-tuple structure `(input, label)` according to `builder.info.supervised_keys`. If `False`, the default, the returned `tf.data.Dataset` will have a dictionary with all the features. |
| decoders | Nested dict of `Decoder` objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info. |
| read_config | `tfds.ReadConfig`, additional options to configure the input pipeline (e.g. seed, num parallel reads, ...). |
| with_info | `bool`, if `True`, `tfds.load` will return the tuple `(tf.data.Dataset, tfds.core.DatasetInfo)`, the latter containing the info associated with the builder. |
| builder_kwargs | `dict` (optional), keyword arguments to be passed to the `tfds.core.DatasetBuilder` constructor. `data_dir` will be passed through by default. |
| download_and_prepare_kwargs | `dict` (optional), keyword arguments passed to `tfds.core.DatasetBuilder.download_and_prepare` if `download=True`. Allows control over where to download and extract the cached data. If not set, `cache_dir` and `manual_dir` will automatically be deduced from `data_dir`. |
| as_dataset_kwargs | `dict` (optional), keyword arguments passed to `tfds.core.DatasetBuilder.as_dataset`. |
| try_gcs | `bool`, if `True`, `tfds.load` will see if the dataset exists on the public GCS bucket before building it locally. This is equivalent to passing `data_dir='gs://tfds-data/datasets'`. Warning: `try_gcs` is different from `builder_kwargs.download_config.try_download_gcs`. `try_gcs` (default: `False`) overrides `data_dir` to be the public GCS bucket. `try_download_gcs` (default: `True`) allows downloading from GCS while keeping a `data_dir` different from the public GCS bucket. So, to fully bypass GCS, use `try_gcs=False` and `download_and_prepare_kwargs={'download_config': tfds.core.download.DownloadConfig(try_download_gcs=False)}`. |
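To illustrate how several of these arguments combine, a sketch (the dataset name `'mnist'` and the 80% slice are illustrative choices, not requirements):

```python
import tensorflow_datasets as tfds

# as_supervised=True yields (input, label) tuples; batch_size=-1 returns
# the full split as a single batch, which tfds.as_numpy turns into arrays.
train_images, train_labels = tfds.as_numpy(
    tfds.load('mnist', split='train[:80%]', as_supervised=True, batch_size=-1))

# read_config tunes the input pipeline, e.g. fixing the shuffle seed.
ds = tfds.load(
    'mnist',
    split='test',
    shuffle_files=True,
    read_config=tfds.ReadConfig(shuffle_seed=42))
```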
| Returns | |
|---|---|
| ds | `tf.data.Dataset`, the dataset requested, or if `split` is `None`, a `dict<key: tfds.Split, value: tf.data.Dataset>`. If `batch_size=-1`, these will be full datasets as `tf.Tensor`s. |
| ds_info | `tfds.core.DatasetInfo`, if `with_info` is `True`, then `tfds.load` will return a tuple `(ds, ds_info)` containing dataset information (version, features, splits, num_examples, ...). Note that the `ds_info` object documents the entire dataset, regardless of the `split` requested. Split-specific information is available in `ds_info.splits`. |
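For example, unpacking the `with_info=True` return value (a sketch; `'mnist'` is again only an illustrative name):

```python
import tensorflow_datasets as tfds

ds, ds_info = tfds.load('mnist', split='train', with_info=True)

print(ds_info.features)                      # Documents the entire dataset.
print(ds_info.splits['train'].num_examples)  # Split-specific information.
```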