tfds.core.DatasetInfo

Information about a dataset.

DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.

Args
builder DatasetBuilder, dataset builder for this info.
description str, description of this dataset.
features tfds.features.FeaturesDict, information on the feature dict of the tf.data.Dataset object returned by the builder.as_dataset() method.
supervised_keys tuple of (input_key, target_key), Specifies the input feature and the label for supervised learning, if applicable for the dataset. The keys correspond to the feature names to select in info.features. When calling tfds.core.DatasetBuilder.as_dataset() with as_supervised=True, the tf.data.Dataset object will yield the (input, target) defined here.
homepage str, optional, the homepage for this dataset.
citation str, optional, the citation to use for this dataset.
metadata tfds.core.Metadata, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset.
redistribution_info dict, optional, information needed for redistribution, as specified in dataset_info_pb2.RedistributionInfo. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset.
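
To illustrate how supervised_keys is used, the minimal sketch below models the selection that as_supervised=True performs on each example. It uses a plain Python dict in place of a real feature dict; the helper name to_supervised and the sample keys "image", "label", and "file_name" are hypothetical and not part of the TFDS API.

```python
from typing import Any, Dict, Tuple


def to_supervised(
    example: Dict[str, Any], supervised_keys: Tuple[str, str]
) -> Tuple[Any, Any]:
    """Pick the (input, target) pair named by supervised_keys (hypothetical helper)."""
    input_key, target_key = supervised_keys
    return example[input_key], example[target_key]


# Hypothetical example with three features; only the two named keys are selected.
example = {"image": [[0, 255], [255, 0]], "label": 1, "file_name": "a.png"}
pair = to_supervised(example, ("image", "label"))
```

Features that are not named in supervised_keys (here "file_name") are simply dropped from the yielded pair in this model.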

Attributes
as_json

as_proto

citation

data_dir

dataset_size Size of the generated dataset files, in bytes.

description

download_size Size of the downloaded files, in bytes.

features

full_name Full canonical name: (<name>/<config>/<version>).

homepage

initialized Whether DatasetInfo has been fully initialized.

metadata

name

redistribution_info

splits

supervised_keys

version

Methods

compute_dynamic_properties

initialize_from_bucket

Initialize DatasetInfo from GCS bucket info files.

read_from_directory

Update DatasetInfo from the JSON file in dataset_info_dir.

This function updates all the dynamically generated fields (num_examples, hash, time of creation, ...) of the DatasetInfo.

This will overwrite all previous metadata.

Args
dataset_info_dir str The directory containing the metadata file. This should be the root directory of a specific dataset version.

update_splits_if_different

Overwrite the splits if they are different from the current ones.

  • If splits aren't already defined, or are different (e.g. a different number of shards), then the new split dict is used. This will trigger stats computation during download_and_prepare.
  • If splits are already defined in DatasetInfo and are similar (same names and shards), the restored split dict, which contains the statistics (restored from GCS or file), is kept.

Args
split_dict tfds.core.SplitDict, the new split
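
The two rules above can be sketched in plain Python. This is an illustrative model only, not the library's implementation: SplitSketch and choose_splits are hypothetical names, and a split is reduced to a shard count plus an optional num_examples statistic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SplitSketch:
    """Hypothetical stand-in for one split: shard count plus optional statistics."""

    num_shards: int
    num_examples: Optional[int] = None  # statistics; None until computed


def choose_splits(
    current: Optional[Dict[str, SplitSketch]],
    new: Dict[str, SplitSketch],
) -> Dict[str, SplitSketch]:
    """Return the split dict to keep, following the two rules described above."""
    if current is None:
        return new  # nothing defined yet: use the new splits
    same_names = current.keys() == new.keys()
    same_shards = same_names and all(
        current[name].num_shards == new[name].num_shards for name in new
    )
    if same_shards:
        return current  # similar: keep the restored splits and their statistics
    return new  # different: the new splits win and statistics must be recomputed


restored = {"train": SplitSketch(num_shards=2, num_examples=1000)}
kept = choose_splits(restored, {"train": SplitSketch(num_shards=2)})
replaced = choose_splits(restored, {"train": SplitSketch(num_shards=4)})
```

In this model, kept still carries the restored statistics, while replaced has none, which corresponds to the case where stats computation is triggered again.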

write_to_directory

Write DatasetInfo as JSON to dataset_info_dir.
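
Taken together, write_to_directory and read_from_directory amount to a JSON round trip through dataset_info_dir. The sketch below models that round trip with the standard library only; it is not the TFDS implementation. The helpers write_info and read_info, the use of a plain dict for the info, and the file name dataset_info.json are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

INFO_FILE = "dataset_info.json"  # assumed file name for this sketch


def write_info(info: dict, dataset_info_dir: str) -> None:
    """Serialize the info dict as JSON inside dataset_info_dir (hypothetical helper)."""
    path = Path(dataset_info_dir) / INFO_FILE
    path.write_text(json.dumps(info, indent=2, sort_keys=True))


def read_info(info: dict, dataset_info_dir: str) -> None:
    """Overwrite the fields of info in place with the values stored on disk."""
    path = Path(dataset_info_dir) / INFO_FILE
    info.update(json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a fully populated info, then restore it into a stale copy.
    write_info({"name": "demo", "version": "1.0.0", "num_examples": 1000}, tmp_dir)
    stale = {"name": "demo", "version": "1.0.0", "num_examples": 0}
    read_info(stale, tmp_dir)
```

After the read, the stale num_examples value has been replaced by the one stored on disk, mirroring how reading overwrites previously held metadata.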