tfds.core.DatasetInfo

Information about a dataset.

DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.

Args
builder DatasetBuilder or DatasetIdentity. The dataset builder or identity will be used to populate this info.
description str, description of this dataset.
features tfds.features.FeaturesDict, information on the feature dict of the tf.data.Dataset object returned by the builder.as_dataset() method.
supervised_keys Specifies the input structure for supervised learning, if applicable for the dataset, used with "as_supervised". The keys correspond to the feature names to select in info.features. When calling tfds.core.DatasetBuilder.as_dataset() with as_supervised=True, the tf.data.Dataset object will yield the structure defined by the keys passed here, instead of that defined by the features argument. Typically this is an (input_key, target_key) tuple, and the dataset yields a tuple of (input, target) tensors.

To yield a more complex structure, pass a tuple of tf.nest compatible structures of feature keys. The resulting Dataset will yield structures with each key replaced by the corresponding tensor. For example, passing a triple of keys would return a dataset that yields (feature, target, sample_weights) triples for Keras. Using supervised_keys=({'a':'a','b':'b'}, 'c') would create a dataset yielding a tuple with a dictionary of features in the features position.

Note that selecting features in nested tfds.features.FeaturesDict objects is not supported.
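
The mapping from supervised_keys to the yielded structure can be sketched in plain Python. The helper select_supervised below is hypothetical and is not part of TFDS; it only mimics the described behavior on a flat dictionary standing in for one example, with plain numbers standing in for tensors.

```python
def select_supervised(example, keys):
    """Replace every feature key in `keys` with the matching value from `example`.

    Hypothetical helper for illustration only; it is not a TFDS function.
    """
    if isinstance(keys, str):
        # A single feature name selects the corresponding value.
        return example[keys]
    if isinstance(keys, dict):
        return {name: select_supervised(example, sub) for name, sub in keys.items()}
    if isinstance(keys, (tuple, list)):
        return tuple(select_supervised(example, sub) for sub in keys)
    raise TypeError(f"Unsupported structure: {type(keys).__name__}")


# One example as a flat feature dict (values stand in for tensors).
example = {"a": 1.0, "b": 2.0, "c": 0, "w": 0.5}

pair = select_supervised(example, ("a", "c"))          # (input, target)
triple = select_supervised(example, ("a", "c", "w"))   # (feature, target, sample_weights)
nested = select_supervised(example, ({"a": "a", "b": "b"}, "c"))
```

Here nested holds a dictionary of features in the first position and the target in the second, matching the supervised_keys=({'a':'a','b':'b'}, 'c') case described above.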

disable_shuffling bool, whether to disable shuffling of the examples.
homepage str, optional, the homepage for this dataset.
citation str, optional, the citation to use for this dataset.
metadata tfds.core.Metadata, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset.
license license of the dataset.
redistribution_info information needed for redistribution, as specified in dataset_info_pb2.RedistributionInfo. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset.
split_dict information about the splits in this dataset.

Attributes
as_json

as_proto

as_proto_with_features

citation

config_description

config_name

config_tags

data_dir

dataset_size Generated dataset files size, in bytes.
description

disable_shuffling

download_size Downloaded files size, in bytes.
features

file_format

full_name Full canonical name: (<dataset_name>/<config_name>/<version>).
homepage

initialized Whether DatasetInfo has been fully initialized.
metadata

module_name

name

redistribution_info

release_notes

splits

supervised_keys

version

Methods

add_file_data_source_access

Records that the given files were read to generate this dataset.

Args
path path or paths of files that were read. Can be a file pattern. Multiple paths or patterns can be specified as a comma-separated string or a list.
url URL referring to the data being used.
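
The two accepted forms of path (a comma-separated string or a list) can be reduced to a single list before recording. The sketch below is an assumption about how such normalization might look; normalize_paths is a hypothetical helper, not a TFDS function.

```python
def normalize_paths(path):
    """Return a list of path strings from a comma-separated string or an iterable.

    Hypothetical helper for illustration only; it is not a TFDS function.
    """
    if isinstance(path, str):
        candidates = path.split(",")
    else:
        candidates = [str(p) for p in path]
    # Drop surrounding whitespace and empty entries (e.g. a trailing comma).
    return [p.strip() for p in candidates if p.strip()]


from_string = normalize_paths("/data/a.csv, /data/b-*.csv")
from_list = normalize_paths(["/data/a.csv", "/data/b-*.csv"])
```

Both calls produce the same list, so later code only needs to handle one shape.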

add_sql_data_source_access

Records that the given query was used to generate this dataset.

add_tfds_data_source_access

Records that the given TFDS dataset was used to generate this dataset.

Args
dataset_reference

url a URL referring to the TFDS dataset.

add_url_access

Records the URL used to generate this dataset.

from_proto

Instantiates DatasetInfo from the given builder and proto.

initialize_from_bucket

Initialize DatasetInfo from GCS bucket info files.

read_from_directory

Update DatasetInfo from the metadata files in dataset_info_dir.

This function updates all the dynamically generated fields (num_examples, hash, time of creation,...) of the DatasetInfo.

This will overwrite all previous metadata.

Args
dataset_info_dir The directory containing the metadata file. This should be the root directory of a specific dataset version.

Raises
FileNotFoundError If the dataset_info.json can't be found.
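
The lookup-and-fail behavior described above can be sketched with the standard library. The helper read_info_json is hypothetical, and the JSON content is a toy value; the real method restores many more fields than this sketch does.

```python
import json
import tempfile
from pathlib import Path


def read_info_json(dataset_info_dir):
    """Load dataset_info.json from a version directory (hypothetical helper)."""
    info_path = Path(dataset_info_dir) / "dataset_info.json"
    if not info_path.exists():
        raise FileNotFoundError(f"Could not find {info_path}")
    return json.loads(info_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory has no metadata file, so the lookup fails.
    try:
        read_info_json(tmp)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # Once the file exists, its parsed contents are returned.
    (Path(tmp) / "dataset_info.json").write_text(json.dumps({"name": "demo"}))
    loaded = read_info_json(tmp)
```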

set_file_format

Internal function to define the file format.

The file format is set during FileReaderBuilder.__init__, not DatasetInfo.__init__.

Args
file_format The file format.
override Whether the file format should be overridden if it is already set.

Raises
ValueError if the file format was already set and the override parameter was False.
RuntimeError if an incorrect combination of options is given, e.g. override=True when the DatasetInfo is already fully initialized.
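
The interaction between override and the two error cases can be sketched with a toy class. FileFormatHolder is a hypothetical stand-in, not the TFDS implementation, and it reads the rules above literally: any format that is already set counts as a conflict unless override=True.

```python
class FileFormatHolder:
    """Toy stand-in for the file-format state of a DatasetInfo (not the TFDS class)."""

    def __init__(self):
        self.file_format = None
        self.fully_initialized = False

    def set_file_format(self, file_format, override=False):
        # Overriding is rejected once initialization has completed.
        if override and self.fully_initialized:
            raise RuntimeError("Cannot override the file format once fully initialized.")
        # Without override, an already-set format must not be replaced.
        if self.file_format is not None and not override:
            raise ValueError(f"File format is already set to {self.file_format!r}.")
        self.file_format = file_format


holder = FileFormatHolder()
holder.set_file_format("tfrecord")

try:
    holder.set_file_format("riegeli")
    conflict_raised = False
except ValueError:
    conflict_raised = True

# An explicit override replaces the earlier value.
holder.set_file_format("riegeli", override=True)

holder.fully_initialized = True
try:
    holder.set_file_format("tfrecord", override=True)
    locked_raised = False
except RuntimeError:
    locked_raised = True
```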

set_splits

Split setter (private method).

update_data_dir

Updates the data dir for each split.

write_to_directory

Writes DatasetInfo as JSON to dataset_info_dir, along with the labels and features files.

Args
dataset_info_dir path to directory in which to save the dataset_info.json file, as well as features.json and *.labels.txt if applicable.
all_metadata defaults to True. If False, will not write metadata which may have an impact on how the data is read (features.json). Should be set to True whenever write_to_directory is called for the first time for a new dataset.
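
The effect of all_metadata on which files get written can be sketched as follows. The helper write_info and the toy dictionaries are hypothetical; the sketch only covers dataset_info.json and features.json and leaves out the *.labels.txt files.

```python
import json
import tempfile
from pathlib import Path


def write_info(dataset_info_dir, info, features, all_metadata=True):
    """Write dataset_info.json, plus features.json when all_metadata is True.

    Hypothetical helper for illustration only; it is not a TFDS function.
    """
    out_dir = Path(dataset_info_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "dataset_info.json").write_text(json.dumps(info, indent=2))
    if all_metadata:
        # features.json affects how the data is read, so it is optional on rewrites.
        (out_dir / "features.json").write_text(json.dumps(features, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first"
    write_info(first, {"name": "demo"}, {"label": "int64"})
    first_files = sorted(p.name for p in first.iterdir())

    partial = Path(tmp) / "partial"
    write_info(partial, {"name": "demo"}, {"label": "int64"}, all_metadata=False)
    partial_files = sorted(p.name for p in partial.iterdir())
```

The first call mirrors an initial write for a new dataset, where everything is stored; the second mirrors a later update that leaves the reading-related metadata alone.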