Information about a dataset.
tfds.core.DatasetInfo(
*,
builder: Union[DatasetIdentity, Any],
description: Optional[str] = None,
features: Optional[feature_lib.FeatureConnector] = None,
supervised_keys: Optional[SupervisedKeysType] = None,
disable_shuffling: bool = False,
homepage: Optional[str] = None,
citation: Optional[str] = None,
metadata: Optional[Metadata] = None,
license: Optional[str] = None,
redistribution_info: Optional[Dict[str, str]] = None,
split_dict: Optional[splits_lib.SplitDict] = None
)
DatasetInfo documents a dataset, including its name, version, and features.
See the constructor arguments and properties for a full list.
Args | |
---|---|
`builder` | `DatasetBuilder` or `DatasetIdentity`. The dataset builder or identity will be used to populate this info.
`description` | `str`, description of this dataset.
`features` | `tfds.features.FeaturesDict`, information on the feature dict of the `tf.data.Dataset()` object from the `builder.as_dataset()` method.
`supervised_keys` | Specifies the input structure for supervised learning, if applicable for the dataset, used with `as_supervised`. The keys correspond to the feature names to select in `info.features`. When calling `tfds.core.DatasetBuilder.as_dataset()` with `as_supervised=True`, the `tf.data.Dataset` object will yield the structure defined by the keys passed here, instead of that defined by the `features` argument. Typically this is a `(input_key, target_key)` tuple, and the dataset yields a tuple of `(input, target)` tensors. To yield a more complex structure, pass a `tf.nest`-compatible structure of feature names; the dataset will yield the same structure, with tensors in place of the names. Note that selecting features in nested feature dicts is not supported.
`disable_shuffling` | `bool`, specifies whether to disable shuffling of the examples.
`homepage` | `str`, optional, the homepage for this dataset.
`citation` | `str`, optional, the citation to use for this dataset.
`metadata` | `tfds.core.Metadata`, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset.
`license` | license of the dataset.
`redistribution_info` | information needed for redistribution, as specified in `dataset_info_pb2.RedistributionInfo`. The content of the `license` subfield will automatically be written to a LICENSE file stored with the dataset.
`split_dict` | information about the splits in this dataset.
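To make the `supervised_keys` behavior concrete, here is a minimal pure-Python sketch (not TFDS internals; the function name and example dict are hypothetical) of how `as_supervised=True` restructures each example: instead of the full feature dict, only the selected keys are yielded as a tuple.

```python
def as_supervised_view(example, supervised_keys):
    """Restructure a feature dict into the tuple defined by supervised_keys.

    Conceptually mimics as_supervised=True: only the (input, target)
    features named in supervised_keys are kept, as a tuple.
    """
    input_key, target_key = supervised_keys
    return example[input_key], example[target_key]

# A dataset example as a feature dict; extra features ("id") are dropped.
example = {"image": [[0, 1], [1, 0]], "label": 1, "id": "ex-0"}
pair = as_supervised_view(example, ("image", "label"))
```

With real TFDS tensors the same selection happens per element of the `tf.data.Dataset`.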
Methods
add_file_data_source_access
add_file_data_source_access(
path: Union[epath.PathLike, Iterable[epath.PathLike]],
url: Optional[str] = None
) -> None
Records that the given file path(s) were used to generate this dataset.
Arguments | |
---|---|
`path` | path or paths of files that were read. Can be a file pattern. Multiple paths or patterns can be specified as a comma-separated string or a list.
`url` | URL referring to the data being used.
add_sql_data_source_access
add_sql_data_source_access(
sql_query: str
) -> None
Records that the given query was used to generate this dataset.
add_tfds_data_source_access
add_tfds_data_source_access(
dataset_reference: naming.DatasetReference, url: Optional[str] = None
) -> None
Records that the given TFDS dataset was used to generate this dataset.
Args | |
---|---|
`dataset_reference` | `naming.DatasetReference`, the TFDS dataset that was used.
`url` | a URL referring to the TFDS dataset.
add_url_access
add_url_access(
url: str, checksum: Optional[str] = None
) -> None
Records the URL used to generate this dataset.
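The bookkeeping that `add_url_access` describes can be pictured with a minimal, hypothetical recorder (the class, attribute, and URL below are illustrative only, not the TFDS implementation): each URL used to build the dataset is remembered, together with its optional checksum, alongside the dataset's metadata.

```python
class Provenance:
    """Toy stand-in for the provenance tracking described above."""

    def __init__(self):
        self.url_accesses = []

    def add_url_access(self, url, checksum=None):
        # Remember every source URL (and its checksum, if known) so the
        # dataset's origin can be audited later.
        self.url_accesses.append({"url": url, "checksum": checksum})

prov = Provenance()
prov.add_url_access("https://example.com/data.zip", checksum="abc123")
```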
from_proto
@classmethod
from_proto( builder, proto: dataset_info_pb2.DatasetInfo ) -> 'DatasetInfo'
Instantiates DatasetInfo from the given builder and proto.
initialize_from_bucket
initialize_from_bucket() -> None
Initialize DatasetInfo from GCS bucket info files.
read_from_directory
read_from_directory(
dataset_info_dir: epath.PathLike
) -> None
Updates DatasetInfo from the metadata files in dataset_info_dir.
This function updates all the dynamically generated fields (num_examples, hash, time of creation, ...) of the DatasetInfo. This will overwrite all previous metadata.
Args | |
---|---|
`dataset_info_dir` | The directory containing the metadata file. This should be the root directory of a specific dataset version.
Raises | |
---|---|
`FileNotFoundError` | If the dataset_info.json can't be found.
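The contract above can be sketched in plain Python (an illustration of the documented behavior, not TFDS internals; the function name is hypothetical): load `dataset_info.json` from the version's root directory, and raise `FileNotFoundError` when the file is absent.

```python
import json
from pathlib import Path

def read_dataset_info(dataset_info_dir):
    """Sketch of read_from_directory's contract: parse dataset_info.json
    from the dataset version's root directory."""
    path = Path(dataset_info_dir) / "dataset_info.json"
    if not path.exists():
        raise FileNotFoundError(f"No dataset_info.json found in {dataset_info_dir}")
    return json.loads(path.read_text())
```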
set_file_format
set_file_format(
file_format: Union[None, str, file_adapters.FileFormat],
override: bool = False
) -> None
Internal function to define the file format.
The file format is set during `FileReaderBuilder.__init__`, not `DatasetInfo.__init__`.
Args | |
---|---|
`file_format` | The file format.
`override` | Whether the file format should be overridden if it is already set.
Raises | |
---|---|
`ValueError` | if the file format was already set and the `override` parameter was False.
`RuntimeError` | if an incorrect combination of options is given, e.g. `override=True` when the DatasetInfo is already fully initialized.
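The `override` contract described in the table can be sketched as follows (a hypothetical stand-in class, not the TFDS implementation): setting a file format a second time fails unless `override=True` is passed.

```python
class InfoWithFormat:
    """Toy illustration of set_file_format's override semantics."""

    def __init__(self):
        self._file_format = None

    def set_file_format(self, file_format, override=False):
        # A second assignment is rejected unless explicitly overridden.
        if self._file_format is not None and not override:
            raise ValueError(
                f"File format is already set to {self._file_format}. "
                "Pass override=True to change it.")
        self._file_format = file_format

info = InfoWithFormat()
info.set_file_format("tfrecord")
```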
set_splits
set_splits(
split_dict: splits_lib.SplitDict
) -> None
Split setter (private method).
update_data_dir
update_data_dir(
data_dir: str
) -> None
Updates the data dir for each split.
write_to_directory
write_to_directory(
dataset_info_dir: epath.PathLike, all_metadata=True
) -> None
Writes DatasetInfo as JSON to dataset_info_dir, along with labels and features.
Args | |
---|---|
`dataset_info_dir` | path to directory in which to save the dataset_info.json file, as well as features.json and *.labels.txt if applicable.
`all_metadata` | defaults to True. If False, will not write metadata which may have an impact on how the data is read (features.json). Should be set to True whenever write_to_directory is called for the first time for a new dataset.
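The write side of this contract can likewise be sketched in plain Python (again an illustration of the documented behavior, not TFDS internals; the function name is hypothetical): serialize the info as `dataset_info.json`, and only emit the read-affecting metadata (features.json in real TFDS) when `all_metadata=True`.

```python
import json
from pathlib import Path

def write_dataset_info(dataset_info_dir, info, all_metadata=True):
    """Sketch of write_to_directory's contract: save dataset_info.json,
    plus features.json when all_metadata is True."""
    out = Path(dataset_info_dir)
    (out / "dataset_info.json").write_text(json.dumps(info))
    if all_metadata:
        # Metadata that affects how the data is read back.
        (out / "features.json").write_text(json.dumps(info.get("features", {})))
```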