tfds.download.DownloadConfig

Configuration for tfds.core.DatasetBuilder.download_and_prepare.

extract_dir str, directory where extracted files are stored. Defaults to <download_dir>/extracted.
manual_dir str, read-only directory where manually downloaded/extracted data is stored. Defaults to <download_dir>/manual.
download_mode tfds.GenerateMode, how to handle downloads or data that already exist. Defaults to REUSE_DATASET_IF_EXISTS, which reuses both the downloads and the generated data if they already exist.
compute_stats tfds.download.ComputeStatsMode, whether to compute statistics over the generated data. Defaults to SKIP.
max_examples_per_split int, optional maximum number of examples to write into each split (used for testing). If set to 0, only _split_generators is executed (which downloads the original data), and _generate_examples is skipped.
register_checksums bool, defaults to False. If True, checksums of downloaded files are recorded.
force_checksums_validation bool, defaults to False. If True, raises an error if a URL does not have checksums.
beam_runner Runner to pass to beam.Pipeline, only used for datasets based on Beam for the generation.
beam_options PipelineOptions to pass to beam.Pipeline, only used for datasets based on Beam for the generation.
try_download_gcs bool, defaults to True. If True, the prepared dataset is downloaded from GCS when available. If False, the dataset is downloaded and prepared from scratch.
verify_ssl bool, defaults to True. If True, SSL certificates are verified when downloading the dataset.
override_max_simultaneous_downloads int, optional maximum number of simultaneous downloads. If set, it overrides the default values of the dataset builder and the downloader.
num_shards int, optional number of shards to create. If None, the number of shards is computed from the total size of the dataset and the minimum and maximum shard sizes.
min_shard_size int, optional minimum shard size in bytes. If None, 64 MiB is used.
max_shard_size int, optional maximum shard size in bytes. If None, 1 GiB is used.
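The parameter list only says that, when num_shards is None, the shard count is derived from the dataset size and the shard-size bounds. The sketch below illustrates that idea with a hypothetical helper, estimate_num_shards, which is not part of TFDS. The "smallest power of two that fits" policy is an assumption made for illustration; TFDS's actual algorithm may differ.

```python
import math
from typing import Optional

MIN_SHARD_SIZE = 64 * 2**20  # 64 MiB, the listed default for min_shard_size
MAX_SHARD_SIZE = 2**30       # 1 GiB, the listed default for max_shard_size


def estimate_num_shards(
    total_size: int,
    num_shards: Optional[int] = None,
    min_shard_size: Optional[int] = None,
    max_shard_size: Optional[int] = None,
) -> int:
    """Hypothetical helper: pick a shard count whose shard sizes respect the bounds."""
    if num_shards is not None:
        # An explicit num_shards always takes precedence.
        return num_shards
    min_size = MIN_SHARD_SIZE if min_shard_size is None else min_shard_size
    max_size = MAX_SHARD_SIZE if max_shard_size is None else max_shard_size
    # Fewest shards that keep every shard at or below max_size.
    lower = max(1, math.ceil(total_size / max_size))
    # Most shards that keep every shard at or above min_size.
    upper = max(1, total_size // min_size)
    # Assumed policy: smallest power of two >= lower, unless it would
    # make shards smaller than min_size; then fall back to lower.
    n = 1
    while n < lower:
        n *= 2
    return n if n <= upper else lower


print(estimate_num_shards(10 * 2**30))  # 10 GiB dataset with default bounds
```

Under this assumed policy, a 10 GiB dataset with the default bounds gets 16 shards of 640 MiB each, which sits between the 64 MiB minimum and the 1 GiB maximum.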

Methods

get_shard_config

replace

Returns a copy with updated attributes.
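Assuming DownloadConfig behaves like a standard Python dataclass, replace can be pictured as a thin wrapper around dataclasses.replace. The class below, MiniDownloadConfig, is a hypothetical stand-in carrying three of the fields listed above with their listed defaults; it is not the real TFDS class.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class MiniDownloadConfig:
    """Hypothetical stand-in with a few of DownloadConfig's fields and defaults."""
    verify_ssl: bool = True
    try_download_gcs: bool = True
    max_examples_per_split: Optional[int] = None

    def replace(self, **kwargs) -> "MiniDownloadConfig":
        # dataclasses.replace builds a new instance; the original is left unchanged.
        return dataclasses.replace(self, **kwargs)


base = MiniDownloadConfig()
# Derive a small test configuration without mutating the base one.
test_config = base.replace(max_examples_per_split=10, try_download_gcs=False)
```

Returning a copy rather than mutating in place lets one base configuration be reused to derive several variants, for example a quick test run alongside a full build.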

Class Variables

beam_options None
beam_runner None
compute_stats <ComputeStatsMode.SKIP: 'skip'>
download_mode <GenerateMode.REUSE_DATASET_IF_EXISTS: 'reuse_dataset_if_exists'>
extract_dir None
force_checksums_validation False
manual_dir None
max_examples_per_split None
max_shard_size 1073741824
min_shard_size 67108864
num_shards None
override_max_simultaneous_downloads None
register_checksums False
try_download_gcs True
verify_ssl True
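The two shard-size defaults are given in bytes. This quick arithmetic check confirms they correspond to 64 MiB and 1 GiB.

```python
MIB = 2**20  # bytes per mebibyte
GIB = 2**30  # bytes per gibibyte

min_shard_size = 67108864    # listed default for min_shard_size
max_shard_size = 1073741824  # listed default for max_shard_size

print(min_shard_size // MIB, "MiB")  # prints: 64 MiB
print(max_shard_size // GIB, "GiB")  # prints: 1 GiB
```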