tfds.download.DownloadConfig
Configuration for tfds.core.DatasetBuilder.download_and_prepare.
tfds.download.DownloadConfig(
    extract_dir: Optional[epath.PathLike] = None,
    manual_dir: Optional[epath.PathLike] = None,
    download_mode: util.GenerateMode = tfds.download.GenerateMode.REUSE_DATASET_IF_EXISTS,
    compute_stats: util.ComputeStatsMode = tfds.download.ComputeStatsMode.SKIP,
    max_examples_per_split: Optional[int] = None,
    register_checksums: bool = False,
    force_checksums_validation: bool = False,
    beam_runner: Optional[Any] = None,
    beam_options: Optional[Any] = None,
    try_download_gcs: bool = True,
    verify_ssl: bool = True,
    override_max_simultaneous_downloads: Optional[int] = None,
    num_shards: Optional[int] = None,
    min_shard_size: int = shard_utils.DEFAULT_MIN_SHARD_SIZE,
    max_shard_size: int = shard_utils.DEFAULT_MAX_SHARD_SIZE
)
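A DownloadConfig is passed to tfds.core.DatasetBuilder.download_and_prepare via its download_config argument. A minimal sketch of typical usage (the "mnist" dataset name is only an illustrative choice):

import tensorflow_datasets as tfds

# Reuse existing downloads/data (the default download_mode), record
# checksums of newly downloaded files, and cap each split at 100
# examples, which is handy for quick tests.
dl_config = tfds.download.DownloadConfig(
    register_checksums=True,
    max_examples_per_split=100,
)

builder = tfds.builder("mnist")  # illustrative dataset name
builder.download_and_prepare(download_config=dl_config)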
Attributes

extract_dir
    str, directory where extracted files are stored. Defaults to
    <download_dir>/extracted.

manual_dir
    str, read-only directory where manually downloaded/extracted data is
    stored. Defaults to <download_dir>/manual.

download_mode
    tfds.GenerateMode, how to handle downloads or data that already
    exist. Defaults to REUSE_DATASET_IF_EXISTS, which reuses both
    downloads and data if they already exist.

compute_stats
    tfds.download.ComputeStatsMode, whether to compute statistics over
    the generated data. Defaults to SKIP.

max_examples_per_split
    int, optional maximum number of examples to write into each split
    (used for testing). If set to 0, only _split_generators is executed
    (the original data is downloaded), and _generate_examples is skipped.

register_checksums
    bool, defaults to False. If True, checksums of downloaded files are
    recorded.

force_checksums_validation
    bool, defaults to False. If True, raises an error if a URL does not
    have a registered checksum.

beam_runner
    Runner to pass to beam.Pipeline; only used for datasets whose
    generation is based on Beam.

beam_options
    PipelineOptions to pass to beam.Pipeline; only used for datasets
    whose generation is based on Beam (see the sketch after this list).

try_download_gcs
    bool, defaults to True. If True, the prepared dataset is downloaded
    from GCS when available. If False, the dataset is downloaded and
    prepared from scratch.

verify_ssl
    bool, defaults to True. If True, the SSL certificate is verified when
    downloading the dataset.

override_max_simultaneous_downloads
    int, optional maximum number of simultaneous downloads. If set, it
    overrides the dataset builder and downloader default values.

num_shards
    optional number of shards to create. If None, the number of shards is
    computed from the total size of the dataset and the minimum and
    maximum shard sizes.

min_shard_size
    optional minimum shard size in bytes. If None, 64 MiB is used.

max_shard_size
    optional maximum shard size in bytes. If None, 1 GiB is used.
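For Beam-based datasets, beam_runner and beam_options are forwarded to beam.Pipeline. A hedged sketch using Beam's local DirectRunner; the worker flags and the dataset name are illustrative assumptions, not requirements:

import apache_beam as beam
import tensorflow_datasets as tfds

# Standard Beam PipelineOptions for the local DirectRunner.
beam_options = beam.options.pipeline_options.PipelineOptions(
    ["--direct_num_workers=4", "--direct_running_mode=multi_processing"]
)

dl_config = tfds.download.DownloadConfig(beam_options=beam_options)

# "wikipedia/20230601.en" is only an illustrative Beam-based dataset.
builder = tfds.builder("wikipedia/20230601.en")
builder.download_and_prepare(download_config=dl_config)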
Methods
get_shard_config
get_shard_config() -> shard_utils.ShardConfig
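A small sketch of how the shard-related settings reach get_shard_config; the assumption that the returned ShardConfig simply carries num_shards and the min/max shard sizes is inferred from the attribute names, not from the source:

import tensorflow_datasets as tfds

dl_config = tfds.download.DownloadConfig(
    num_shards=8,
    max_shard_size=512 * 1024 * 1024,  # 512 MiB
)

# Assumed: bundles num_shards and the min/max shard sizes into a single
# shard_utils.ShardConfig consumed when the dataset files are written.
shard_config = dl_config.get_shard_config()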
replace
replace(
**kwargs
) -> DownloadConfig
Returns a copy with updated attributes.
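For example, a sketch of deriving a test configuration from an existing one; since replace returns a copy, the original config is left unchanged:

import tensorflow_datasets as tfds

base_config = tfds.download.DownloadConfig(register_checksums=True)

# replace() returns a new DownloadConfig; base_config is unchanged.
test_config = base_config.replace(max_examples_per_split=10)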
Class Variables

beam_options: None
beam_runner: None
compute_stats: <ComputeStatsMode.SKIP: 'skip'>
download_mode: <GenerateMode.REUSE_DATASET_IF_EXISTS: 'reuse_dataset_if_exists'>
extract_dir: None
force_checksums_validation: False
manual_dir: None
max_examples_per_split: None
max_shard_size: 1073741824
min_shard_size: 67108864
num_shards: None
override_max_simultaneous_downloads: None
register_checksums: False
try_download_gcs: True
verify_ssl: True