tfds.download.DownloadConfig
Configuration for tfds.core.DatasetBuilder.download_and_prepare.
tfds.download.DownloadConfig(
    extract_dir: Optional[epath.PathLike] = None,
    manual_dir: Optional[epath.PathLike] = None,
    download_mode: util.GenerateMode = tfds.download.GenerateMode.REUSE_DATASET_IF_EXISTS,
    compute_stats: util.ComputeStatsMode = tfds.download.ComputeStatsMode.SKIP,
    max_examples_per_split: Optional[int] = None,
    register_checksums: bool = False,
    force_checksums_validation: bool = False,
    beam_runner: Optional[Any] = None,
    beam_options: Optional[Any] = None,
    try_download_gcs: bool = True,
    verify_ssl: bool = True,
    override_max_simultaneous_downloads: Optional[int] = None,
    num_shards: Optional[int] = None,
    min_shard_size: int = shard_utils.DEFAULT_MIN_SHARD_SIZE,
    max_shard_size: int = shard_utils.DEFAULT_MAX_SHARD_SIZE
)
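A DownloadConfig is passed to tfds.core.DatasetBuilder.download_and_prepare via its download_config argument. A minimal sketch of typical usage (the "mnist" dataset name is only an illustrative choice):

import tensorflow_datasets as tfds

# Reuse existing downloads/data (the default download_mode), record
# checksums of newly downloaded files, and cap each split at 100
# examples, which is handy for quick tests.
dl_config = tfds.download.DownloadConfig(
    register_checksums=True,
    max_examples_per_split=100,
)

builder = tfds.builder("mnist")  # illustrative dataset name
builder.download_and_prepare(download_config=dl_config)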
Attributes

extract_dir
    str, directory where extracted files are stored. Defaults to
    <download_dir>/extracted.

manual_dir
    str, read-only directory where manually downloaded/extracted data is
    stored. Defaults to <download_dir>/manual.

download_mode
    tfds.GenerateMode, how to handle downloads or data that already
    exist. Defaults to REUSE_DATASET_IF_EXISTS, which reuses both
    downloads and data if they already exist.

compute_stats
    tfds.download.ComputeStatsMode, whether to compute statistics over
    the generated data. Defaults to SKIP.

max_examples_per_split
    int, optional maximum number of examples to write into each split
    (used for testing). If set to 0, only _split_generators is executed
    (the original data is downloaded), and _generate_examples is skipped.

register_checksums
    bool, defaults to False. If True, checksums of downloaded files are
    recorded.

force_checksums_validation
    bool, defaults to False. If True, raises an error if a URL does not
    have a registered checksum.

beam_runner
    Runner to pass to beam.Pipeline; only used for datasets whose
    generation is based on Beam.

beam_options
    PipelineOptions to pass to beam.Pipeline; only used for datasets
    whose generation is based on Beam (see the sketch after this list).

try_download_gcs
    bool, defaults to True. If True, the prepared dataset is downloaded
    from GCS when available. If False, the dataset is downloaded and
    prepared from scratch.

verify_ssl
    bool, defaults to True. If True, the SSL certificate is verified when
    downloading the dataset.

override_max_simultaneous_downloads
    int, optional maximum number of simultaneous downloads. If set, it
    overrides the dataset builder and downloader default values.

num_shards
    optional number of shards to create. If None, the number of shards is
    computed from the total size of the dataset and the minimum and
    maximum shard sizes.

min_shard_size
    optional minimum shard size in bytes. If None, 64 MiB is used.

max_shard_size
    optional maximum shard size in bytes. If None, 1 GiB is used.
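For Beam-based datasets, beam_runner and beam_options are forwarded to beam.Pipeline. A hedged sketch using Beam's local DirectRunner; the worker flags and the dataset name are illustrative assumptions, not requirements:

import apache_beam as beam
import tensorflow_datasets as tfds

# Standard Beam PipelineOptions for the local DirectRunner.
beam_options = beam.options.pipeline_options.PipelineOptions(
    ["--direct_num_workers=4", "--direct_running_mode=multi_processing"]
)

dl_config = tfds.download.DownloadConfig(beam_options=beam_options)

# "wikipedia/20230601.en" is only an illustrative Beam-based dataset.
builder = tfds.builder("wikipedia/20230601.en")
builder.download_and_prepare(download_config=dl_config)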
Methods
get_shard_config
get_shard_config() -> shard_utils.ShardConfig
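A small sketch of how the shard-related settings reach get_shard_config; the assumption that the returned ShardConfig simply carries num_shards and the min/max shard sizes is inferred from the attribute names, not from the source:

import tensorflow_datasets as tfds

dl_config = tfds.download.DownloadConfig(
    num_shards=8,
    max_shard_size=512 * 1024 * 1024,  # 512 MiB
)

# Assumed: bundles num_shards and the min/max shard sizes into a single
# shard_utils.ShardConfig consumed when the dataset files are written.
shard_config = dl_config.get_shard_config()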
replace
replace(
**kwargs
) -> DownloadConfig
Returns a copy with updated attributes.
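For example, a sketch of deriving a test configuration from an existing one; since replace returns a copy, the original config is left unchanged:

import tensorflow_datasets as tfds

base_config = tfds.download.DownloadConfig(register_checksums=True)

# replace() returns a new DownloadConfig; base_config is unchanged.
test_config = base_config.replace(max_examples_per_split=10)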
Class Variables

beam_options: None
beam_runner: None
compute_stats: <ComputeStatsMode.SKIP: 'skip'>
download_mode: <GenerateMode.REUSE_DATASET_IF_EXISTS: 'reuse_dataset_if_exists'>
extract_dir: None
force_checksums_validation: False
manual_dir: None
max_examples_per_split: None
max_shard_size: 1073741824
min_shard_size: 67108864
num_shards: None
override_max_simultaneous_downloads: None
register_checksums: False
try_download_gcs: True
verify_ssl: True