Loads a federated version of the iNaturalist 2017 dataset.
tff.simulation.datasets.inaturalist.load_data(
    image_dir: str = 'images',
    cache_dir: str = 'cache',
    split: tff.simulation.datasets.inaturalist.INaturalistSplit = tff.simulation.datasets.inaturalist.INaturalistSplit.USER_120K
) -> tuple[ClientData, tf.data.Dataset]
If the dataset is loaded for the first time, the images for the entire iNaturalist 2017 dataset are downloaded from the AWS Open Data Program.
The federated dataset is created from the images stored in image_dir. Once created, it is cached in cache_dir.
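A minimal sketch of a call with explicit directories, assuming TensorFlow Federated is installed and importable as tensorflow_federated. The data_root layout and the helper names inaturalist_dirs and load_federated_inaturalist are hypothetical; only load_data, its keyword arguments and INaturalistSplit come from the signature above. The TFF import is deferred into the function, so defining the helpers neither requires TFF nor triggers the download.

```python
import os


def inaturalist_dirs(data_root):
    """Hypothetical helper: keep the raw images and the cache under one root."""
    image_dir = os.path.join(data_root, 'images')
    cache_dir = os.path.join(data_root, 'cache')
    return image_dir, cache_dir


def load_federated_inaturalist(data_root, split_name='USER_120K'):
    """Hypothetical wrapper around load_data; split_name names an INaturalistSplit member."""
    # Deferred import: defining this function does not require TFF.
    import tensorflow_federated as tff

    inat = tff.simulation.datasets.inaturalist
    image_dir, cache_dir = inaturalist_dirs(data_root)
    # Look up the enum member by name, e.g. 'USER_120K' or 'GEO_1K'.
    split = getattr(inat.INaturalistSplit, split_name)
    # On the first call the full iNaturalist 2017 image set is downloaded into
    # image_dir; the federated dataset built from it is then cached in cache_dir.
    train, test = inat.load_data(
        image_dir=image_dir, cache_dir=cache_dir, split=split)
    return train, test
```

For example, train, test = load_federated_inaturalist('/tmp/inaturalist', 'GEO_1K') would load the GEO_1K split, reusing the cached data on later calls.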
The tf.data.Dataset objects returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client yield collections.OrderedDict objects at each iteration, with the following keys and values:
- 'image/decoded': A tf.Tensor with dtype=tf.uint8 that corresponds to the pixels of the image.
- 'class': A tf.Tensor with dtype=tf.int64 and shape [1], corresponding to the class label.
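Because each element is an OrderedDict keyed by 'image/decoded' and 'class', per-client statistics can be computed by iterating a client's dataset. The sketch below counts labels for one client. client_label_histogram is a hypothetical helper that relies only on create_tf_dataset_for_client and the 'class' entry of shape [1] described above. With real TFF data it assumes eager execution, where iterating a tf.data.Dataset yields tensors that int() can convert.

```python
import collections


def client_label_histogram(client_data, client_id):
    """Counts how often each class label occurs in one client's dataset.

    Hypothetical helper: client_data only needs a create_tf_dataset_for_client
    method whose elements are mappings with a 'class' entry of shape [1].
    """
    counts = collections.Counter()
    for example in client_data.create_tf_dataset_for_client(client_id):
        # 'class' has shape [1], so index its single element before converting.
        counts[int(example['class'][0])] += 1
    return counts
```

Running it over a handful of clients gives a quick view of how the label distribution differs from one client to the next.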
Seven splits of the iNaturalist dataset are available; details of each split can be found in https://arxiv.org/abs/2003.08082. For the USER_120K split, the images are partitioned by user ID, and the number of clients is 9,275. The training set contains 120,300 images of 1,203 species, and the test set contains 35,641 images. For the GEO_* splits, the images are partitioned by geographic location. The number of clients for each GEO_* split is:
- GEO_100: 3607.
- GEO_300: 1209.
- GEO_1K: 369.
- GEO_3K: 136.
- GEO_10K: 39.
- GEO_30K: 12.
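The client counts listed above can double as a sanity check after loading. The sketch records them in a dictionary keyed by the INaturalistSplit member names and compares them with len(train.client_ids), where client_ids is the ClientData property listing client identifiers. EXPECTED_NUM_CLIENTS and check_num_clients are hypothetical names. The block also works out the average number of training images per client for USER_120K from the figures above.

```python
# Client counts per split, as listed above (hypothetical constant).
EXPECTED_NUM_CLIENTS = {
    'USER_120K': 9275,
    'GEO_100': 3607,
    'GEO_300': 1209,
    'GEO_1K': 369,
    'GEO_3K': 136,
    'GEO_10K': 39,
    'GEO_30K': 12,
}


def check_num_clients(train, split_name):
    """Raises ValueError if train does not have the documented number of clients."""
    actual = len(train.client_ids)
    expected = EXPECTED_NUM_CLIENTS[split_name]
    if actual != expected:
        raise ValueError(
            f'{split_name}: expected {expected} clients, found {actual}')
    return actual


# USER_120K: 120,300 training images spread over 9,275 clients.
AVG_IMAGES_PER_USER_120K_CLIENT = 120_300 / 9_275  # roughly 12.97
```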
Returns
Tuple of (train, test), where train is a tff.simulation.datasets.ClientData and test is a tf.data.Dataset.
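In a federated simulation, train is commonly used by sampling a subset of clients each round and building one tf.data.Dataset per sampled client, while test, a single tf.data.Dataset, can serve for centralized evaluation. The sketch below shows uniform client sampling without replacement. sample_client_datasets is a hypothetical helper, and uniform sampling is an assumption for illustration, not something load_data prescribes.

```python
import random


def sample_client_datasets(train, num_clients, seed=None):
    """Uniformly samples num_clients distinct clients and returns their datasets.

    Hypothetical helper: train needs a client_ids sequence and a
    create_tf_dataset_for_client method, as tff.simulation.datasets.ClientData has.
    """
    rng = random.Random(seed)  # seeded for reproducible cohorts
    sampled_ids = rng.sample(list(train.client_ids), num_clients)
    return sampled_ids, [
        train.create_tf_dataset_for_client(cid) for cid in sampled_ids
    ]
```

Calling it once per round with a different seed (for example, the round number) gives cohorts that vary between rounds but are reproducible across runs.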