- Description:
The TAO dataset is a large video object detection dataset consisting of 2,907 high-resolution videos and 833 object categories. Note that this dataset requires at least 300 GB of free storage space.
Additional Documentation: Explore on Papers With Code
Homepage: https://taodataset.org/
Source code: tfds.video.tao.Tao
Versions:
1.0.0 (default): Initial release.
Download size: 113.96 GiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Some TAO files (HACS and AVA videos) must be downloaded manually because a login to MOT is required. Download that data by following the instructions at https://motchallenge.net/tao_download.php and move the resulting .zip files to ~/tensorflow_datasets/downloads/manual/
If the manually downloaded data is not present, it is skipped and only the data that does not require manual download is used.
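Because missing manual archives are skipped silently, it can help to check what is actually in the manual directory before preparing the dataset. The following is a minimal sketch using only the standard library; the helper name `list_manual_archives` is hypothetical, and it makes no assumption about the archive file names beyond the `.zip` extension mentioned above.

```python
from pathlib import Path

# Default manual directory named in the instructions above.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()


def list_manual_archives(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the sorted names of .zip files found directly in manual_dir.

    Hypothetical helper: it only lists files and does not validate
    that they are the archives TAO expects.
    """
    manual_dir = Path(manual_dir).expanduser()
    if not manual_dir.is_dir():
        return []
    return sorted(p.name for p in manual_dir.glob("*.zip"))


if __name__ == "__main__":
    archives = list_manual_archives()
    if archives:
        print("Found manual archives:", ", ".join(archives))
    else:
        print("No .zip files found; the manually downloaded videos would be skipped.")
```

An empty result does not prevent the dataset from being built; it only means the videos that require a login would be left out.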
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 500 |
'validation' | 988 |
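As a sketch of how one of these splits might be loaded, the snippet below wraps `tfds.load` with the two config names listed on this page. The helper names `tao_dataset_name` and `load_tao` are hypothetical; `tensorflow_datasets` is imported lazily, and calling `load_tao` for the first time would trigger the full download and preparation described above.

```python
def tao_dataset_name(config="480_640"):
    """Build the TFDS name string for a TAO config, e.g. 'tao/480_640'."""
    if config not in ("480_640", "full_resolution"):
        raise ValueError(f"Unknown TAO config: {config!r}")
    return f"tao/{config}"


def load_tao(split="validation", config="480_640", manual_dir=None):
    """Load one TAO split and its DatasetInfo (hypothetical wrapper).

    Warning: the first call downloads and prepares several hundred GiB.
    """
    # Imported lazily so tao_dataset_name works without TFDS installed.
    import tensorflow_datasets as tfds

    prepare_kwargs = {}
    if manual_dir is not None:
        # Point TFDS at a non-default directory holding the manual .zip files.
        prepare_kwargs["download_config"] = tfds.download.DownloadConfig(
            manual_dir=manual_dir
        )
    return tfds.load(
        tao_dataset_name(config),
        split=split,
        with_info=True,
        download_and_prepare_kwargs=prepare_kwargs,
    )
```

With `with_info=True`, `tfds.load` returns a `(dataset, info)` pair, so a call would look like `ds, info = load_tao("train")`.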
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@article{Dave_2020,
title={TAO: A Large-Scale Benchmark for Tracking Any Object},
ISBN={9783030585587},
ISSN={1611-3349},
url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
DOI={10.1007/978-3-030-58558-7_26},
journal={Lecture Notes in Computer Science},
publisher={Springer International Publishing},
author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
year={2020},
pages={436-454}
}
tao/480_640 (default config)
Config description: All images are bilinearly resized to 480 x 640 (height x width).
Dataset size: 482.30 GiB
Feature structure:
FeaturesDict({
'metadata': FeaturesDict({
'dataset': string,
'height': int32,
'neg_category_ids': Tensor(shape=(None,), dtype=int32),
'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
'num_frames': int32,
'video_name': string,
'width': int32,
}),
'tracks': Sequence({
'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
'frames': Sequence(int32),
'is_crowd': bool,
'scale_category': string,
'track_id': int32,
}),
'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})
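The `bboxes` entries are float32 values of shape `(4,)`. The sketch below converts one such box to pixel coordinates, assuming the usual TFDS `BBoxFeature` convention of normalized `(ymin, xmin, ymax, xmax)` values in the range 0 to 1; that ordering is an assumption worth verifying against a few decoded examples. The function name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox, height, width):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to integer pixels.

    Assumes the TFDS BBoxFeature convention (normalized coordinates);
    height and width are the frame size in pixels.
    """
    ymin, xmin, ymax, xmax = bbox
    return (
        round(ymin * height),
        round(xmin * width),
        round(ymax * height),
        round(xmax * width),
    )


# For this config every frame is 480 x 640 (height x width).
print(bbox_to_pixels((0.25, 0.5, 0.75, 1.0), height=480, width=640))
```

For a config with variable frame sizes, the height and width would have to come from the decoded video frames rather than from fixed constants.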
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
metadata | FeaturesDict | | | |
metadata/dataset | Tensor | | string | |
metadata/height | Tensor | | int32 | |
metadata/neg_category_ids | Tensor | (None,) | int32 | |
metadata/not_exhaustive_category_ids | Tensor | (None,) | int32 | |
metadata/num_frames | Tensor | | int32 | |
metadata/video_name | Tensor | | string | |
metadata/width | Tensor | | int32 | |
tracks | Sequence | | | |
tracks/bboxes | Sequence(BBoxFeature) | (None, 4) | float32 | |
tracks/category | ClassLabel | | int64 | |
tracks/frames | Sequence(Tensor) | (None,) | int32 | |
tracks/is_crowd | Tensor | | bool | |
tracks/scale_category | Tensor | | string | |
tracks/track_id | Tensor | | int32 | |
video | Video(Image) | (None, 480, 640, 3) | uint8 | |
tao/full_resolution
Config description: The full resolution version of the dataset.
Dataset size: 171.24 GiB
Feature structure:
FeaturesDict({
'metadata': FeaturesDict({
'dataset': string,
'height': int32,
'neg_category_ids': Tensor(shape=(None,), dtype=int32),
'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
'num_frames': int32,
'video_name': string,
'width': int32,
}),
'tracks': Sequence({
'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
'frames': Sequence(int32),
'is_crowd': bool,
'scale_category': string,
'track_id': int32,
}),
'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})
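Annotations are stored per track, while many pipelines need them per frame. The sketch below regroups them, assuming that `frames[i]` is the frame index for `bboxes[i]` within each track; this pairing is inferred from the feature structure and is not stated on this page. It works on plain Python lists that mirror the `tracks` fields (for example after converting a decoded example), and the name `boxes_by_frame` is hypothetical.

```python
from collections import defaultdict


def boxes_by_frame(tracks):
    """Group boxes by frame index.

    tracks: dict with parallel lists 'track_id', 'frames', 'bboxes',
    one entry per track. Assumes frames[i] pairs with bboxes[i].
    Returns {frame_index: [(track_id, bbox_tuple), ...]}.
    """
    per_frame = defaultdict(list)
    for track_id, frames, bboxes in zip(
        tracks["track_id"], tracks["frames"], tracks["bboxes"]
    ):
        for frame, bbox in zip(frames, bboxes):
            box = tuple(float(v) for v in bbox)
            per_frame[int(frame)].append((int(track_id), box))
    return dict(per_frame)


# Made-up values, used only to illustrate the expected input layout.
example_tracks = {
    "track_id": [7, 9],
    "frames": [[0, 30], [30]],
    "bboxes": [
        [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.75, 0.75]],
        [[0.0, 0.0, 1.0, 1.0]],
    ],
}
print(boxes_by_frame(example_tracks))
```

Frames that carry no annotation simply do not appear as keys in the result.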
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
metadata | FeaturesDict | | | |
metadata/dataset | Tensor | | string | |
metadata/height | Tensor | | int32 | |
metadata/neg_category_ids | Tensor | (None,) | int32 | |
metadata/not_exhaustive_category_ids | Tensor | (None,) | int32 | |
metadata/num_frames | Tensor | | int32 | |
metadata/video_name | Tensor | | string | |
metadata/width | Tensor | | int32 | |
tracks | Sequence | | | |
tracks/bboxes | Sequence(BBoxFeature) | (None, 4) | float32 | |
tracks/category | ClassLabel | | int64 | |
tracks/frames | Sequence(Tensor) | (None,) | int32 | |
tracks/is_crowd | Tensor | | bool | |
tracks/scale_category | Tensor | | string | |
tracks/track_id | Tensor | | int32 | |
video | Video(Image) | (None, None, None, 3) | uint8 | |