tao

Description:

The TAO dataset is a large video object detection dataset consisting of 2,907 high-resolution videos and 833 object categories. Storing it requires at least 300 GB of free space.

Additional Documentation: Explore on Papers With Code
Homepage: https://taodataset.org/
Source code: tfds.video.tao.Tao
Versions:
- 1.1.0 (default) : Added test split.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Some TAO files (HVACS and AVA videos) must be downloaded manually because they require a login to MOT. Download this data by following the instructions at https://motchallenge.net/tao_download.php

Download this data and move the resulting .zip files to ~/tensorflow_datasets/downloads/manual/

If the data requiring manual download is not present, it will be skipped over and only the data not requiring manual download will be used.
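
Because missing manual archives are skipped rather than reported as an error, it can help to check the manual directory before preparing the dataset. The following is a minimal sketch using only the standard library. The helper name `list_manual_archives` is hypothetical, and since the expected archive filenames are not listed here, it only reports which .zip files are present.

```python
from pathlib import Path


def list_manual_archives(manual_dir="~/tensorflow_datasets/downloads/manual/"):
    """Return the sorted names of .zip files found in the manual directory.

    Hypothetical helper: it does not know which archives TAO expects,
    it only reports what is present.
    """
    root = Path(manual_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.zip"))


if __name__ == "__main__":
    archives = list_manual_archives()
    if archives:
        print("Found manual archives:", ", ".join(archives))
    else:
        print("No .zip files found; manually downloaded data will be skipped.")
```

An empty result means only the data that does not require manual download would be used.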

Auto-cached (documentation): Unknown
Splits:

Split	Examples

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:

@article{Dave_2020,
   title={TAO: A Large-Scale Benchmark for Tracking Any Object},
   ISBN={9783030585587},
   ISSN={1611-3349},
   url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
   DOI={10.1007/978-3-030-58558-7_26},
   journal={Lecture Notes in Computer Science},
   publisher={Springer International Publishing},
   author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
   year={2020},
   pages={436-454}
}

tao/480_640 (default config)

Config description: All images are bilinearly resized to 480 × 640 (height × width).
Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})
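
To make the `tracks` layout concrete, the sketch below regroups one example's tracks by frame and converts boxes to pixel coordinates. It rests on several assumptions: that TFDS exposes a `Sequence` of dicts as a dict of per-field sequences, that `frames[i]` and `bboxes[i]` of a track describe the same annotation, and that boxes follow the `BBoxFeature` convention of normalized `(ymin, xmin, ymax, xmax)` values. The helper names and the sample data are hypothetical, and plain Python lists stand in for the decoded (possibly ragged) tensors.

```python
from collections import defaultdict


def to_pixel_box(bbox, height, width):
    """Scale a normalized (ymin, xmin, ymax, xmax) box to pixel coordinates."""
    ymin, xmin, ymax, xmax = bbox
    return (ymin * height, xmin * width, ymax * height, xmax * width)


def boxes_by_frame(tracks, height, width):
    """Map frame index -> list of (track_id, pixel_box).

    `tracks` is a dict of per-field lists, one entry per track.
    Assumes frames[i] and bboxes[i] within a track belong together.
    """
    per_frame = defaultdict(list)
    for track_id, frames, bboxes in zip(
        tracks["track_id"], tracks["frames"], tracks["bboxes"]
    ):
        for frame, bbox in zip(frames, bboxes):
            per_frame[int(frame)].append(
                (int(track_id), to_pixel_box(bbox, height, width))
            )
    return dict(per_frame)


# Made-up data shaped like the 'tracks' feature (not taken from the dataset).
sample_tracks = {
    "track_id": [7, 9],
    "frames": [[0, 30], [30]],
    "bboxes": [
        [(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.5, 0.5)],
        [(0.5, 0.25, 1.0, 0.75)],
    ],
}
index = boxes_by_frame(sample_tracks, height=480, width=640)
print(index[30])
```

Here `height` and `width` are those of the decoded frames (480 and 640 for this config), so the same helper would need the per-video frame size when boxes are drawn on variable-size frames.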

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/dataset	Tensor		string
metadata/height	Tensor		int32
metadata/neg_category_ids	Tensor	(None,)	int32
metadata/not_exhaustive_category_ids	Tensor	(None,)	int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/scale_category	Tensor		string
tracks/track_id	Tensor		int32
video	Video(Image)	(None, 480, 640, 3)	uint8
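
The `video` shape above implies a fixed memory cost per decoded frame, which is worth estimating before loading whole videos. A back-of-the-envelope sketch follows; the helper name and the 1,000-frame clip length are illustrative assumptions, not dataset statistics.

```python
def decoded_video_bytes(num_frames, height=480, width=640, channels=3):
    """Bytes needed to hold a decoded uint8 video (1 byte per channel value)."""
    return num_frames * height * width * channels


per_frame = decoded_video_bytes(1)
print(per_frame)  # 921600 bytes per frame

clip = decoded_video_bytes(1000)  # hypothetical 1,000-frame video
print(clip / 2**20)  # 878.90625 MiB
```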

tao/full_resolution

Config description: The full resolution version of the dataset.
Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/dataset	Tensor		string
metadata/height	Tensor		int32
metadata/neg_category_ids	Tensor	(None,)	int32
metadata/not_exhaustive_category_ids	Tensor	(None,)	int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/scale_category	Tensor		string
tracks/track_id	Tensor		int32
video	Video(Image)	(None, None, None, 3)	uint8
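
The two configs can be selected by name when loading. The sketch below is one possible way to do this with `tfds.load`, passing the manual directory through a `DownloadConfig`. The helper names `tao_name` and `load_tao` are hypothetical, and the split is left to the caller because no split names are listed in the table above.

```python
import os

CONFIGS = ("480_640", "full_resolution")


def tao_name(config="480_640"):
    """Build the TFDS dataset name for one of the two configs listed here."""
    if config not in CONFIGS:
        raise ValueError(f"Unknown TAO config: {config!r}")
    return f"tao/{config}"


def load_tao(config, split, manual_dir="~/tensorflow_datasets/downloads/manual/"):
    """Load one TAO config with tfds.load.

    `split` must be supplied by the caller; this sketch does not assume
    which split names exist.
    """
    # Imported lazily so the name helper works without TFDS installed.
    import tensorflow_datasets as tfds

    download_config = tfds.download.DownloadConfig(
        manual_dir=os.path.expanduser(manual_dir)
    )
    return tfds.load(
        tao_name(config),
        split=split,
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

Since `tao/full_resolution` declares frames of shape `(None, None, 3)`, videos from that config may differ in size, so batching across videos would likely need resizing or padding first.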