tao

  • Description:

The TAO dataset is a large video object detection dataset consisting of 2,907 high-resolution videos and 833 object categories. Note that storing this dataset requires at least 300 GB of free space.


  • Homepage: https://taodataset.org/

  • Source code: tfds.video.tao.Tao

  • Versions:

    • 1.0.0 (default): No release notes.
    • 1.1.0: Added test split.
  • Download size: 113.96 GiB

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    Some TAO files (the HACS and AVA videos) must be downloaded manually because they require a MOTChallenge login. Follow the instructions at https://motchallenge.net/tao_download.php to obtain them.

Download this data and move the resulting .zip files to ~/tensorflow_datasets/downloads/manual/

If the manually downloaded data is not present, it is skipped and only the data that does not require manual download is used.
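A minimal sketch of the manual-download step, assuming the default manual directory named above and the public TFDS calls `tfds.builder`, `DatasetBuilder.download_and_prepare`, and `tfds.download.DownloadConfig(manual_dir=...)`. The helpers `find_manual_zips` and `prepare_tao` are hypothetical and not part of TFDS.

```python
from pathlib import Path

# Default manual directory named in the instructions above.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual")


def find_manual_zips(manual_dir=DEFAULT_MANUAL_DIR):
    """Hypothetical helper: sorted names of the .zip files in manual_dir.

    Returns an empty list when the directory does not exist.
    """
    manual_dir = Path(manual_dir).expanduser()
    if not manual_dir.is_dir():
        return []
    return sorted(p.name for p in manual_dir.glob("*.zip"))


def prepare_tao(config="480_640", manual_dir=DEFAULT_MANUAL_DIR):
    """Hypothetical helper: build tao/<config>, reading manual files from manual_dir.

    Note: this triggers the full download and preparation of the dataset.
    """
    # Imported lazily so find_manual_zips works without tensorflow_datasets.
    import tensorflow_datasets as tfds

    manual_dir = Path(manual_dir).expanduser()
    if not find_manual_zips(manual_dir):
        # Per the instructions above, preparation still runs, but the
        # manual-only videos are skipped.
        print(f"No .zip files in {manual_dir}; manual-only videos will be skipped.")
    builder = tfds.builder(f"tao/{config}")
    builder.download_and_prepare(
        download_config=tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    )
    return builder
```

Checking the directory first is only a convenience: it makes a silently reduced build visible before a long download starts.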

  • Splits:

Split         Examples
'train'       500
'validation'  988
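Each example is one video, so the split sizes above are video counts. A sketch of loading a single split, assuming `tfds.load` with its `split` and `data_dir` arguments; `load_tao_split` is a hypothetical wrapper, not a TFDS function.

```python
def load_tao_split(split="train", config="480_640", data_dir=None):
    """Hypothetical wrapper: load one TAO split as a tf.data.Dataset.

    config: "480_640" (the default config) or "full_resolution".
    split: a split name or a TFDS slice string such as "validation[:10]".
    """
    # Imported lazily; tensorflow_datasets is a heavy dependency.
    import tensorflow_datasets as tfds

    return tfds.load(f"tao/{config}", split=split, data_dir=data_dir)


# Usage (downloads and prepares the data on the first call):
# ds = load_tao_split("validation")  # 988 examples, one per video
```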

  • Citation:

@article{Dave_2020,
   title={TAO: A Large-Scale Benchmark for Tracking Any Object},
   ISBN={9783030585587},
   ISSN={1611-3349},
   url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
   DOI={10.1007/978-3-030-58558-7_26},
   journal={Lecture Notes in Computer Science},
   publisher={Springer International Publishing},
   author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
   year={2020},
   pages={436--454}
}

tao/480_640 (default config)

  • Config description: All video frames are bilinearly resized to 480 × 640 (height × width).

  • Dataset size: 482.30 GiB

  • Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})

  • Feature documentation:

Feature                               Class                  Shape                  Dtype
                                      FeaturesDict
metadata                              FeaturesDict
metadata/dataset                      Tensor                                        string
metadata/height                       Tensor                                        int32
metadata/neg_category_ids             Tensor                 (None,)                int32
metadata/not_exhaustive_category_ids  Tensor                 (None,)                int32
metadata/num_frames                   Tensor                                        int32
metadata/video_name                   Tensor                                        string
metadata/width                        Tensor                                        int32
tracks                                Sequence
tracks/bboxes                         Sequence(BBoxFeature)  (None, 4)              float32
tracks/category                       ClassLabel                                    int64
tracks/frames                         Sequence(Tensor)       (None,)                int32
tracks/is_crowd                       Tensor                                        bool
tracks/scale_category                 Tensor                                        string
tracks/track_id                       Tensor                                        int32
video                                 Video(Image)           (None, 480, 640, 3)    uint8
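The `tracks` feature stores, per track, a list of annotated frame indices (`frames`) next to a list of boxes (`bboxes`). The sketch below regroups those boxes by frame and converts them to pixels for the 480 × 640 frames of this config. It assumes the TFDS `BBoxFeature` convention of normalized `(ymin, xmin, ymax, xmax)` coordinates, that `frames[i]` and `bboxes[i]` are aligned element by element, and that the decoded example exposes `tracks` as per-track sequences; `boxes_by_frame` is a hypothetical helper.

```python
def boxes_by_frame(tracks, frame_height=480, frame_width=640):
    """Hypothetical helper: group one video's track boxes by frame, in pixels.

    Assumes `tracks` maps "track_id", "frames", and "bboxes" to per-track
    sequences, with frames[i][j] the frame index of box bboxes[i][j], and each
    box in normalized (ymin, xmin, ymax, xmax) order.
    Returns {frame_index: [(track_id, (ymin, xmin, ymax, xmax)), ...]}.
    """
    scale = (frame_height, frame_width, frame_height, frame_width)
    per_frame = {}
    for track_id, frames, boxes in zip(tracks["track_id"], tracks["frames"], tracks["bboxes"]):
        for frame, box in zip(frames, boxes):
            # Scale y coordinates by the height and x coordinates by the width.
            pixel_box = tuple(float(c) * s for c, s in zip(box, scale))
            per_frame.setdefault(int(frame), []).append((int(track_id), pixel_box))
    return per_frame
```

Grouping by frame is convenient for drawing or evaluation, because a decoded video is indexed by frame while the annotations are indexed by track.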

tao/full_resolution

  • Config description: The full resolution version of the dataset.

  • Dataset size: 171.24 GiB

  • Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})

  • Feature documentation:

Feature                               Class                  Shape                  Dtype
                                      FeaturesDict
metadata                              FeaturesDict
metadata/dataset                      Tensor                                        string
metadata/height                       Tensor                                        int32
metadata/neg_category_ids             Tensor                 (None,)                int32
metadata/not_exhaustive_category_ids  Tensor                 (None,)                int32
metadata/num_frames                   Tensor                                        int32
metadata/video_name                   Tensor                                        string
metadata/width                        Tensor                                        int32
tracks                                Sequence
tracks/bboxes                         Sequence(BBoxFeature)  (None, 4)              float32
tracks/category                       ClassLabel                                    int64
tracks/frames                         Sequence(Tensor)       (None,)                int32
tracks/is_crowd                       Tensor                                        bool
tracks/scale_category                 Tensor                                        string
tracks/track_id                       Tensor                                        int32
video                                 Video(Image)           (None, None, None, 3)  uint8
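In this config the frame size varies from video to video, as the `(None, None, None, 3)` video shape indicates, so normalized boxes must be scaled per video. A minimal sketch, assuming that `metadata/height` and `metadata/width` give the size of that video's stored frames, that `tracks/frames` and `tracks/bboxes` are aligned per track, and that boxes follow the normalized `(ymin, xmin, ymax, xmax)` `BBoxFeature` convention; `track_boxes_in_pixels` is a hypothetical helper.

```python
def track_boxes_in_pixels(example, track_index):
    """Hypothetical helper for tao/full_resolution, where frame size varies per video.

    Assumes metadata["height"] and metadata["width"] give the size of this
    video's frames, and that tracks["frames"][i] and tracks["bboxes"][i] are
    aligned per-track sequences of normalized (ymin, xmin, ymax, xmax) boxes.
    Returns [(frame_index, (ymin, xmin, ymax, xmax)), ...] in pixels.
    """
    height = int(example["metadata"]["height"])
    width = int(example["metadata"]["width"])
    frames = example["tracks"]["frames"][track_index]
    boxes = example["tracks"]["bboxes"][track_index]
    return [
        (int(f), (float(b[0]) * height, float(b[1]) * width,
                  float(b[2]) * height, float(b[3]) * width))
        for f, b in zip(frames, boxes)
    ]
```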