tao

  • Description:

The TAO dataset is a large video object detection dataset consisting of 2,907 high-resolution videos and 833 object categories. Storing it requires at least 300 GB of free space.

Some of the source data must be downloaded manually. Download it and move the resulting .zip files to ~/tensorflow_datasets/downloads/manual/

If the manually downloaded data is not present, it is skipped and only the data that does not require manual download is used.
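
The placement step above can be checked before the dataset is built. The following is a minimal sketch, assuming the manual directory named above; the helper name `find_manual_archives` is hypothetical and not part of TFDS. The expected archive file names are not listed here, so the sketch only reports whichever .zip files it finds.

```python
from pathlib import Path

# Manual download directory named in the instructions above.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()


def find_manual_archives(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the sorted .zip files found directly inside manual_dir.

    Hypothetical helper (not a TFDS API). Returns an empty list when the
    directory does not exist.
    """
    manual_dir = Path(manual_dir)
    if not manual_dir.is_dir():
        return []
    return sorted(
        p for p in manual_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".zip"
    )


if __name__ == "__main__":
    archives = find_manual_archives()
    if archives:
        for path in archives:
            print(f"found {path.name} ({path.stat().st_size} bytes)")
    else:
        print("no manually downloaded archives found; that data will be skipped")
```

An empty result does not stop a build; as noted above, the build then proceeds with only the data that needs no manual download.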

  • Splits:

Split         Examples
'train'       500
'validation'  988

  • Citation:

@article{Dave_2020,
   title={TAO: A Large-Scale Benchmark for Tracking Any Object},
   ISBN={9783030585587},
   ISSN={1611-3349},
   url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
   DOI={10.1007/978-3-030-58558-7_26},
   journal={Lecture Notes in Computer Science},
   publisher={Springer International Publishing},
   author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
   year={2020},
   pages={436-454}
}

tao/480_640 (default config)

  • Config description: All images are bilinearly resized to 480 × 640 (height × width).

  • Dataset size: 482.30 GiB

  • Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})
  • Feature documentation:
Feature                               Class                  Shape                  Dtype    Description
                                      FeaturesDict
metadata                              FeaturesDict
metadata/dataset                      Tensor                                        string
metadata/height                       Tensor                                        int32
metadata/neg_category_ids             Tensor                 (None,)                int32
metadata/not_exhaustive_category_ids  Tensor                 (None,)                int32
metadata/num_frames                   Tensor                                        int32
metadata/video_name                   Tensor                                        string
metadata/width                        Tensor                                        int32
tracks                                Sequence
tracks/bboxes                         Sequence(BBoxFeature)  (None, 4)              float32
tracks/category                       ClassLabel                                    int64
tracks/frames                         Sequence(Tensor)       (None,)                int32
tracks/is_crowd                       Tensor                                        bool
tracks/scale_category                 Tensor                                        string
tracks/track_id                       Tensor                                        int32
video                                 Video(Image)           (None, 480, 640, 3)    uint8
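
Selecting a configuration can be sketched as follows, assuming the `tensorflow_datasets` package is installed and the data has already been prepared. `tfds.load` and the `dataset/config` naming are the standard TFDS interface; `tao_builder_name` and `load_tao` are hypothetical helpers introduced here for illustration. The TFDS import is deferred so that the naming helper can run without the library.

```python
# Config names listed on this page.
TAO_CONFIGS = ("480_640", "full_resolution")


def tao_builder_name(config="480_640"):
    """Build the 'dataset/config' string used by TFDS, e.g. 'tao/480_640'."""
    if config not in TAO_CONFIGS:
        raise ValueError(
            f"unknown TAO config {config!r}; expected one of {TAO_CONFIGS}"
        )
    return f"tao/{config}"


def load_tao(config="480_640", split="train", data_dir=None):
    """Load one split of TAO as a tf.data.Dataset (requires prepared data)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    return tfds.load(tao_builder_name(config), split=split, data_dir=data_dir)
```

With the data in place, `load_tao("480_640", split="validation")` would yield examples whose `video` entry has shape (num_frames, 480, 640, 3), matching the feature structure above.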

tao/full_resolution

  • Config description: The full resolution version of the dataset.

  • Dataset size: 171.24 GiB

  • Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})
  • Feature documentation:
Feature                               Class                  Shape                  Dtype    Description
                                      FeaturesDict
metadata                              FeaturesDict
metadata/dataset                      Tensor                                        string
metadata/height                       Tensor                                        int32
metadata/neg_category_ids             Tensor                 (None,)                int32
metadata/not_exhaustive_category_ids  Tensor                 (None,)                int32
metadata/num_frames                   Tensor                                        int32
metadata/video_name                   Tensor                                        string
metadata/width                        Tensor                                        int32
tracks                                Sequence
tracks/bboxes                         Sequence(BBoxFeature)  (None, 4)              float32
tracks/category                       ClassLabel                                    int64
tracks/frames                         Sequence(Tensor)       (None,)                int32
tracks/is_crowd                       Tensor                                        bool
tracks/scale_category                 Tensor                                        string
tracks/track_id                       Tensor                                        int32
video                                 Video(Image)           (None, None, None, 3)  uint8
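
Because frame sizes vary in this configuration, box coordinates have to be scaled per video. The sketch below rests on two assumptions that this page does not state, so verify them against the data: that boxes follow the usual TFDS `BBoxFeature` convention of normalized (ymin, xmin, ymax, xmax) values in [0, 1], and that `tracks/frames` holds, for each track, the frame index of the corresponding entry in `tracks/bboxes`. It also assumes the example has first been converted into a list of per-track dicts; the function names are hypothetical.

```python
from collections import defaultdict


def to_pixel_box(bbox, frame_height, frame_width):
    """Scale a normalized (ymin, xmin, ymax, xmax) box to pixel coordinates."""
    ymin, xmin, ymax, xmax = bbox
    return (
        ymin * frame_height,
        xmin * frame_width,
        ymax * frame_height,
        xmax * frame_width,
    )


def boxes_by_frame(tracks, frame_height, frame_width):
    """Group pixel boxes by frame index.

    `tracks` is a list of dicts with keys 'track_id', 'frames', 'bboxes',
    mirroring the per-track fields in the feature structure above.
    Returns {frame_index: [(track_id, pixel_box), ...]}.
    """
    per_frame = defaultdict(list)
    for track in tracks:
        for frame_index, bbox in zip(track["frames"], track["bboxes"]):
            box = to_pixel_box(bbox, frame_height, frame_width)
            per_frame[int(frame_index)].append((int(track["track_id"]), box))
    return dict(per_frame)


# Toy input with made-up values, shaped like one converted example.
example_tracks = [
    {
        "track_id": 7,
        "frames": [0, 30],
        "bboxes": [(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.5, 0.5)],
    },
    {"track_id": 9, "frames": [30], "bboxes": [(0.5, 0.25, 1.0, 0.75)]},
]
result = boxes_by_frame(example_tracks, frame_height=720, frame_width=1280)
```

Scaling by the decoded frame's own height and width, rather than a fixed size, keeps the same code usable for the 480 × 640 configuration as well.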