• Description:

KITTI contains a suite of vision tasks built using an autonomous driving platform. The full benchmark covers many tasks, such as stereo, optical flow, and visual odometry. This dataset contains the object detection subset, including the monocular images and bounding boxes. It comprises 7481 training images annotated with 3D bounding boxes. A full description of the annotations can be found in the readme of the object development kit on the KITTI homepage.

Split         Examples
'test'        711
'train'       6,347
'validation'  423
  • Feature structure:
FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/file_name': Text(shape=(), dtype=string),
    'objects': Sequence({
        'alpha': float32,
        'bbox': BBoxFeature(shape=(4,), dtype=float32),
        'dimensions': Tensor(shape=(3,), dtype=float32),
        'location': Tensor(shape=(3,), dtype=float32),
        'occluded': ClassLabel(shape=(), dtype=int64, num_classes=4),
        'rotation_y': float32,
        'truncated': float32,
        'type': ClassLabel(shape=(), dtype=int64, num_classes=8),
    }),
})
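Because 'objects' is a Sequence of a dictionary, each example typically exposes it as one dictionary of per-field arrays (one entry per object) rather than as a list of per-object records. The following is a minimal sketch of regrouping such a structure into per-object dictionaries. The helper name `unpack_objects` and the sample values are hypothetical and only illustrate the layout; plain Python lists stand in for the arrays a loaded example would contain.

```python
def unpack_objects(objects):
    """Turn a dict of per-field sequences into a list of per-object dicts."""
    if not objects:
        return []
    # Every field must describe the same number of objects.
    lengths = {len(values) for values in objects.values()}
    if len(lengths) != 1:
        raise ValueError("all fields must describe the same number of objects")
    (count,) = lengths
    return [
        {name: values[i] for name, values in objects.items()}
        for i in range(count)
    ]


# Made-up values for two objects, using three of the fields listed above.
example_objects = {
    "type": [0, 3],
    "occluded": [0, 2],
    "truncated": [0.0, 0.4],
}
records = unpack_objects(example_objects)
```

Each entry of `records` then holds the fields of a single annotated object, which is often more convenient for filtering or printing than the field-major layout.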
  • Feature documentation:
Feature             Class        Shape            Dtype    Description
image               Image        (None, None, 3)  uint8
image/file_name     Text                          string
objects             Sequence
objects/alpha       Tensor                        float32  Observation angle of object, ranging [-pi..pi]
objects/bbox        BBoxFeature  (4,)             float32  2D bounding box of object in the image
objects/dimensions  Tensor       (3,)             float32  3D object dimensions: height, width, length (in meters)
objects/location    Tensor       (3,)             float32  3D object location x,y,z in camera coordinates (in meters)
objects/occluded    ClassLabel                    int64    Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
objects/rotation_y  Tensor                        float32  Rotation ry around Y-axis in camera coordinates [-pi..pi]
objects/truncated   Tensor                        float32  Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries
objects/type        ClassLabel                    int64    The type of object, e.g. 'Car' or 'Van'
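The dimensions, location, and rotation_y fields together describe a 3D box. The sketch below shows one way to compute its eight corners in camera coordinates. It assumes the convention used by the KITTI object development kit, which is not stated in this page: location is the center of the box's bottom face, and the camera Y axis points downward (so the top face lies at y - height). The function name `box3d_corners` is hypothetical.

```python
import math


def box3d_corners(dimensions, location, rotation_y):
    """Return the 8 corners (x, y, z) of a 3D box in camera coordinates.

    dimensions: (height, width, length) in meters.
    location:   (x, y, z) in meters; assumed to be the bottom-face center.
    rotation_y: rotation around the camera Y axis, in radians.
    """
    h, w, l = dimensions
    tx, ty, tz = location
    c, s = math.cos(rotation_y), math.sin(rotation_y)

    # Corners in the object frame: length along x, height along -y, width along z.
    xs = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    ys = [0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h]
    zs = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]

    corners = []
    for x, y, z in zip(xs, ys, zs):
        # Rotate around the Y axis, then translate to the object location.
        corners.append((c * x + s * z + tx, y + ty, -s * x + c * z + tz))
    return corners
```

The first four corners form the bottom face and the last four the top face. Projecting them into the image would additionally require the camera calibration, which is not part of the features listed here.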


  • Citation:
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}