kitti

Description:

Kitti contains a suite of vision tasks built using an autonomous driving platform. The full benchmark covers many tasks, such as stereo, optical flow, and visual odometry. This dataset contains only the object detection portion, including the monocular images and bounding boxes. It contains 7,481 training images annotated with 3D bounding boxes. A full description of the annotations can be found in the readme of the object development kit on the Kitti homepage.

Additional Documentation: Explore on Papers With Code
Homepage: http://www.cvlibs.net/datasets/kitti/
Source code: tfds.datasets.kitti.Builder
Versions:
- 3.1.0: No release notes.
- 3.2.0: Devkit updated.
- 3.3.0 (default): Added labels for the occluded feature.
Download size: 11.71 GiB
Dataset size: 5.27 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	711
`'train'`	6,347
`'validation'`	423

Feature structure:

FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/file_name': Text(shape=(), dtype=string),
    'objects': Sequence({
        'alpha': float32,
        'bbox': BBoxFeature(shape=(4,), dtype=float32, description=2D bounding box of object in the image),
        'dimensions': Tensor(shape=(3,), dtype=float32, description=3D object dimensions: height, width, length (in meters)),
        'location': Tensor(shape=(3,), dtype=float32, description=3D object location x,y,z in camera coordinates (in meters)),
        'occluded': ClassLabel(shape=(), dtype=int64, num_classes=4),
        'rotation_y': float32,
        'truncated': float32,
        'type': ClassLabel(shape=(), dtype=int64, num_classes=8),
    }),
})
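
Because `objects` is a `Sequence` of a feature dictionary, each example carries one array per field (all of the same length) rather than a list of per-object dictionaries. The sketch below shows one way to regroup that layout into per-object records. The helper names (`objects_to_records`, `describe_occlusion`) and the sample values are hypothetical and chosen only for illustration; the occlusion meanings are taken from the feature documentation on this page.

```python
# Occlusion codes as described in the feature documentation on this page.
OCCLUSION_NAMES = {
    0: "fully visible",
    1: "partly occluded",
    2: "largely occluded",
    3: "unknown",
}


def objects_to_records(objects):
    """Turn a dict of equal-length sequences into a list of per-object dicts."""
    keys = list(objects)
    count = len(objects[keys[0]]) if keys else 0
    return [{key: objects[key][i] for key in keys} for i in range(count)]


def describe_occlusion(code):
    """Map an integer occlusion code to its human-readable meaning."""
    return OCCLUSION_NAMES[int(code)]


# Hypothetical values mimicking the 'objects' field of one example.
example_objects = {
    "type": [0, 3],
    "truncated": [0.0, 0.25],
    "occluded": [0, 2],
    "alpha": [-1.57, 0.3],
    "rotation_y": [-1.6, 0.4],
    "dimensions": [[1.5, 1.6, 4.0], [1.8, 0.6, 0.9]],
    "location": [[2.0, 1.5, 20.0], [-3.0, 1.6, 12.0]],
}

records = objects_to_records(example_objects)
for record in records:
    print(record["type"], describe_occlusion(record["occluded"]))
```

The same helper should work on the NumPy arrays yielded by `tfds.as_numpy(tfds.load('kitti', split='train'))`, since it relies only on `len` and integer indexing.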

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
image	Image	(None, None, 3)	uint8
image/file_name	Text		string
objects	Sequence
objects/alpha	Tensor		float32	Observation angle of object, ranging [-pi..pi]
objects/bbox	BBoxFeature	(4,)	float32	2D bounding box of object in the image
objects/dimensions	Tensor	(3,)	float32	3D object dimensions: height, width, length (in meters)
objects/location	Tensor	(3,)	float32	3D object location x,y,z in camera coordinates (in meters)
objects/occluded	ClassLabel		int64	Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
objects/rotation_y	Tensor		float32	Rotation ry around Y-axis in camera coordinates [-pi..pi]
objects/truncated	Tensor		float32	Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries
objects/type	ClassLabel		int64	The type of object, e.g. 'Car' or 'Van'
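
The `dimensions`, `location`, and `rotation_y` fields together describe a 3D box in camera coordinates. The following is a minimal sketch of how the eight corners might be computed. It assumes the convention commonly attributed to the KITTI object development kit: `location` is the center of the box's bottom face, the camera Y-axis points downward, and length runs along X when `rotation_y` is 0. Check these assumptions against the devkit readme before relying on them. The function name `box3d_corners` and the sample values are hypothetical.

```python
import math


def box3d_corners(dimensions, location, rotation_y):
    """Return the 8 corners (x, y, z) of a 3D box in camera coordinates.

    Assumes: dimensions = (height, width, length) in meters, location is the
    bottom-face center, and the Y-axis points down (so the top face is at -h).
    """
    h, w, l = dimensions
    tx, ty, tz = location

    # Corner offsets in the object's own frame, before rotation.
    xs = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    ys = [0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h]
    zs = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]

    # Rotate around the Y-axis, then translate to the object location.
    c, s = math.cos(rotation_y), math.sin(rotation_y)
    return [
        (c * x + s * z + tx, y + ty, -s * x + c * z + tz)
        for x, y, z in zip(xs, ys, zs)
    ]


# Hypothetical car-sized object 10 m in front of the camera.
corners = box3d_corners((1.5, 1.6, 4.0), (0.0, 1.5, 10.0), 0.0)
for corner in corners:
    print(corner)
```

Projecting these corners into the image additionally requires the camera calibration, which is not part of the feature structure listed on this page.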

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples):

Examples (tfds.as_dataframe):

Citation:

@inproceedings{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}