- Description:
KITTI contains a suite of vision tasks built using an autonomous driving platform. The full benchmark covers many tasks, such as stereo, optical flow, and visual odometry. This dataset contains the object detection portion, including the monocular images and bounding boxes. It comprises 7,481 training images annotated with 3D bounding boxes. A full description of the annotations can be found in the readme of the object development kit on the KITTI homepage.
Additional Documentation: Explore on Papers With Code
Homepage: http://www.cvlibs.net/datasets/kitti/
Source code: tfds.datasets.kitti.Builder
Versions:
3.1.0: No release notes.
3.2.0: Devkit updated.
3.3.0 (default): Added labels for the occluded feature.
Download size: 11.71 GiB
Dataset size: 5.27 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 711 |
'train' | 6,347 |
'validation' | 423 |
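The split names in the table can be requested by name when loading. The following is a minimal sketch, assuming the `tensorflow_datasets` package is installed and that the dataset is registered under the name `kitti` (inferred from the builder path `tfds.datasets.kitti.Builder`). `SPLIT_SIZES` and `load_kitti_split` are hypothetical helper names, not part of TFDS. Note that the first call would download and prepare the full dataset.

```python
# Split sizes copied from the splits table.
SPLIT_SIZES = {"train": 6347, "validation": 423, "test": 711}


def load_kitti_split(split="train", data_dir=None):
    """Return the requested split as a tf.data.Dataset (hypothetical helper)."""
    if split not in SPLIT_SIZES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_SIZES)}"
        )
    # Imported lazily so the name check above works without TFDS installed.
    import tensorflow_datasets as tfds

    # "kitti" is assumed to be the registered dataset name.
    return tfds.load("kitti", split=split, data_dir=data_dir)


# The three splits together account for the 7,481 annotated images.
total = sum(SPLIT_SIZES.values())
print(total)  # 7481
```

Each element yielded by the returned dataset should be a dictionary with the keys listed under the feature structure.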
- Feature structure:
FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/file_name': Text(shape=(), dtype=string),
    'objects': Sequence({
        'alpha': float32,
        'bbox': BBoxFeature(shape=(4,), dtype=float32, description=2D bounding box of object in the image),
        'dimensions': Tensor(shape=(3,), dtype=float32, description=3D object dimensions: height, width, length (in meters)),
        'location': Tensor(shape=(3,), dtype=float32, description=3D object location x,y,z in camera coordinates (in meters)),
        'occluded': ClassLabel(shape=(), dtype=int64, num_classes=4),
        'rotation_y': float32,
        'truncated': float32,
        'type': ClassLabel(shape=(), dtype=int64, num_classes=8),
    }),
})
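The `bbox` feature holds four floats rather than pixel coordinates. Below is a minimal sketch of converting them to pixels, assuming the usual TFDS `BBoxFeature` convention of `[ymin, xmin, ymax, xmax]` normalized to the range [0, 1]. The helper name `bbox_to_pixels` and the `y_from_bottom` flag are hypothetical. The flag is included because the vertical origin of the boxes is an assumption here, and it is worth checking against a plotted example before relying on either setting.

```python
def bbox_to_pixels(bbox, height, width, y_from_bottom=False):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Returns (top, left, bottom, right) with the origin at the top-left corner.
    """
    ymin, xmin, ymax, xmax = (float(v) for v in bbox)
    if y_from_bottom:
        # Flip the vertical axis: a box measured upward from the bottom edge
        # has its top at 1 - ymax and its bottom at 1 - ymin.
        ymin, ymax = 1.0 - ymax, 1.0 - ymin
    return (ymin * height, xmin * width, ymax * height, xmax * width)


# Illustrative values only: a made-up box on a made-up 400 x 1200 image.
print(bbox_to_pixels([0.25, 0.125, 0.5, 0.5], height=400, width=1200))
# (100.0, 150.0, 200.0, 600.0)
```

The image height and width can be read from the shape of the decoded `image` tensor of the same example.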
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
| | FeaturesDict | | | |
image | Image | (None, None, 3) | uint8 | |
image/file_name | Text | | string | |
objects | Sequence | | | |
objects/alpha | Tensor | | float32 | Observation angle of object, ranging [-pi..pi] |
objects/bbox | BBoxFeature | (4,) | float32 | 2D bounding box of object in the image |
objects/dimensions | Tensor | (3,) | float32 | 3D object dimensions: height, width, length (in meters) |
objects/location | Tensor | (3,) | float32 | 3D object location x,y,z in camera coordinates (in meters) |
objects/occluded | ClassLabel | | int64 | Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown |
objects/rotation_y | Tensor | | float32 | Rotation ry around Y-axis in camera coordinates [-pi..pi] |
objects/truncated | Tensor | | float32 | Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries |
objects/type | ClassLabel | | int64 | The type of object, e.g. 'Car' or 'Van' |
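The `occluded` and `truncated` fields lend themselves to filtering out hard-to-see objects. The sketch below is one way to do that in plain Python, assuming the objects of a single example have already been converted to a dictionary of equal-length sequences. The helper name `select_objects` and its default thresholds are hypothetical choices for illustration, not values defined by the dataset.

```python
# Occlusion codes as documented in the feature table.
OCCLUSION_NAMES = {
    0: "fully visible",
    1: "partly occluded",
    2: "largely occluded",
    3: "unknown",
}


def select_objects(objects, max_occlusion=1, max_truncation=0.5):
    """Keep objects at or below the given occlusion code and truncation value.

    `objects` maps feature names to equal-length sequences, mirroring the
    `objects` Sequence of one example. With max_occlusion below 3, objects
    whose occlusion state is unknown (code 3) are dropped as well.
    """
    keep = [
        i
        for i, (occ, trunc) in enumerate(
            zip(objects["occluded"], objects["truncated"])
        )
        if occ <= max_occlusion and trunc <= max_truncation
    ]
    return {name: [values[i] for i in keep] for name, values in objects.items()}


# Hand-written stand-in for one example's objects (not real KITTI data).
sample = {
    "occluded": [0, 2, 1, 3],
    "truncated": [0.0, 0.1, 0.8, 0.0],
    "rotation_y": [1.57, -1.2, 0.3, 0.0],
}
kept = select_objects(sample)
print(kept["rotation_y"])  # [1.57]
print([OCCLUSION_NAMES[code] for code in kept["occluded"]])  # ['fully visible']
```

Only the first object survives here: the second is largely occluded, the third exceeds the truncation threshold, and the fourth has an unknown occlusion state.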
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
- Citation:
@inproceedings{Geiger2012CVPR,
author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2012}
}