# coco_captions
- **Description**:

COCO is a large-scale object detection, segmentation, and captioning dataset.
This version contains images, bounding boxes, labels, and captions from COCO
2014, split into the subsets defined by Karpathy and Li (2015). This effectively
divides the original COCO 2014 validation data into new 5000-image validation
and test sets, plus a "restval" set containing the remaining ~30k images. All
splits have caption annotations.

- **Additional Documentation**: [Explore on Papers With Code](https://paperswithcode.com/dataset/coco-captions)

- **Config description**: This version contains images, bounding boxes and labels for the 2014 version.

- **Homepage**: <http://cocodataset.org/#home>

- **Source code**: [`tfds.object_detection.CocoCaptions`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/object_detection/coco_captions.py)

- **Versions**:

  - **`1.1.0`** (default): No release notes.

- **Download size**: `37.61 GiB`

- **Dataset size**: `18.83 GiB`

- **Auto-cached** ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): No

- **Splits**:
| Split       | Examples |
|-------------|----------|
| `'restval'` | 30,504   |
| `'test'`    | 5,000    |
| `'train'`   | 82,783   |
| `'val'`     | 5,000    |
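
These are the Karpathy splits, so training pipelines commonly union `train` and `restval`. A minimal loading sketch using the standard `tfds.load` split syntax (it assumes the ~37 GiB source data has already been downloaded, or can be fetched on first use):

    import tensorflow_datasets as tfds

    # Split names match the table above; TFDS split syntax can union
    # 'train' and 'restval', as Karpathy & Li (2015) do for training.
    train_ds = tfds.load('coco_captions', split='train+restval')
    val_ds = tfds.load('coco_captions', split='val')     # 5,000 images
    test_ds = tfds.load('coco_captions', split='test')   # 5,000 images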
- **Feature structure**:

    FeaturesDict({
        'captions': Sequence({
            'id': int64,
            'text': string,
        }),
        'image': Image(shape=(None, None, 3), dtype=uint8),
        'image/filename': Text(shape=(), dtype=string),
        'image/id': int64,
        'objects': Sequence({
            'area': int64,
            'bbox': BBoxFeature(shape=(4,), dtype=float32),
            'id': int64,
            'is_crowd': bool,
            'label': ClassLabel(shape=(), dtype=int64, num_classes=80),
        }),
    })
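
Because `captions` and `objects` are `Sequence` features, each example arrives with those fields stacked along a variable-length leading dimension: `example['captions']['text']` is a 1-D string tensor with one entry per caption. A minimal sketch of reading one example (assuming eager execution, the TF 2 default):

    import tensorflow_datasets as tfds

    ds = tfds.load('coco_captions', split='val')
    for example in ds.take(1):
        image = example['image']             # uint8, shape (h, w, 3)
        texts = example['captions']['text']  # 1-D string tensor
        boxes = example['objects']['bbox']   # float32, shape (n, 4)
        print(example['image/filename'].numpy().decode())
        for caption in texts.numpy():
            print(caption.decode())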
- **Feature documentation**:

| Feature          | Class        | Shape           | Dtype   | Description |
|------------------|--------------|-----------------|---------|-------------|
|                  | FeaturesDict |                 |         |             |
| captions         | Sequence     |                 |         |             |
| captions/id      | Tensor       |                 | int64   |             |
| captions/text    | Tensor       |                 | string  |             |
| image            | Image        | (None, None, 3) | uint8   |             |
| image/filename   | Text         |                 | string  |             |
| image/id         | Tensor       |                 | int64   |             |
| objects          | Sequence     |                 |         |             |
| objects/area     | Tensor       |                 | int64   |             |
| objects/bbox     | BBoxFeature  | (4,)            | float32 |             |
| objects/id       | Tensor       |                 | int64   |             |
| objects/is_crowd | Tensor       |                 | bool    |             |
| objects/label    | ClassLabel   |                 | int64   |             |

- **Supervised keys** (See [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): `None`
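
Note that TFDS's `BBoxFeature` encodes each box as normalized `[ymin, xmin, ymax, xmax]` in `[0, 1]`, not the `[x, y, width, height]` pixel format of the raw COCO annotations. A small sketch of recovering pixel coordinates (the helper name `bbox_to_pixels` is ours, not part of the TFDS API):

    import tensorflow as tf

    def bbox_to_pixels(bbox, image):
        """Convert a normalized [ymin, xmin, ymax, xmax] box to pixels."""
        h = tf.cast(tf.shape(image)[0], tf.float32)
        w = tf.cast(tf.shape(image)[1], tf.float32)
        ymin, xmin, ymax, xmax = tf.unstack(bbox)
        return ymin * h, xmin * w, ymax * h, xmax * w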

- **Citation**:

    @article{DBLP:journals/corr/LinMBHPRDZ14,
      author    = {Tsung{-}Yi Lin and
                   Michael Maire and
                   Serge J. Belongie and
                   Lubomir D. Bourdev and
                   Ross B. Girshick and
                   James Hays and
                   Pietro Perona and
                   Deva Ramanan and
                   Piotr Doll{\'{a}}r and
                   C. Lawrence Zitnick},
      title     = {Microsoft {COCO:} Common Objects in Context},
      journal   = {CoRR},
      volume    = {abs/1405.0312},
      year      = {2014},
      url       = {http://arxiv.org/abs/1405.0312},
      archivePrefix = {arXiv},
      eprint    = {1405.0312},
      timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
      biburl    = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    @inproceedings{DBLP:conf/cvpr/KarpathyL15,
      author    = {Andrej Karpathy and
                   Fei{-}Fei Li},
      title     = {Deep visual-semantic alignments for generating image descriptions},
      booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition,
                   {CVPR} 2015, Boston, MA, USA, June 7-12, 2015},
      pages     = {3128--3137},
      publisher = {{IEEE} Computer Society},
      year      = {2015},
      url       = {https://doi.org/10.1109/CVPR.2015.7298932},
      doi       = {10.1109/CVPR.2015.7298932},
      timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
      biburl    = {https://dblp.org/rec/conf/cvpr/KarpathyL15.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
## coco_captions/2014 (default config)