• Description:

The Google RefExp dataset is a collection of text descriptions of objects in images that builds on the publicly available MS-COCO dataset. Whereas the image captions in MS-COCO apply to the entire image, this dataset focuses on text descriptions that allow one to uniquely identify a single object or region within an image. For more details, see the paper "Generation and Comprehension of Unambiguous Object Descriptions".

The coco_train2014 folder contains all of the COCO 2014 training images.

Split          Examples
'train'        24,698
'validation'   4,650
  • Feature structure:
FeaturesDict({
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/id': int64,
    'objects': Sequence({
        'area': int64,
        'bbox': BBoxFeature(shape=(4,), dtype=float32),
        'id': int64,
        'label': int64,
        'label_name': ClassLabel(shape=(), dtype=int64, num_classes=80),
        'refexp': Sequence({
            'raw': Text(shape=(), dtype=string),
            'referent': Text(shape=(), dtype=string),
            'refexp_id': int64,
            'tokens': Sequence(Text(shape=(), dtype=string)),
        }),
    }),
})
  • Feature documentation:
Feature                   Class           Shape            Dtype    Description
image                     Image           (None, None, 3)  uint8
image/id                  Tensor                           int64
objects                   Sequence
objects/area              Tensor                           int64
objects/bbox              BBoxFeature     (4,)             float32
objects/id                Tensor                           int64
objects/label             Tensor                           int64
objects/label_name        ClassLabel                       int64
objects/refexp            Sequence
objects/refexp/raw        Text                             string
objects/refexp/referent   Text                             string
objects/refexp/refexp_id  Tensor                           int64
objects/refexp/tokens     Sequence(Text)  (None,)          string
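
The nested layout above (objects, each carrying a list of referring expressions) can be walked as in the following minimal sketch. It assumes one example has already been converted to plain Python containers in a dict-of-lists layout, where each field under 'objects' has one entry per object and 'refexp'/'raw' holds one list of strings per object. It also assumes 'bbox' stores normalized [ymin, xmin, ymax, xmax] values in [0, 1]. The helper names `iter_referring_expressions` and `bbox_to_pixels`, and the sample values, are hypothetical and not part of the dataset.

```python
from typing import Any, Dict, Iterator, Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float], height: int, width: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Assumption: bbox values are fractions of the image height and width.
    """
    ymin, xmin, ymax, xmax = bbox
    return (
        round(ymin * height),
        round(xmin * width),
        round(ymax * height),
        round(xmax * width),
    )


def iter_referring_expressions(
    example: Dict[str, Any],
) -> Iterator[Tuple[int, Tuple[float, ...], str]]:
    """Yield (object id, bbox, raw text) for every referring expression.

    Assumption: `example` uses a dict-of-lists layout, so index i selects
    the i-th object in every field under 'objects'.
    """
    objects = example["objects"]
    for i, bbox in enumerate(objects["bbox"]):
        for raw in objects["refexp"]["raw"][i]:
            yield objects["id"][i], tuple(bbox), raw


# Hypothetical example with two objects; values are invented for illustration.
example = {
    "image/id": 1,
    "objects": {
        "id": [11, 12],
        "bbox": [[0.1, 0.25, 0.5, 0.75], [0.0, 0.0, 1.0, 1.0]],
        "refexp": {
            "raw": [["the left dog", "dog on the left"], ["the sofa"]],
        },
    },
}

pairs = list(iter_referring_expressions(example))
for object_id, bbox, raw in pairs:
    # Pretend the image is 100 pixels tall and 200 pixels wide.
    print(object_id, bbox_to_pixels(bbox, height=100, width=200), raw)
```

Flattening to one (object, expression) pair per row is convenient for comprehension tasks, where each referring expression is treated as a separate query against the same image.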


  • Citation:
  title={Generation and Comprehension of Unambiguous Object Descriptions},
  author={Mao, Junhua and Huang, Jonathan and Toshev, Alexander and Camburu, Oana and Yuille, Alan and Murphy, Kevin},