- Description:
A collection of three referring expression datasets based on images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets were collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.
RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016 and has richer descriptions of objects than RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG expressions are 8.4 words long, while RefCoco expressions are 3.5 words long.
Each dataset has different split allocations, all of which are typically reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people, respectively. The partitions differ in how examples are divided: in the "google" partition, objects, not images, are split between the train and non-train sets, so the same image can appear in both the train and validation splits, but the objects being referred to will differ between the two. In contrast, the "unc" and "umd" partitions split images between the train, validation, and test sets. In RefCocoG, the "google" partition does not have a canonical test set, and its validation set is typically reported in papers as "val*".
Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):
dataset | partition | split | refs | images |
---|---|---|---|---|
refcoco | google | train | 40000 | 19213 |
refcoco | google | val | 5000 | 4559 |
refcoco | google | test | 5000 | 4527 |
refcoco | unc | train | 42404 | 16994 |
refcoco | unc | val | 3811 | 1500 |
refcoco | unc | testA | 1975 | 750 |
refcoco | unc | testB | 1810 | 750 |
refcoco+ | unc | train | 42278 | 16992 |
refcoco+ | unc | val | 3805 | 1500 |
refcoco+ | unc | testA | 1975 | 750 |
refcoco+ | unc | testB | 1798 | 750 |
refcocog | google | train | 44822 | 24698 |
refcocog | google | val | 5000 | 4650 |
refcocog | umd | train | 42226 | 21899 |
refcocog | umd | val | 2573 | 1300 |
refcocog | umd | test | 5023 | 2600 |
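Each dataset/partition pair in the table above corresponds to a TFDS config (e.g., the "unc" partition of RefCoco is the `ref_coco/refcoco_unc` config listed further down). A minimal loading sketch, assuming the dataset has already been prepared via the manual download steps below:

```python
import tensorflow_datasets as tfds

# Pick a config by dataset and partition, e.g. RefCoco with the "unc" partition.
ds, info = tfds.load('ref_coco/refcoco_unc', split='validation', with_info=True)

# Split names follow the table: 'train', 'validation', 'testA', 'testB'.
print(info.splits)
```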
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/lichengunc/refer
Source code:
tfds.datasets.ref_coco.Builder
Versions:
- `1.0.0`: Initial release.
- `1.1.0` (default): Added masks.
Download size:
Unknown size
Manual download instructions: This dataset requires you to download the source data manually into `download_config.manual_dir` (defaults to `~/tensorflow_datasets/downloads/manual/`):
1. Follow the instructions in https://github.com/lichengunc/refer and download the annotations and the images, matching the data/ directory specified in the repo.
2. Follow the instructions of the PythonAPI in https://github.com/cocodataset/cocoapi to get pycocotools and the instances_train2014 annotations file from https://cocodataset.org/#download.
3. Add both refer.py from (1) and pycocotools from (2) to your PYTHONPATH.
4. Run manual_download_process.py to generate refcoco.json, replacing `ref_data_root`, `coco_annotations_file`, and `out_file` with the values corresponding to where you have downloaded / want to save these files. Note that manual_download_process.py can be found in the TFDS repository.
5. Download the COCO training set from https://cocodataset.org/#download and place it in a folder called `coco_train2014/`. Move `refcoco.json` to the same level as `coco_train2014`.
6. Follow the standard manual download instructions.
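Once the manual files are in place, a minimal preparation sketch could look like the following; the `manual_dir` value is only the default location and should be replaced with wherever you actually placed `refcoco.json` and `coco_train2014/`:

```python
import tensorflow_datasets as tfds

# Assumption: the manual files live in the default manual directory.
manual_dir = '~/tensorflow_datasets/downloads/manual/'

builder = tfds.builder('ref_coco/refcoco_unc')
builder.download_and_prepare(
    download_config=tfds.download.DownloadConfig(manual_dir=manual_dir)
)
ds = builder.as_dataset(split='train')
```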
Auto-cached (documentation): No
Feature structure:
FeaturesDict({
'coco_annotations': Sequence({
'area': int64,
'bbox': BBoxFeature(shape=(4,), dtype=float32),
'id': int64,
'label': int64,
}),
'image': Image(shape=(None, None, 3), dtype=uint8),
'image/id': int64,
'objects': Sequence({
'area': int64,
'bbox': BBoxFeature(shape=(4,), dtype=float32),
'gt_box_index': int64,
'id': int64,
'label': int64,
'mask': Image(shape=(None, None, 3), dtype=uint8),
'refexp': Sequence({
'raw': Text(shape=(), dtype=string),
'refexp_id': int64,
}),
}),
})
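As a rough sketch of reading back the nested structure above (feature names are taken directly from the structure; exact tensor types, e.g. ragged vs. dense, may vary by TFDS version):

```python
import tensorflow_datasets as tfds

ds = tfds.load('ref_coco/refcoco_unc', split='validation')

for example in ds.take(1):
  image = example['image']        # uint8 tensor of shape (H, W, 3)
  objects = example['objects']    # dict of per-object tensors
  boxes = objects['bbox']         # float32, shape (num_objects, 4)
  # Each object carries a variable number of referring expressions,
  # so this is typically a ragged/nested tensor of strings.
  expressions = objects['refexp']['raw']
  print(image.shape, boxes.shape)
```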
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
coco_annotations | Sequence | | | |
coco_annotations/area | Tensor | | int64 | |
coco_annotations/bbox | BBoxFeature | (4,) | float32 | |
coco_annotations/id | Tensor | | int64 | |
coco_annotations/label | Tensor | | int64 | |
image | Image | (None, None, 3) | uint8 | |
image/id | Tensor | | int64 | |
objects | Sequence | | | |
objects/area | Tensor | | int64 | |
objects/bbox | BBoxFeature | (4,) | float32 | |
objects/gt_box_index | Tensor | | int64 | |
objects/id | Tensor | | int64 | |
objects/label | Tensor | | int64 | |
objects/mask | Image | (None, None, 3) | uint8 | |
objects/refexp | Sequence | | | |
objects/refexp/raw | Text | | string | |
objects/refexp/refexp_id | Tensor | | int64 | |
Supervised keys (See `as_supervised` doc): `None`
Citation:
@inproceedings{kazemzadeh2014referitgame,
title={Referitgame: Referring to objects in photographs of natural scenes},
author={Kazemzadeh, Sahar and Ordonez, Vicente and Matten, Mark and Berg, Tamara},
booktitle={Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)},
pages={787--798},
year={2014}
}
@inproceedings{yu2016modeling,
title={Modeling context in referring expressions},
author={Yu, Licheng and Poirson, Patrick and Yang, Shan and Berg, Alexander C and Berg, Tamara L},
booktitle={European Conference on Computer Vision},
pages={69--85},
year={2016},
organization={Springer}
}
@inproceedings{mao2016generation,
title={Generation and Comprehension of Unambiguous Object Descriptions},
author={Mao, Junhua and Huang, Jonathan and Toshev, Alexander and Camburu, Oana and Yuille, Alan and Murphy, Kevin},
booktitle={CVPR},
year={2016}
}
@inproceedings{nagaraja2016modeling,
title={Modeling context between objects for referring expression understanding},
author={Nagaraja, Varun K and Morariu, Vlad I and Davis, Larry S},
booktitle={European Conference on Computer Vision},
pages={792--807},
year={2016},
organization={Springer}
}
ref_coco/refcoco_unc (default config)
Dataset size:
3.29 GiB
Splits:
Split | Examples |
---|---|
'testA' | 750 |
'testB' | 750 |
'train' | 16,994 |
'validation' | 1,500 |
- Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
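The figure and example previews referenced above can be reproduced locally with something like the following sketch, assuming `ref_coco/refcoco_unc` has already been prepared:

```python
import tensorflow_datasets as tfds

ds, info = tfds.load('ref_coco/refcoco_unc', split='train', with_info=True)

fig = tfds.show_examples(ds, info)        # matplotlib figure of a few samples
df = tfds.as_dataframe(ds.take(4), info)  # pandas DataFrame preview
```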
ref_coco/refcoco_google
Dataset size:
4.65 GiB
Splits:
Split | Examples |
---|---|
'test' | 4,527 |
'train' | 19,213 |
'validation' | 4,559 |
- Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
ref_coco/refcocoplus_unc
Dataset size:
3.29 GiB
Splits:
Split | Examples |
---|---|
'testA' | 750 |
'testB' | 750 |
'train' | 16,992 |
'validation' | 1,500 |
- Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
ref_coco/refcocog_google
Dataset size:
4.64 GiB
Splits:
Split | Examples |
---|---|
'train' | 24,698 |
'validation' | 4,650 |
- Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):
ref_coco/refcocog_umd
Dataset size:
4.08 GiB
Splits:
Split | Examples |
---|---|
'test' | 2,600 |
'train' | 21,899 |
'validation' | 1,300 |
- Figure (tfds.show_examples):
- Examples (tfds.as_dataframe):