coco_captions

  • Description:

COCO is a large-scale object detection, segmentation, and captioning dataset. This version contains images, bounding boxes, labels, and captions from COCO 2014, split into the subsets defined by Karpathy and Li (2015). This effectively divides the original COCO 2014 validation data into a new 5,000-image validation set and a new 5,000-image test set, plus a "restval" set containing the remaining ~30k images. All splits have caption annotations.

Split      Examples
'restval'  30,504
'test'     5,000
'train'    82,783
'val'      5,000
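The relationship between these splits and the original COCO 2014 validation data can be checked with simple arithmetic. The following is a minimal sketch that only uses the counts listed above; the names `SPLIT_SIZES`, `FROM_ORIGINAL_VAL`, `original_val_size`, and `total_size` are hypothetical helpers introduced for illustration and are not part of any library.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"restval": 30_504, "test": 5_000, "train": 82_783, "val": 5_000}

# Per the description, these three splits are carved out of the
# original COCO 2014 validation data.
FROM_ORIGINAL_VAL = ("val", "test", "restval")


def original_val_size(sizes=SPLIT_SIZES):
    """Sum the splits that partition the original validation data."""
    return sum(sizes[name] for name in FROM_ORIGINAL_VAL)


def total_size(sizes=SPLIT_SIZES):
    """Total number of examples across all four splits."""
    return sum(sizes.values())


print(original_val_size())  # 40504
print(total_size())         # 123287
```

The first figure is the size of the data that the description says was re-divided; the 'train' split is left untouched by that re-division.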
  • Feature structure:
FeaturesDict({
    'captions': Sequence({
        'id': int64,
        'text': string,
    }),
    'image': Image(shape=(None, None, 3), dtype=uint8),
    'image/filename': Text(shape=(), dtype=string),
    'image/id': int64,
    'objects': Sequence({
        'area': int64,
        'bbox': BBoxFeature(shape=(4,), dtype=float32),
        'id': int64,
        'is_crowd': bool,
        'label': ClassLabel(shape=(), dtype=int64, num_classes=80),
    }),
})
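As an illustration of how the `objects/bbox` field might be consumed, here is a minimal sketch that converts one box to pixel coordinates. It assumes the usual TensorFlow Datasets convention that a `BBoxFeature` holds normalized `(ymin, xmin, ymax, xmax)` values in [0, 1]; the helper `bbox_to_pixels` and the hand-written `example` dictionary are hypothetical stand-ins for a decoded record, not output from the dataset.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float], height: int, width: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel units."""
    ymin, xmin, ymax, xmax = bbox
    return (
        round(ymin * height),
        round(xmin * width),
        round(ymax * height),
        round(xmax * width),
    )


# Hypothetical stand-in for one decoded example (values are made up).
example = {
    "image_shape": (480, 640, 3),  # (height, width, channels)
    "objects": {"bbox": [[0.25, 0.5, 0.75, 1.0]], "label": [0]},
}

height, width, _ = example["image_shape"]
pixel_boxes = [bbox_to_pixels(b, height, width) for b in example["objects"]["bbox"]]
print(pixel_boxes)  # [(120, 320, 360, 640)]
```

With TensorFlow Datasets installed and the data downloaded, real records with this structure would typically come from a call such as `tfds.load('coco_captions', split='val')`; that step is left out here because it requires fetching the full dataset.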
  • Feature documentation:
Feature           Class         Shape            Dtype    Description
                  FeaturesDict
captions          Sequence
captions/id       Tensor                         int64
captions/text     Tensor                         string
image             Image         (None, None, 3)  uint8
image/filename    Text                           string
image/id          Tensor                         int64
objects           Sequence
objects/area      Tensor                         int64
objects/bbox      BBoxFeature   (4,)             float32
objects/id        Tensor                         int64
objects/is_crowd  Tensor                         bool
objects/label     ClassLabel                     int64


  • Citation:
@article{DBLP:journals/corr/LinMBHPRDZ14,
  author    = {Tsung{-}Yi Lin and
               Michael Maire and
               Serge J. Belongie and
               Lubomir D. Bourdev and
               Ross B. Girshick and
               James Hays and
               Pietro Perona and
               Deva Ramanan and
               Piotr Doll{\'{a}}r and
               C. Lawrence Zitnick},
  title     = {Microsoft {COCO:} Common Objects in Context},
  journal   = {CoRR},
  volume    = {abs/1405.0312},
  year      = {2014},
  url       = {http://arxiv.org/abs/1405.0312},
  archivePrefix = {arXiv},
  eprint    = {1405.0312},
  timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@inproceedings{DBLP:conf/cvpr/KarpathyL15,
  author    = {Andrej Karpathy and
               Fei{-}Fei Li},
  title     = {Deep visual-semantic alignments for generating image
               descriptions},
  booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition,
               {CVPR} 2015, Boston, MA, USA, June 7-12, 2015},
  pages     = {3128--3137},
  publisher = {{IEEE} Computer Society},
  year      = {2015},
  url       = {https://doi.org/10.1109/CVPR.2015.7298932},
  doi       = {10.1109/CVPR.2015.7298932},
  timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
  biburl    = {https://dblp.org/rec/conf/cvpr/KarpathyL15.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

coco_captions/2014 (default config)