• Description:

A large-scale dataset for speaker identification. The data is collected from over 1,251 speakers, with over 150k samples in total. This release contains the audio part of the voxceleb1.1 dataset.

  • Splits:
    Split          Examples
    'test'         7,972
    'train'        134,000
    'validation'   6,670
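
The split sizes above can be summed and turned into proportions as a quick sanity check. This is a minimal sketch that uses only the counts listed in the table; the names `SPLIT_SIZES` and `split_fractions` are illustrative and do not come from any library.

```python
# Example counts copied from the split table above.
SPLIT_SIZES = {"test": 7_972, "train": 134_000, "validation": 6_670}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    total = sum(SPLIT_SIZES.values())
    print(f"total examples: {total:,}")
    for name, fraction in split_fractions(SPLIT_SIZES).items():
        print(f"{name:<12}{fraction:.1%}")
```

The three listed splits sum to 148,642 examples, with the 'train' split accounting for roughly 90% of them.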
  • Feature structure:
    'audio': Audio(shape=(None,), dtype=int64),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1252),
    'youtube_id': Text(shape=(), dtype=string),
  • Feature documentation:
    Feature      Class        Shape     Dtype    Description
    audio        Audio        (None,)   int64
    label        ClassLabel             int64
    youtube_id   Text                   string
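
To illustrate how an example with these three features might be consumed, here is a hedged sketch. It assumes this page describes the TensorFlow Datasets entry registered under the name `voxceleb`, which the page itself does not state, and that the audio files are already available locally (the dataset may require a manual download first). `load_split` and `summarize_example` are hypothetical helper names; only `tfds.load` and `tfds.as_numpy` are library calls.

```python
NUM_CLASSES = 1252  # from ClassLabel(num_classes=1252) in the feature structure


def summarize_example(example):
    """Reduce one example dict to (num_audio_samples, label, youtube_id)."""
    audio = example["audio"]        # 1-D sequence of integer samples
    label = int(example["label"])   # class index of the speaker
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label {label} is outside [0, {NUM_CLASSES})")
    youtube_id = example["youtube_id"]
    if isinstance(youtube_id, bytes):
        # String features typically arrive as bytes when converted to NumPy.
        youtube_id = youtube_id.decode("utf-8")
    return len(audio), label, youtube_id


def load_split(split="train", data_dir=None):
    """Hypothetical loader; the dataset name 'voxceleb' is an assumption."""
    import tensorflow_datasets as tfds  # imported lazily, only needed here

    dataset = tfds.load("voxceleb", split=split, data_dir=data_dir)
    return tfds.as_numpy(dataset)


if __name__ == "__main__":
    # A hand-made example with the same structure, so no download is needed.
    fake = {"audio": [0, 12, -7, 3], "label": 5, "youtube_id": b"abc123"}
    print(summarize_example(fake))
```

With the data in place, iterating over `load_split("test")` and passing each element to `summarize_example` should yield one tuple per utterance.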
  • Citation:
    author       = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
    title        = "VoxCeleb: a large-scale speaker identification dataset",
    booktitle    = "INTERSPEECH",
    year         = "2017",