voxceleb

Description:

A large-scale dataset for speaker identification. The data is collected from 1,251 speakers, with over 150k samples in total. This release contains the audio part of the VoxCeleb1.1 dataset.

Additional Documentation: Explore on Papers With Code
Homepage: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
Source code: tfds.audio.Voxceleb
Versions:
- 1.2.1 (default): Add youtube_id field
Download size: 4.68 MiB
Dataset size: 107.98 GiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
manual_dir should contain the file vox_dev_wav.zip. Instructions for downloading this file are at http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html (registration required).
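
Before building the dataset, it can help to confirm that the archive is where TFDS will look for it. The sketch below uses only the standard library and checks for vox_dev_wav.zip in the default manual directory or in a directory you pass in. The helper `check_manual_dir` is hypothetical and not part of TFDS.

```python
from pathlib import Path

# Default location TFDS uses when download_config.manual_dir is not set.
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"
# Archive name the voxceleb builder expects, per the instructions above.
REQUIRED_ARCHIVE = "vox_dev_wav.zip"


def check_manual_dir(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the path to vox_dev_wav.zip, or raise if it is missing.

    Hypothetical helper: it only checks that the file exists, not that
    the archive is complete or valid.
    """
    archive = Path(manual_dir).expanduser() / REQUIRED_ARCHIVE
    if not archive.is_file():
        raise FileNotFoundError(
            f"Expected {archive}; download it from the VoxCeleb1 page "
            "(registration required) and place it in manual_dir."
        )
    return archive
```

If the check passes, the same directory can be handed to TFDS via `tfds.download.DownloadConfig(manual_dir=...)`, passed to `tfds.load` through `download_and_prepare_kwargs={'download_config': ...}`.
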
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	7,972
`'train'`	134,000
`'validation'`	6,670
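
Summing the split table gives the exact number of examples in this release and the share held by each split. A quick check using only the counts listed above:

```python
# Example counts copied from the split table above.
SPLITS = {"train": 134_000, "validation": 6_670, "test": 7_972}

total = sum(SPLITS.values())
# Share of the total held by each split, as a fraction in [0, 1].
fractions = {name: count / total for name, count in SPLITS.items()}

if __name__ == "__main__":
    print(f"total examples: {total:,}")
    for name, frac in fractions.items():
        print(f"{name:>10}: {frac:.1%}")
```

The three splits add up to 148,642 examples; train holds about 90.1% of them, validation about 4.5% and test about 5.4%.
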

Feature structure:

FeaturesDict({
    'audio': Audio(shape=(None,), dtype=int64),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=1252),
    'youtube_id': Text(shape=(), dtype=string),
})
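
Once an example has been converted to NumPy or plain Python values, the feature structure above can be handled as an ordinary dictionary. The sketch below assumes a 16 kHz sample rate and 16-bit PCM values stored in the int64 `audio` tensor; neither assumption is stated in this listing. The helper `describe_example` is hypothetical, not a TFDS function. It scales the integer samples to floats in [-1.0, 1.0) and reports the clip duration.

```python
# Assumed sample rate of the VoxCeleb1 WAV audio (16 kHz).
SAMPLE_RATE_HZ = 16_000
# Assumed scale: 16-bit PCM sample values stored in the int64 'audio' tensor.
INT16_SCALE = 32_768


def describe_example(example):
    """Summarize one example shaped like the FeaturesDict above.

    `example` is assumed to be a plain dict with 'audio' (a sequence of
    integer samples), 'label' (an integer class id) and 'youtube_id'.
    """
    samples = example["audio"]
    # Map integer PCM samples into the float range [-1.0, 1.0).
    waveform = [s / INT16_SCALE for s in samples]
    duration_s = len(samples) / SAMPLE_RATE_HZ
    peak = max((abs(x) for x in waveform), default=0.0)

    youtube_id = example["youtube_id"]
    # Text features may arrive as bytes after NumPy conversion; decode if so.
    if isinstance(youtube_id, bytes):
        youtube_id = youtube_id.decode("utf-8")

    return {
        "label": int(example["label"]),
        "youtube_id": youtube_id,
        "duration_s": duration_s,
        "peak": peak,
    }
```
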

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
audio	Audio	(None,)	int64
label	ClassLabel		int64
youtube_id	Text		string

Supervised keys (See as_supervised doc): ('audio', 'label')
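
With `as_supervised=True`, TFDS yields `(audio, label)` tuples instead of the full feature dictionary. Because `audio` has shape `(None,)`, clips differ in length and must be padded or cropped before they can be stacked into a fixed-shape batch. The pure-Python sketch below mimics both steps on plain lists. `to_supervised` and `pad_batch` are hypothetical helpers, and zero right-padding is one illustrative choice rather than something TFDS does for you.

```python
# Supervised keys as listed above: (input feature, target feature).
SUPERVISED_KEYS = ("audio", "label")


def to_supervised(example):
    """Return the (audio, label) pair for one feature dictionary."""
    return tuple(example[k] for k in SUPERVISED_KEYS)


def pad_batch(pairs, pad_value=0):
    """Right-pad variable-length audio so every clip in a batch has one length.

    Returns (padded_audio, lengths, labels); `lengths` keeps each clip's
    true length so padding can be masked out later.
    """
    max_len = max((len(audio) for audio, _ in pairs), default=0)
    padded, lengths, labels = [], [], []
    for audio, label in pairs:
        audio = list(audio)
        lengths.append(len(audio))
        padded.append(audio + [pad_value] * (max_len - len(audio)))
        labels.append(label)
    return padded, lengths, labels
```

In a `tf.data` pipeline the same effect is usually obtained with `tf.data.Dataset.padded_batch`.
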
Figure (tfds.show_examples): Not supported.
Citation:

@InProceedings{Nagrani17,
    author       = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
    title        = "VoxCeleb: a large-scale speaker identification dataset",
    booktitle    = "INTERSPEECH",
    year         = "2017",
}