• Description:

The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads about 400 sentences, selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.

Note that the 'p315' text was lost due to a hard disk error.

  • Feature structure:

FeaturesDict({
    'accent': ClassLabel(shape=(), dtype=int64, num_classes=13),
    'gender': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'id': string,
    'speaker': ClassLabel(shape=(), dtype=int64, num_classes=110),
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
})

  • Feature documentation:
Feature   Class        Shape     Dtype    Description
accent    ClassLabel             int64
gender    ClassLabel             int64
id        Tensor                 string
speaker   ClassLabel             int64
speech    Audio        (None,)   int16
text      Text                   string
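
A minimal sketch of reading these features with TensorFlow Datasets, assuming the `tensorflow_datasets` package is installed and the corpus (roughly 40 GiB per config) can be downloaded and prepared. `tfds.load` and `tfds.as_numpy` are existing TFDS functions; the helper names `pcm16_to_float`, `describe_example`, and `iter_vctk` are hypothetical and introduced here only for illustration.

```python
def pcm16_to_float(samples):
    """Scale int16 PCM samples (the 'speech' dtype) to floats in [-1.0, 1.0)."""
    # Pure Python for self-containment; with NumPy arrays a vectorized
    # division (samples / 32768.0) would be the usual choice.
    return [s / 32768.0 for s in samples]


def describe_example(example):
    """Summarize one decoded example, a dict keyed by the feature names above."""
    text = example["text"]
    if isinstance(text, bytes):  # tfds.as_numpy yields strings as bytes
        text = text.decode("utf-8")
    return {
        "speaker": int(example["speaker"]),    # ClassLabel index, 0..109
        "num_samples": len(example["speech"]),  # length of the 1-D audio array
        "text": text,
    }


def iter_vctk(config="vctk/mic1", limit=3):
    """Yield summaries of the first few 'train' examples.

    The import is done lazily so the helpers above work without TFDS.
    Calling this function triggers dataset download and preparation.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load(config, split="train")
    for example in tfds.as_numpy(ds.take(limit)):
        yield describe_example(example)
```

Calling `list(iter_vctk())` would return up to three summary dicts; the `accent` and `gender` fields can be read from each example in the same way as `speaker`.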
  • Citation:

  author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
  title={{CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
  publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},

vctk/mic1 (default config)

  • Config description: Audio recorded using an omni-directional microphone (DPA 4035). Contains very low-frequency noise. This is the same audio released in previous versions of VCTK.

  • Dataset size: 39.87 GiB

  • Splits:

Split Examples
'train' 44,455

vctk/mic2


  • Config description: Audio recorded using a small-diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800). Two speakers, p280 and p315, had technical issues with the audio recordings made using the MKH 800.
  • Dataset size: 38.86 GiB

  • Splits:

Split Examples
'train' 43,873
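
The split counts of the two configs can be compared with a little arithmetic. The sketch below uses only numbers stated on this page; the dictionary keys are labels chosen here after the two microphones, and dividing by all 110 speakers is a simplifying assumption.

```python
NUM_SPEAKERS = 110  # from the 'speaker' ClassLabel (num_classes=110)

# 'train' example counts for the two configs, keyed by microphone.
TRAIN_EXAMPLES = {"dpa_4035": 44_455, "mkh_800": 43_873}

# Difference in example counts between the two configs.
count_difference = TRAIN_EXAMPLES["dpa_4035"] - TRAIN_EXAMPLES["mkh_800"]

# Average number of utterances per speaker in the DPA 4035 config.
avg_per_speaker = TRAIN_EXAMPLES["dpa_4035"] / NUM_SPEAKERS

print(count_difference)           # 582
print(round(avg_per_speaker, 1))  # 404.1
```

The average of about 404 utterances per speaker matches the "about 400 sentences" stated in the description. The 582 fewer examples in the MKH 800 config may relate to the recording issues noted for p280 and p315, though this page does not break the difference down by speaker.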