# vctk

- **Description**:
The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with
various accents. Each speaker reads out about 400 sentences selected from a
newspaper, the Rainbow Passage, and an elicitation paragraph used for the
Speech Accent Archive.
Note that the 'p315' text was lost due to a hard disk error.

- **Additional Documentation**:
  [Explore on Papers With Code](https://paperswithcode.com/dataset/vctk)

- **Homepage**:
  <https://doi.org/10.7488/ds/2645>

- **Source code**:
  [`tfds.audio.Vctk`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/vctk.py)

- **Versions**:

  - `1.0.0`: VCTK release 0.92.0.
  - **`1.0.1`** (default): Fix speech data type with dtype=tf.int16.

- **Download size**: `10.94 GiB`

- **Auto-cached**
  ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
  No

- **Feature structure**:
    FeaturesDict({
        'accent': ClassLabel(shape=(), dtype=int64, num_classes=13),
        'gender': ClassLabel(shape=(), dtype=int64, num_classes=2),
        'id': string,
        'speaker': ClassLabel(shape=(), dtype=int64, num_classes=110),
        'speech': Audio(shape=(None,), dtype=int16),
        'text': Text(shape=(), dtype=string),
    })
- **Feature documentation**:

| Feature | Class        | Shape   | Dtype  | Description |
|---------|--------------|---------|--------|-------------|
|         | FeaturesDict |         |        |             |
| accent  | ClassLabel   |         | int64  |             |
| gender  | ClassLabel   |         | int64  |             |
| id      | Tensor       |         | string |             |
| speaker | ClassLabel   |         | int64  |             |
| speech  | Audio        | (None,) | int16  |             |
| text    | Text         |         | string |             |

- **Supervised keys** (See
  [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
  `('text', 'speech')`

- **Figure**
  ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):
  Not supported.

- **Citation**:
    @misc{yamagishi2019vctk,
      author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
      title={ {CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
      publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},
      year=2019,
      doi={10.7488/ds/2645},
    }
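A minimal loading sketch, assuming `tensorflow_datasets` is installed and that the roughly 10.94 GiB download is acceptable; with `as_supervised=True`, examples arrive as `(text, speech)` tuples per the supervised keys above:

    import tensorflow_datasets as tfds

    # Load the default config (vctk/mic1); the first call triggers the
    # ~10.94 GiB download. as_supervised=True yields (text, speech)
    # tuples, matching the supervised keys ('text', 'speech').
    ds = tfds.load('vctk', split='train', as_supervised=True)

    for text, speech in ds.take(1):
        print(text.numpy().decode('utf-8'))  # transcript of the utterance
        print(speech.shape, speech.dtype)    # variable-length int16 waveform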
vctk/mic1 (default config)
--------------------------

- **Config description**: Audio recorded using an omni-directional microphone
  (DPA 4035). Contains very low frequency noises.

  This is the same audio released in previous versions of VCTK:
  https://doi.org/10.7488/ds/1994

- **Dataset size**: `39.87 GiB`

- **Splits**:
| Split     | Examples |
|-----------|----------|
| `'train'` | 44,455   |
vctk/mic2
---------

- **Config description**: Audio recorded using a small diaphragm condenser
  microphone with very wide bandwidth (Sennheiser MKH 800).

  Two speakers, p280 and p315, had technical issues with the audio
  recorded using the MKH 800.

- **Dataset size**: `38.86 GiB`

- **Splits**:
| Split     | Examples |
|-----------|----------|
| `'train'` | 43,873   |
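A small sketch for inspecting a config's metadata, assuming the standard `tfds.builder` API; features are defined in code, while split sizes are only populated once the dataset has been prepared locally:

    import tensorflow_datasets as tfds

    # Builder metadata for a specific config; no audio is loaded here.
    builder = tfds.builder('vctk/mic2')
    print(builder.info.features['speaker'].num_classes)  # 110 speakers
    print(builder.info.features['speech'].dtype)         # int16 waveform
    print(builder.info.splits['train'].num_examples)     # 43,873 once prepared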