Description:
VoxForge is a language classification dataset. It consists of audio clips that
users submitted to the VoxForge website. This release collects data from 6
languages: English, Spanish, French, German, Russian, and Italian. Because the
website is constantly updated, this release contains only recordings submitted
before 2020-01-01, for the sake of reproducibility. The samples are split
between train, validation, and test sets so that all samples from a given
speaker belong to exactly one split.
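The speaker-disjoint property described above can be illustrated with a small sketch. This is not the dataset builder's actual splitting logic, which the description does not spell out. It is one common way to get the same guarantee, assuming a hash-based assignment and hypothetical 10%/10% validation and test shares. The function names are illustrative only.

```python
import hashlib


def split_for_speaker(speaker_id, validation_pct=10, test_pct=10):
    """Map a speaker ID to 'train', 'validation', or 'test' deterministically.

    The percentages are hypothetical defaults, not values from the dataset.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + validation_pct:
        return "validation"
    return "train"


def assign_splits(samples):
    """Group (speaker_id, clip) pairs by split.

    Because the split depends only on the speaker ID, every clip from a
    given speaker lands in the same split.
    """
    splits = {"train": [], "validation": [], "test": []}
    for speaker_id, clip in samples:
        splits[split_for_speaker(speaker_id)].append((speaker_id, clip))
    return splits
```

Hashing the speaker ID rather than the clip keeps the assignment stable across runs and independent of the order in which clips are processed.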
Manual download instructions: This dataset requires you to
download the source data manually into download_config.manual_dir
(defaults to ~/tensorflow_datasets/downloads/manual/):
VoxForge requires manual download of the audio archives. The complete list of
archives is available at
https://storage.googleapis.com/tfds-data/downloads/voxforge/voxforge_urls.txt
The listed archives can be downloaded using the following command:
wget -i voxforge_urls.txt -x
Note that downloading and building the dataset locally requires ~100GB disk
space (but only ~60GB will be used permanently).
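Since the download is large, it can help to check free disk space beforehand and to confirm afterwards that every listed archive arrived. The sketch below assumes that `wget -x` stores each file under a `host/path` directory tree relative to the directory where it was run, and that the URLs carry no query strings, escaped characters, or explicit ports. The helper names and the `root` argument (the directory where the command was run) are hypothetical.

```python
import os
import shutil
from urllib.parse import urlparse


def has_free_space(path, needed_gb=100):
    """Return True if the filesystem holding `path` has at least `needed_gb` GB free."""
    return shutil.disk_usage(path).free >= needed_gb * 10**9


def read_url_list(list_file):
    """Read one URL per line from the list file, skipping blank lines."""
    with open(list_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def wget_x_path(url, root):
    """Path where `wget -x`, run inside `root`, is assumed to save `url`."""
    parsed = urlparse(url)
    parts = parsed.path.lstrip("/").split("/")
    return os.path.join(root, parsed.netloc, *parts)


def missing_archives(urls, root):
    """Return the URLs whose expected local file does not exist under `root`."""
    return [u for u in urls if not os.path.isfile(wget_x_path(u, root))]
```

For example, `missing_archives(read_url_list("voxforge_urls.txt"), ".")` would list the archives still to be fetched, under the layout assumption stated above.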
Citation:
@article{maclean2018voxforge,
  title={Voxforge},
  author={MacLean, Ken},
  journal={Ken MacLean. [Online]. Available: http://www.voxforge.org/home. [Acedido em 2012]},
  year={2018}
}
Last updated 2022-12-06 UTC.
Warning: Manual download required. See the instructions above.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/voxforge)
Homepage: http://www.voxforge.org/
Source code: tfds.audio.Voxforge (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/voxforge.py)
Versions:
1.0.0 (default): No release notes.
Download size: Unknown size
Dataset size: Unknown size
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Unknown
Splits: no split sizes listed.
Feature structure:
FeaturesDict({
    'audio': Audio(shape=(None,), dtype=int64),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=6),
    'speaker_id': string,
})
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): ('audio', 'label')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
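Given the feature structure and supervised keys listed above, loading the dataset after the manual download might look like the sketch below. It assumes `tensorflow_datasets` is installed, that the archives are already in place under the manual directory, and that a split named `train` exists (the page lists no split names). The helper names are hypothetical, while `tfds.load`, `download_and_prepare_kwargs`, and `tfds.download.DownloadConfig(manual_dir=...)` are the library's own.

```python
import os

# Default manual directory quoted in the download instructions.
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/downloads/manual/"


def resolve_manual_dir(path=None):
    """Expand '~' and return an absolute manual_dir path."""
    return os.path.abspath(os.path.expanduser(path or DEFAULT_MANUAL_DIR))


def load_voxforge(split="train", manual_dir=None):
    """Load one split as (audio, label) pairs.

    The split name is an assumption; adjust it to the splits the builder
    actually exposes.
    """
    # Imported here so resolve_manual_dir works without TFDS installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(
        manual_dir=resolve_manual_dir(manual_dir)
    )
    return tfds.load(
        "voxforge",
        split=split,
        as_supervised=True,  # yields ('audio', 'label') tuples
        download_and_prepare_kwargs={"download_config": config},
    )
```

With `as_supervised=True`, each element is an `(audio, label)` pair, so `speaker_id` is not returned. Omit that argument to get the full feature dictionary.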