spoken_digit
A free audio dataset of spoken digits. Think MNIST for audio.
A simple audio/speech dataset consisting of recordings of spoken digits in WAV
files sampled at 8 kHz. The recordings are trimmed so that they have near-minimal
silence at the beginnings and ends.
5 speakers
2,500 recordings (50 of each digit per speaker)
English pronunciations
Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav
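The naming scheme can be unpacked with a few lines of Python. This is a minimal sketch that assumes the three fields are separated by underscores, as in the upstream repository; `RecordingName`, `parse_recording_name`, and the sample filename are illustrative names, not part of the dataset or of TFDS.

```python
from typing import NamedTuple


class RecordingName(NamedTuple):
    """Fields encoded in a recording's filename (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


def parse_recording_name(filename: str) -> RecordingName:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into its three fields."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "wav":
        raise ValueError(f"not a .wav filename: {filename!r}")
    # Split from both ends so a speaker name containing '_' would still parse.
    digit, _, rest = stem.partition("_")
    speaker, _, index = rest.rpartition("_")
    if not (digit.isdigit() and speaker and index.isdigit()):
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return RecordingName(int(digit), speaker, int(index))


if __name__ == "__main__":
    # Illustrative filename, assumed to follow the format described above.
    print(parse_recording_name("7_jackson_32.wav"))
```

Parsing the speaker out of the filename is one way to build speaker-independent evaluation sets, since the features themselves carry only the audio, the filename, and the digit label.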
| Split     | Examples |
|-----------|----------|
| 'train'   | 2,500    |
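Because the dataset ships with a single 'train' split, a held-out set has to be carved out by the user. One possible approach, sketched below, uses the TFDS percent-slicing syntax; `holdout_splits` and `load_partitions` are hypothetical helper names, and the 10% default is an arbitrary choice rather than a convention stated on this page.

```python
def holdout_splits(holdout_percent: int = 10) -> tuple[str, str]:
    """Return TFDS slicing strings that partition 'train' into two parts."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be between 1 and 99")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_partitions(holdout_percent: int = 10):
    """Load the two partitions as (audio, label) datasets.

    tensorflow_datasets is imported lazily so that holdout_splits() can be
    used without it installed.
    """
    import tensorflow_datasets as tfds

    train_spec, heldout_spec = holdout_splits(holdout_percent)
    return tfds.load(
        "spoken_digit",
        split=[train_spec, heldout_spec],
        as_supervised=True,
    )


if __name__ == "__main__":
    train_ds, heldout_ds = load_partitions()
    print(train_ds, heldout_ds)
```

The slices follow the stored example order, so they are not guaranteed to be balanced across digits or speakers; a split keyed on the filename fields may be preferable when that matters.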
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int64),
'audio/filename': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
})
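The feature structure above can be read as follows: each example is a dictionary with a variable-length integer waveform, the source filename, and a digit label from 0 to 9. The sketch below iterates over examples and derives two common quantities. It assumes the int64 samples hold 16-bit PCM values (not stated on this page), and `pcm16_to_float`, `duration_seconds`, and `iter_examples` are hypothetical helper names.

```python
SAMPLE_RATE_HZ = 8000  # from the dataset description: recordings are at 8 kHz


def pcm16_to_float(samples):
    """Scale integer samples to floats in [-1.0, 1.0).

    Assumption: the int64 'audio' feature stores 16-bit PCM values.
    """
    return [s / 32768.0 for s in samples]


def duration_seconds(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Length of a recording in seconds, given its raw samples."""
    return len(samples) / sample_rate_hz


def iter_examples(split="train"):
    """Yield (audio, filename, label) tuples from the TFDS dataset.

    tensorflow_datasets is imported lazily so the helpers above work
    without it installed.
    """
    import tensorflow_datasets as tfds

    ds = tfds.load("spoken_digit", split=split)
    for ex in tfds.as_numpy(ds):
        yield ex["audio"], ex["audio/filename"].decode("utf-8"), int(ex["label"])


if __name__ == "__main__":
    for audio, filename, label in iter_examples():
        print(filename, label, f"{duration_seconds(audio):.2f}s")
        break
```

Since the audio shape is `(None,)`, recordings differ in length, so batching typically needs padding or truncation to a fixed number of samples.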
| Feature        | Class        | Shape   | Dtype  | Description |
|----------------|--------------|---------|--------|-------------|
|                | FeaturesDict |         |        |             |
| audio          | Audio        | (None,) | int64  |             |
| audio/filename | Text         |         | string |             |
| label          | ClassLabel   |         | int64  |             |
@ONLINE {Free Spoken Digit Dataset,
author = "Zohar Jackson",
title = "Spoken_Digit",
year = "2016",
url = "https://github.com/Jakobovski/free-spoken-digit-dataset"
}
Last updated 2023-01-13 UTC.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/fsdd)
Homepage: https://github.com/Jakobovski/free-spoken-digit-dataset
Source code: tfds.datasets.spoken_digit.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/spoken_digit/spoken_digit_dataset_builder.py)
Versions: 1.0.9 (default): No release notes.
Download size: 11.42 MiB
Dataset size: 45.68 MiB
Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): Yes
Supervised keys (see the as_supervised doc, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): ('audio', 'label')
Figure (tfds.show_examples): Not supported.