librispeech
LibriSpeech is a corpus of approximately 1000 hours of read English speech
sampled at 16 kHz, prepared by Vassil Panayotov with the assistance of
Daniel Povey. The data is derived from audiobooks read for the LibriVox
project, and has been carefully segmented and aligned.
It's recommended to use lazy audio decoding for faster reading and smaller
dataset size:
- install the tensorflow_io library: pip install tensorflow-io
- enable lazy decoding: tfds.load('librispeech', builder_kwargs={'config': 'lazy_decode'})
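The two steps above can be sketched as a small wrapper. This is a minimal sketch, not part of TFDS: the helper names `librispeech_load_kwargs` and `load_librispeech` are hypothetical, and the only TFDS call used is the `tfds.load(..., builder_kwargs={'config': 'lazy_decode'})` form quoted above. The import is deferred so the module can be imported even where TFDS is not installed.

```python
from typing import Any, Dict


def librispeech_load_kwargs(split: str = "dev_clean",
                            lazy_decode: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for tfds.load (hypothetical helper, not a TFDS API)."""
    kwargs: Dict[str, Any] = {"name": "librispeech", "split": split}
    if lazy_decode:
        # Select the 'lazy_decode' config, as recommended in the description.
        kwargs["builder_kwargs"] = {"config": "lazy_decode"}
    return kwargs


def load_librispeech(split: str = "dev_clean", lazy_decode: bool = True):
    """Load one split; the first call downloads and prepares the dataset."""
    # Deferred import: needs `pip install tensorflow-datasets tensorflow-io`.
    import tensorflow_datasets as tfds

    return tfds.load(**librispeech_load_kwargs(split, lazy_decode))
```

Calling `load_librispeech("dev_clean")` would return a `tf.data.Dataset` for that split. Passing `lazy_decode=False` falls back to the default config.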
| Split | Examples |
|---|---|
| 'dev_clean' | 2,703 |
| 'dev_other' | 2,864 |
| 'test_clean' | 2,620 |
| 'test_other' | 2,939 |
| 'train_clean100' | 28,539 |
| 'train_clean360' | 104,014 |
| 'train_other500' | 148,688 |
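As an illustration of how the split names might be combined, the sketch below joins all training splits with `+`, which is the TFDS split-string syntax for concatenating splits, and sums the example counts from the table. The dictionary `SPLIT_SIZES` and both helper functions are hypothetical names introduced here; the counts are copied from the table above.

```python
# Example counts per split, copied from the splits table.
SPLIT_SIZES = {
    "dev_clean": 2_703,
    "dev_other": 2_864,
    "test_clean": 2_620,
    "test_other": 2_939,
    "train_clean100": 28_539,
    "train_clean360": 104_014,
    "train_other500": 148_688,
}


def combined_split(prefix: str) -> str:
    """Join every split whose name starts with `prefix` using '+'."""
    return "+".join(name for name in SPLIT_SIZES if name.startswith(prefix))


def total_examples(prefix: str) -> int:
    """Sum the example counts of every split whose name starts with `prefix`."""
    return sum(n for name, n in SPLIT_SIZES.items() if name.startswith(prefix))


print(combined_split("train"))  # train_clean100+train_clean360+train_other500
print(total_examples("train"))  # 281241
```

The resulting string can be passed as the `split` argument of `tfds.load` to read all three training subsets as one dataset.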
FeaturesDict({
'chapter_id': int64,
'id': string,
'speaker_id': int64,
'speech': Audio(shape=(None,), dtype=int16),
'text': Text(shape=(), dtype=string),
})
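To make the `speech` feature concrete, here is a minimal sketch of two common conversions on a decoded example: duration from the sample count at the 16 kHz rate stated in the description, and scaling int16 samples to floats in [-1.0, 1.0). The function names are hypothetical, and the code assumes `speech` has already been decoded to a sequence of int16 values, as in the default config. It uses plain Python lists for portability; in practice NumPy or TensorFlow ops would be the usual choice.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the dataset description


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of an utterance in seconds, given its number of samples."""
    return num_samples / sample_rate


def int16_to_float(samples: Sequence[int]) -> List[float]:
    """Scale int16 PCM samples to floats in [-1.0, 1.0) by dividing by 2**15."""
    return [s / 32768.0 for s in samples]


# Toy data standing in for example['speech']; not taken from the corpus.
toy_speech = [0, 16384, -32768]
print(duration_seconds(16_000))    # 1.0
print(int16_to_float(toy_speech))  # [0.0, 0.5, -1.0]
```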
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| chapter_id | Tensor | | int64 | |
| id | Tensor | | string | |
| speaker_id | Tensor | | int64 | |
| speech | Audio | (None,) | int16 | |
| text | Text | | string | |
@inproceedings{panayotov2015librispeech,
title={Librispeech: an ASR corpus based on public domain audio books},
author={Panayotov, Vassil and Chen, Guoguo and Povey, Daniel and Khudanpur, Sanjeev},
booktitle={Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on},
pages={5206--5210},
year={2015},
organization={IEEE}
}
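librispeech/default (default config)
- Config description: Default dataset.
- Versions:
  - 2.1.1 (default): Fix speech data type with dtype=tf.int16.
  - 2.1.2: Add 'lazy_decode' config.
- Dataset size: 304.47 GiB

librispeech/lazy_decode
- Config description: Raw audio dataset.
- Versions:
  - 2.1.1: Fix speech data type with dtype=tf.int16.
  - 2.1.2 (default): Add 'lazy_decode' config.
- Dataset size: 59.37 GiB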
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License.
Last updated 2024-12-11 UTC.