- Description:
Mozilla Common Voice Dataset
Additional Documentation: Explore on Papers With Code
Homepage: https://voice.mozilla.org/en/datasets
Source code: tfds.audio.CommonVoice
Versions:
1.0.0: Initial release.
2.0.0 (default): Updated to corpus 6.1 from 2020-12-11.
Feature structure:
FeaturesDict({
'accent': Text(shape=(), dtype=string),
'age': Text(shape=(), dtype=string),
'client_id': Text(shape=(), dtype=string),
'downvotes': Scalar(shape=(), dtype=int32, description=Number of people who said audio does not match text),
'gender': ClassLabel(shape=(), dtype=int64, num_classes=3),
'segment': Text(shape=(), dtype=string),
'sentence': Text(shape=(), dtype=string),
'upvotes': Scalar(shape=(), dtype=int32, description=Number of people who said audio matches the text),
'voice': Audio(shape=(None,), dtype=int64),
})
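
As a minimal sketch of how the configs and versions on this page map onto a load call: TFDS accepts names of the form `dataset/config:version`. The helper names `dataset_name` and `load_common_voice` are hypothetical, and the TFDS import is deferred so that the helpers can be defined and checked without installing TensorFlow or starting a download.

```python
from typing import Optional


def dataset_name(language: str = "en", version: Optional[str] = None) -> str:
    """Build a TFDS name such as 'common_voice/ab' or 'common_voice/ab:2.0.0'."""
    name = f"common_voice/{language}"
    return f"{name}:{version}" if version else name


def load_common_voice(language: str = "en", split: str = "train",
                      version: Optional[str] = None):
    """Load one language config (assumes tensorflow_datasets is installed)."""
    # Imported lazily so that defining these helpers needs no TensorFlow install.
    import tensorflow_datasets as tfds

    # The first call prepares the data, which may involve a large download.
    return tfds.load(dataset_name(language, version), split=split)
```

A call such as `load_common_voice("ab", version="2.0.0")` would then yield feature dictionaries keyed as in the structure above, for example `example["sentence"]` and `example["voice"]`. Starting with a small config such as `ab` (39.14 MiB download) is probably wise before attempting `en`.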
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| accent | Text | | string | Accent of the speaker, see https://github.com/common-voice/common-voice/blob/main/web/src/stores/demographics.ts |
| age | Text | | string | Age bucket of the speaker (e.g. teens, or fourties), see https://github.com/common-voice/common-voice/blob/main/web/src/stores/demographics.ts |
| client_id | Text | | string | Hashed UUID of a given user |
| downvotes | Scalar | | int32 | Number of people who said audio does not match text |
| gender | ClassLabel | | int64 | Gender of the speaker |
| segment | Text | | string | If sentence belongs to a custom dataset segment, it will be listed here |
| sentence | Text | | string | Supposed transcription of the audio |
| upvotes | Scalar | | int32 | Number of people who said audio matches the text |
| voice | Audio | (None,) | int64 | |
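
The `upvotes` and `downvotes` fields can serve as a rough quality signal. The sketch below keeps a clip only when reviewers who agreed with the transcription outnumber those who disagreed by some margin. The function name `passes_vote_filter` and the `min_margin` threshold are illustrative assumptions, not part of the dataset or of TFDS. It works on plain dictionaries so it can be tried without downloading anything.

```python
from typing import Any, Mapping


def passes_vote_filter(example: Mapping[str, Any], min_margin: int = 1) -> bool:
    """Return True when upvotes exceed downvotes by at least `min_margin`."""
    return example["upvotes"] - example["downvotes"] >= min_margin


# Hand-made records that reuse the field names from the feature table.
records = [
    {"sentence": "clip a", "upvotes": 2, "downvotes": 0},
    {"sentence": "clip b", "upvotes": 1, "downvotes": 1},
    {"sentence": "clip c", "upvotes": 0, "downvotes": 2},
]
kept = [r["sentence"] for r in records if passes_vote_filter(r)]
print(kept)  # ['clip a']
```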
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
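
Because no supervised keys are defined, `as_supervised=True` should not be expected to work for this dataset, and an (input, target) pair has to be assembled by hand. One possible pairing, assumed here for speech recognition, takes `voice` as the input and `sentence` as the target. The helper name `to_pair` is hypothetical.

```python
def to_pair(example):
    """Map a feature dict to an (audio, transcript) tuple."""
    return example["voice"], example["sentence"]


# Works on any mapping with these keys; with a tf.data.Dataset it could be
# applied as `ds.map(to_pair)`.
audio, text = to_pair({"voice": [0, 12, -7], "sentence": "hello"})
print(len(audio), text)  # 3 hello
```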
Citation:
common_voice/en (default config)
Config description: Language Code: en
Download size: 56.45 GiB
Dataset size: 2.79 TiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 16,164 |
| 'test' | 16,164 |
| 'train' | 564,337 |
| 'validation' | 1,224,864 |

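
The gap between download size and dataset size is large for every config. As a worked example for `en`, the arithmetic below converts both figures to GiB and takes their ratio. The cause suggested in the comment (a compressed download versus decoded `int64` samples on disk) is an assumption based on the `voice` feature's dtype, not something this page states.

```python
UNITS_IN_GIB = {"MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}


def to_gib(size: str) -> float:
    """Convert a string such as '2.79 TiB' to a number of GiB."""
    value, unit = size.split()
    return float(value) * UNITS_IN_GIB[unit]


download = to_gib("56.45 GiB")
prepared = to_gib("2.79 TiB")
# Assumed cause: the download is compressed, while prepared examples store
# decoded samples as int64.
print(f"{prepared / download:.1f}x")  # 50.6x
```

Disk space for the prepared data, not the download, is therefore the figure to plan around.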
common_voice/ab
Config description: Language Code: ab
Download size: 39.14 MiB
Dataset size: 133.24 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'test' | 9 |
| 'train' | 22 |
| 'validation' | 31 |

common_voice/ar
Config description: Language Code: ar
Download size: 1.64 GiB
Dataset size: 67.16 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 7,517 |
| 'test' | 7,622 |
| 'train' | 14,227 |
| 'validation' | 43,291 |

common_voice/as
Config description: Language Code: as
Download size: 21.20 MiB
Dataset size: 1.65 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 124 |
| 'test' | 110 |
| 'train' | 270 |
| 'validation' | 504 |

common_voice/br
Config description: Language Code: br
Download size: 443.72 MiB
Dataset size: 13.46 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,997 |
| 'test' | 2,087 |
| 'train' | 2,780 |
| 'validation' | 8,560 |

common_voice/ca
Config description: Language Code: ca
Download size: 19.32 GiB
Dataset size: 1.19 TiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 15,724 |
| 'test' | 15,724 |
| 'train' | 285,584 |
| 'validation' | 416,701 |

common_voice/cnh
Config description: Language Code: cnh
Download size: 153.86 MiB
Dataset size: 5.12 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 756 |
| 'test' | 752 |
| 'train' | 807 |
| 'validation' | 2,432 |

common_voice/cs
Config description: Language Code: cs
Download size: 1.18 GiB
Dataset size: 56.89 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 4,118 |
| 'test' | 4,144 |
| 'train' | 5,655 |
| 'validation' | 30,431 |

common_voice/cv
Config description: Language Code: cv
Download size: 418.98 MiB
Dataset size: 8.10 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 818 |
| 'test' | 788 |
| 'train' | 931 |
| 'validation' | 3,496 |

common_voice/cy
Config description: Language Code: cy
Download size: 3.20 GiB
Dataset size: 128.68 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 4,776 |
| 'test' | 4,820 |
| 'train' | 6,839 |
| 'validation' | 72,984 |

common_voice/de
Config description: Language Code: de
Download size: 21.68 GiB
Dataset size: 1.29 TiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 15,588 |
| 'test' | 15,588 |
| 'train' | 246,525 |
| 'validation' | 565,186 |

common_voice/dv
Config description: Language Code: dv
Download size: 515.45 MiB
Dataset size: 31.59 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,077 |
| 'test' | 2,202 |
| 'train' | 2,680 |
| 'validation' | 11,866 |

common_voice/el
Config description: Language Code: el
Download size: 363.89 MiB
Dataset size: 14.62 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,401 |
| 'test' | 1,522 |
| 'train' | 2,316 |
| 'validation' | 5,996 |

common_voice/eo
Config description: Language Code: eo
Download size: 2.69 GiB
Dataset size: 167.14 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 8,987 |
| 'test' | 8,969 |
| 'train' | 19,587 |
| 'validation' | 58,094 |

common_voice/es
Config description: Language Code: es
Download size: 15.08 GiB
Dataset size: 684.66 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 15,089 |
| 'test' | 15,089 |
| 'train' | 161,813 |
| 'validation' | 236,314 |

common_voice/et
Config description: Language Code: et
Download size: 731.63 MiB
Dataset size: 37.95 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,507 |
| 'test' | 2,509 |
| 'train' | 2,966 |
| 'validation' | 10,683 |

common_voice/eu
Config description: Language Code: eu
Download size: 3.41 GiB
Dataset size: 127.60 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 5,172 |
| 'test' | 5,172 |
| 'train' | 7,505 |
| 'validation' | 63,009 |

common_voice/fa
Config description: Language Code: fa
Download size: 8.27 GiB
Dataset size: 328.61 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 5,213 |
| 'test' | 5,213 |
| 'train' | 7,593 |
| 'validation' | 251,659 |

common_voice/fi
Config description: Language Code: fi
Download size: 47.57 MiB
Dataset size: 3.41 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 415 |
| 'test' | 428 |
| 'train' | 460 |
| 'validation' | 1,305 |

common_voice/fr
Config description: Language Code: fr
Download size: 17.82 GiB
Dataset size: 1.17 TiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 15,763 |
| 'test' | 15,763 |
| 'train' | 298,982 |
| 'validation' | 461,004 |

common_voice/fy-NL
Config description: Language Code: fy-NL
Download size: 1.15 GiB
Dataset size: 29.93 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,790 |
| 'test' | 3,020 |
| 'train' | 3,927 |
| 'validation' | 10,495 |

common_voice/ga-IE
Config description: Language Code: ga-IE
Download size: 149.30 MiB
Dataset size: 5.11 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 497 |
| 'test' | 506 |
| 'train' | 541 |
| 'validation' | 3,352 |

common_voice/hi
Config description: Language Code: hi
Download size: 20.43 MiB
Dataset size: 1.15 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 135 |
| 'test' | 127 |
| 'train' | 157 |
| 'validation' | 419 |

common_voice/hsb
Config description: Language Code: hsb
Download size: 75.69 MiB
Dataset size: 5.67 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 172 |
| 'test' | 387 |
| 'train' | 808 |
| 'validation' | 1,367 |

common_voice/hu
Config description: Language Code: hu
Download size: 231.51 MiB
Dataset size: 17.07 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,434 |
| 'test' | 1,649 |
| 'train' | 3,348 |
| 'validation' | 6,457 |

common_voice/ia
Config description: Language Code: ia
Download size: 216.01 MiB
Dataset size: 14.99 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,601 |
| 'test' | 899 |
| 'train' | 3,477 |
| 'validation' | 5,978 |

common_voice/id
Config description: Language Code: id
Download size: 453.87 MiB
Dataset size: 17.20 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,835 |
| 'test' | 1,844 |
| 'train' | 2,130 |
| 'validation' | 8,696 |

common_voice/it
Config description: Language Code: it
Download size: 5.20 GiB
Dataset size: 316.38 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 12,928 |
| 'test' | 12,928 |
| 'train' | 58,015 |
| 'validation' | 102,579 |

common_voice/ja
Config description: Language Code: ja
Download size: 145.80 MiB
Dataset size: 6.83 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 586 |
| 'test' | 632 |
| 'train' | 722 |
| 'validation' | 3,072 |

common_voice/ka
Config description: Language Code: ka
Download size: 99.45 MiB
Dataset size: 7.51 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 527 |
| 'test' | 656 |
| 'train' | 1,058 |
| 'validation' | 2,275 |

common_voice/kab
Config description: Language Code: kab
Download size: 15.99 GiB
Dataset size: 718.51 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 14,622 |
| 'test' | 14,622 |
| 'train' | 120,530 |
| 'validation' | 573,718 |

common_voice/ky
Config description: Language Code: ky
Download size: 552.60 MiB
Dataset size: 18.70 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,511 |
| 'test' | 1,503 |
| 'train' | 1,955 |
| 'validation' | 9,236 |

common_voice/lg
Config description: Language Code: lg
Download size: 198.55 MiB
Dataset size: 6.65 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 384 |
| 'test' | 584 |
| 'train' | 1,250 |
| 'validation' | 2,220 |

common_voice/lt
Config description: Language Code: lt
Download size: 129.03 MiB
Dataset size: 4.79 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 244 |
| 'test' | 466 |
| 'train' | 931 |
| 'validation' | 1,644 |

common_voice/lv
Config description: Language Code: lv
Download size: 198.66 MiB
Dataset size: 13.07 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,002 |
| 'test' | 1,882 |
| 'train' | 2,552 |
| 'validation' | 6,444 |

common_voice/mn
Config description: Language Code: mn
Download size: 463.84 MiB
Dataset size: 22.09 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,837 |
| 'test' | 1,862 |
| 'train' | 2,183 |
| 'validation' | 7,487 |

common_voice/mt
Config description: Language Code: mt
Download size: 405.42 MiB
Dataset size: 15.09 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,516 |
| 'test' | 1,617 |
| 'train' | 2,036 |
| 'validation' | 5,747 |

common_voice/nl
Config description: Language Code: nl
Download size: 1.62 GiB
Dataset size: 90.20 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 4,938 |
| 'test' | 5,708 |
| 'train' | 9,460 |
| 'validation' | 52,488 |

common_voice/or
Config description: Language Code: or
Download size: 189.85 MiB
Dataset size: 1.97 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 129 |
| 'test' | 98 |
| 'train' | 388 |
| 'validation' | 615 |

common_voice/pa-IN
Config description: Language Code: pa-IN
Download size: 66.52 MiB
Dataset size: 1.03 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 44 |
| 'test' | 116 |
| 'train' | 211 |
| 'validation' | 371 |

common_voice/pl
Config description: Language Code: pl
Download size: 3.29 GiB
Dataset size: 141.06 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 5,153 |
| 'test' | 5,153 |
| 'train' | 7,468 |
| 'validation' | 90,791 |

common_voice/pt
Config description: Language Code: pt
Download size: 1.59 GiB
Dataset size: 75.64 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 4,592 |
| 'test' | 4,641 |
| 'train' | 6,514 |
| 'validation' | 41,584 |

common_voice/rm-sursilv
Config description: Language Code: rm-sursilv
Download size: 263.17 MiB
Dataset size: 12.31 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,205 |
| 'test' | 1,194 |
| 'train' | 1,384 |
| 'validation' | 3,783 |

common_voice/rm-vallader
Config description: Language Code: rm-vallader
Download size: 103.11 MiB
Dataset size: 4.89 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 357 |
| 'test' | 378 |
| 'train' | 574 |
| 'validation' | 1,316 |

common_voice/ro
Config description: Language Code: ro
Download size: 249.84 MiB
Dataset size: 14.54 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 858 |
| 'test' | 1,778 |
| 'train' | 3,399 |
| 'validation' | 6,039 |

common_voice/ru
Config description: Language Code: ru
Download size: 3.40 GiB
Dataset size: 175.04 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 7,963 |
| 'test' | 8,007 |
| 'train' | 15,481 |
| 'validation' | 74,256 |

common_voice/rw
Config description: Language Code: rw
Download size: 39.62 GiB
Dataset size: 2.18 TiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 15,032 |
| 'test' | 15,724 |
| 'train' | 515,197 |
| 'validation' | 832,929 |

common_voice/sah
Config description: Language Code: sah
Download size: 172.85 MiB
Dataset size: 9.42 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 405 |
| 'test' | 757 |
| 'train' | 1,442 |
| 'validation' | 2,606 |

common_voice/sl
Config description: Language Code: sl
Download size: 212.43 MiB
Dataset size: 9.67 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 556 |
| 'test' | 881 |
| 'train' | 2,038 |
| 'validation' | 4,669 |

common_voice/sv-SE
Config description: Language Code: sv-SE
Download size: 401.91 MiB
Dataset size: 18.27 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,019 |
| 'test' | 2,027 |
| 'train' | 2,331 |
| 'validation' | 12,552 |

common_voice/ta
Config description: Language Code: ta
Download size: 648.28 MiB
Dataset size: 24.06 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,779 |
| 'test' | 1,781 |
| 'train' | 2,009 |
| 'validation' | 12,652 |

common_voice/th
Config description: Language Code: th
Download size: 325.49 MiB
Dataset size: 18.32 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,922 |
| 'test' | 2,188 |
| 'train' | 2,917 |
| 'validation' | 7,028 |

common_voice/tr
Config description: Language Code: tr
Download size: 592.09 MiB
Dataset size: 28.21 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,647 |
| 'test' | 1,647 |
| 'train' | 1,831 |
| 'validation' | 18,685 |

common_voice/tt
Config description: Language Code: tt
Download size: 741.15 MiB
Dataset size: 46.85 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,127 |
| 'test' | 4,485 |
| 'train' | 11,211 |
| 'validation' | 25,781 |

common_voice/uk
Config description: Language Code: uk
Download size: 1.13 GiB
Dataset size: 49.66 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 3,236 |
| 'test' | 3,235 |
| 'train' | 4,035 |
| 'validation' | 22,337 |

common_voice/vi
Config description: Language Code: vi
Download size: 49.52 MiB
Dataset size: 1.47 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 200 |
| 'test' | 198 |
| 'train' | 221 |
| 'validation' | 619 |

common_voice/vot
Config description: Language Code: vot
Download size: 7.43 MiB
Dataset size: 11.39 MiB
Auto-cached (documentation): Yes
Splits:
| Split | Examples |
|---|---|
| 'train' | 3 |
| 'validation' | 3 |

common_voice/zh-CN
Config description: Language Code: zh-CN
Download size: 2.03 GiB
Dataset size: 122.54 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 8,743 |
| 'test' | 8,760 |
| 'train' | 18,541 |
| 'validation' | 36,405 |

common_voice/zh-HK
Config description: Language Code: zh-HK
Download size: 2.58 GiB
Dataset size: 78.80 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 5,172 |
| 'test' | 5,172 |
| 'train' | 7,506 |
| 'validation' | 41,835 |

common_voice/zh-TW
Config description: Language Code: zh-TW
Download size: 2.03 GiB
Dataset size: 69.06 GiB
Auto-cached (documentation): No
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,895 |
| 'test' | 2,895 |
| 'train' | 3,507 |
| 'validation' | 61,232 |