참고자료:
가득한
TFDS에 이 데이터세트를 로드하려면 다음 명령어를 사용하세요.
ds = tfds.load('huggingface:common_language/full')
- 설명 :
This dataset is composed of speech recordings from languages that were carefully selected from the CommonVoice database.
The total duration of audio recordings is 45.1 hours (i.e., 1 hour of material for each language).
The dataset has been extracted from CommonVoice to train language-id systems.
- 라이선스 : https://creativecommons.org/licenses/by/4.0/legalcode
- 버전 : 0.1.0
- 분할 :
나뉘다 | 예 |
---|---|
'test' | 5963 |
'train' | 22194 |
'validation' | 5888 |
- 특징 :
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"path": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"age": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"gender": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"language": {
"num_classes": 45,
"names": [
"Arabic",
"Basque",
"Breton",
"Catalan",
"Chinese_China",
"Chinese_Hongkong",
"Chinese_Taiwan",
"Chuvash",
"Czech",
"Dhivehi",
"Dutch",
"English",
"Esperanto",
"Estonian",
"French",
"Frisian",
"Georgian",
"German",
"Greek",
"Hakha_Chin",
"Indonesian",
"Interlingua",
"Italian",
"Japanese",
"Kabyle",
"Kinyarwanda",
"Kyrgyz",
"Latvian",
"Maltese",
"Mangolian",
"Persian",
"Polish",
"Portuguese",
"Romanian",
"Romansh_Sursilvan",
"Russian",
"Sakha",
"Slovenian",
"Spanish",
"Swedish",
"Tamil",
"Tatar",
"Turkish",
"Ukranian",
"Welsh"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
}
}