menyo20k_mt

References:

menyo20k_mt

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:menyo20k_mt/menyo20k_mt')
  • Description:
MENYO-20k is a multi-domain parallel dataset with texts obtained from news articles, TED talks, movie transcripts, radio transcripts, science and technology texts, and other short articles curated from the web and professional translators. The dataset has 20,100 parallel sentences split into 10,070 training sentences, 3,397 development sentences, and 6,633 test sentences (3,419 multi-domain, 1,714 news domain, and 1,500 TED talks speech transcript domain). The development and test sets are available upon request.
  • License: For non-commercial use only, because some of the data sources, such as TED talks and JW news, require permission for commercial use.
  • Version: 1.0.0
  • Splits:
Split      Examples
'train'    10070
  • Features:
{
    "translation": {
        "languages": [
            "en",
            "yo"
        ],
        "id": null,
        "_type": "Translation"
    }
}
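
The feature schema above suggests that each record carries a "translation" entry with one sentence per language code ("en" and "yo"). Assuming records arrive as plain Python dictionaries of that shape, a minimal sketch for pulling out sentence pairs could look like the following. The helper name iter_pairs and the placeholder record are hypothetical and not part of the dataset or of TFDS; records obtained through tfds.load may hold tensors or bytes that need decoding before this applies.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Assumed record shape: {"translation": {"en": <str>, "yo": <str>}}
Record = Mapping[str, Mapping[str, str]]


def iter_pairs(
    records: Iterable[Record], src: str = "en", tgt: str = "yo"
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from translation records."""
    for record in records:
        translation = record["translation"]
        yield translation[src], translation[tgt]


# Hypothetical placeholder record, not real dataset content.
sample = [{"translation": {"en": "<English sentence>", "yo": "<Yoruba sentence>"}}]
pairs = list(iter_pairs(sample))
```

Swapping the src and tgt arguments yields pairs in the Yoruba-to-English direction instead.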