simplification
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:turk/simplification')
- Description:
TURKCorpus is a dataset for evaluating sentence simplification systems that focus on lexical paraphrasing,
as described in "Optimizing Statistical Machine Translation for Text Simplification". The corpus is composed of 2000 validation and 359 test original sentences that were each simplified 8 times by different annotators.
- License: GNU General Public License v3.0
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 359 |
'validation' | 2000 |
- Features:
{
"original": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"simplifications": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
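The feature spec above describes each example as one `original` string plus a variable-length list of `simplifications` strings. The following is a minimal sketch of how such a record could be flattened into (original, simplification) pairs for evaluation. The helper name `to_pairs` and the sample sentences are hypothetical and are not taken from the corpus. The sketch works on plain Python strings; how values arrive from `tfds.load` (for example as byte strings or tensors) is not covered here.

```python
from typing import Dict, List, Tuple, Union

# A record shaped like the feature spec: one string and a list of strings.
Record = Dict[str, Union[str, List[str]]]


def to_pairs(record: Record) -> List[Tuple[str, str]]:
    """Pair the original sentence with each of its reference simplifications."""
    original = record["original"]
    return [(original, simple) for simple in record["simplifications"]]


# Hypothetical example record (illustrative text, not real corpus data).
example: Record = {
    "original": "The committee postponed the decision.",
    "simplifications": [
        "The committee delayed the decision.",
        "The group put off the choice.",
    ],
}

pairs = to_pairs(example)
for original, simple in pairs:
    print(f"{original} -> {simple}")
```

According to the description, each original sentence has 8 simplifications, so a real record would be expected to yield 8 pairs rather than the 2 shown here.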