References:
generated_reviews_enth
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:generated_reviews_enth/generated_reviews_enth')
- Description:
`generated_reviews_enth`
Generated product reviews dataset for machine translation quality prediction, part of [scb-mt-en-th-2020](https://arxiv.org/pdf/2007.03541.pdf)
`generated_reviews_enth` was created as part of [scb-mt-en-th-2020](https://arxiv.org/pdf/2007.03541.pdf) for the machine translation task.
This dataset (referred to as `generated_reviews_yn` in [scb-mt-en-th-2020](https://arxiv.org/pdf/2007.03541.pdf)) consists of English product reviews
generated by [CTRL](https://arxiv.org/abs/1909.05858), translated into Thai by the Google Translate API, and annotated by human annotators as accepted or rejected (`correct`)
based on the fluency and adequacy of the translation.
This allows it to be used for English-to-Thai translation quality estimation (binary label), machine translation, and sentiment analysis.
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 17453 |
'train' | 141369 |
'validation' | 15708 |
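
As a quick sanity check on the split sizes listed above, the share of each split can be worked out directly from the counts. This is only an illustrative sketch; the variable names are made up for the example and are not part of the dataset or of TFDS.

```python
# Example counts copied from the splits table above.
split_sizes = {"test": 17453, "train": 141369, "validation": 15708}

# Total number of examples across all splits.
total = sum(split_sizes.values())

# Fraction of the whole dataset that each split represents.
shares = {name: count / total for name, count in split_sizes.items()}

for name, share in shares.items():
    print(f"{name}: {split_sizes[name]} examples ({share:.1%})")
```

The counts sum to 174530 examples, which works out to roughly an 81% / 9% / 10% train / validation / test division.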
- Features:
{
"translation": {
"languages": [
"en",
"th"
],
"id": null,
"_type": "Translation"
},
"review_star": {
"dtype": "int32",
"id": null,
"_type": "Value"
},
"correct": {
"num_classes": 2,
"names": [
"neg",
"pos"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
}
}
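
The feature specification above can be read as follows: each example carries an `en`/`th` translation pair, an integer `review_star`, and an integer `correct` label that indexes into the names `["neg", "pos"]`. The sketch below shows one way to decode that label and keep only accepted translations. It rests on several assumptions that the source does not confirm: that each record is a plain dictionary with `en` and `th` keys nested under `translation`, and that `pos` corresponds to "accepted". The sample records are invented for illustration and are not taken from the dataset.

```python
# Label names copied from the "correct" ClassLabel feature above.
CORRECT_NAMES = ["neg", "pos"]


def label_name(label_id: int) -> str:
    """Map an integer class id to its ClassLabel name."""
    return CORRECT_NAMES[label_id]


def accepted_pairs(records):
    """Yield (en, th) pairs whose label decodes to "pos".

    Assumption: "pos" means the annotators accepted the translation.
    """
    for record in records:
        if label_name(record["correct"]) == "pos":
            pair = record["translation"]
            yield pair["en"], pair["th"]


# Hypothetical records, invented for this example (not real dataset rows).
sample = [
    {"translation": {"en": "Great product.", "th": "สินค้าดีมาก"},
     "review_star": 5, "correct": 1},
    {"translation": {"en": "Arrived broken.", "th": "<rejected Thai text>"},
     "review_star": 1, "correct": 0},
]

pairs = list(accepted_pairs(sample))
```

The actual layout of records returned by `tfds.load` may differ (for example, values may arrive as tensors or byte strings), so the key access would need adjusting to match what the loader yields.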