References:
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:eli5_category')
- Description:
The ELI5-Category dataset is a smaller, newer, categorized version of the original ELI5 dataset. After 2017, a tagging system was introduced to the ELI5 subreddit so that questions can be categorized into topics according to their tags. Because the training and validation sets are built from questions in different topics, the dataset is expected to alleviate the train/validation overlap issue in the original ELI5 dataset.
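To illustrate why splitting by topic avoids overlap, here is a minimal sketch that partitions a handful of made-up questions so that no category appears in both training and validation. The sample records, the category names, and the helper `split_by_category` are hypothetical. This is not the procedure the dataset authors used, only an illustration of the idea.

```python
def split_by_category(questions, held_out_categories):
    """Send questions from held-out categories to validation, the rest to train."""
    splits = {"train": [], "validation": []}
    for question in questions:
        if question["category"] in held_out_categories:
            splits["validation"].append(question)
        else:
            splits["train"].append(question)
    return splits


# Made-up records: the field names follow the feature schema, the values do not
# come from the dataset.
questions = [
    {"q_id": "q1", "category": "Biology"},
    {"q_id": "q2", "category": "Physics"},
    {"q_id": "q3", "category": "Economics"},
    {"q_id": "q4", "category": "Biology"},
]

splits = split_by_category(questions, held_out_categories={"Economics"})
train_categories = {q["category"] for q in splits["train"]}
validation_categories = {q["category"] for q in splits["validation"]}

# The two sets share no category, so no validation topic is seen in training.
print(train_categories & validation_categories)  # set()
```

Splitting on the category rather than on individual questions is what keeps near-duplicate questions about the same topic from landing on both sides.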
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 5411 |
'train' | 91772 |
'validation1' | 5446 |
'validation2' | 2375 |
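As a quick sanity check on the split sizes listed above, the following sketch sums the counts and prints each split's share of the total. The dictionary `split_sizes` is just the counts copied by hand. Nothing here is read from TFDS.

```python
# Example counts copied from the splits listed above.
split_sizes = {
    "train": 91772,
    "validation1": 5446,
    "validation2": 2375,
    "test": 5411,
}

total = sum(split_sizes.values())

# Print each split with its size and its fraction of all examples.
for name, size in split_sizes.items():
    print(f"{name:<12} {size:>6}  {size / total:.1%}")
print(f"{'total':<12} {total:>6}")
```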
- Features:
{
    "q_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "selftext": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "category": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "subreddit": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "answers": {
        "a_id": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "text": {
            "feature": {
                "dtype": "string",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "score": {
            "feature": {
                "dtype": "int32",
                "id": null,
                "_type": "Value"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        },
        "text_urls": {
            "feature": {
                "feature": {
                    "dtype": "string",
                    "id": null,
                    "_type": "Value"
                },
                "length": -1,
                "id": null,
                "_type": "Sequence"
            },
            "length": -1,
            "id": null,
            "_type": "Sequence"
        }
    },
    "title_urls": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "selftext_urls": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
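The schema above stores answers as a set of sequences (`a_id`, `text`, `score`, `text_urls`) rather than as a list of answer objects. Here is a minimal sketch of picking the highest-scored answer from one record, assuming the sequences are index-aligned (entry `i` of each list describes the same answer) and that the record has already been converted to plain Python values. The record contents and the helper `best_answer` are made up for illustration. Only the field names come from the schema.

```python
def best_answer(record):
    """Return (a_id, text, score) for the highest-scored answer, or None."""
    answers = record["answers"]
    if not answers["score"]:
        return None
    # Assumption: the answer fields are parallel lists, so zip pairs them up.
    rows = zip(answers["a_id"], answers["text"], answers["score"])
    return max(rows, key=lambda row: row[2])


# Hypothetical record shaped like the feature schema. Values are invented.
record = {
    "q_id": "q1",
    "title": "Why is the sky blue?",
    "selftext": "",
    "category": "Physics",
    "subreddit": "explainlikeimfive",
    "answers": {
        "a_id": ["a1", "a2"],
        "text": ["Short answer.", "Longer answer about light scattering."],
        "score": [3, 12],
        "text_urls": [[], []],  # one list of URLs per answer
    },
    "title_urls": [],
    "selftext_urls": [],
}

a_id, text, score = best_answer(record)
print(a_id, score)  # a2 12
```

Note that `text_urls` is a sequence of sequences, so it holds one (possibly empty) list per answer, while `title_urls` and `selftext_urls` are flat lists attached to the question itself.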