References:
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:newsroom')
- Description:
NEWSROOM is a large dataset for training and evaluating summarization systems.
It contains 1.3 million articles and summaries written by authors and
editors in the newsrooms of 38 major publications.
Dataset features include:
- text: Input news text.
- summary: Summary for the news.
And additional features:
- title: News title.
- url: URL of the news article.
- date: Date of the article.
- density: Extractive density.
- coverage: Extractive coverage.
- compression: Compression ratio.
- density_bin: low, medium, high.
- coverage_bin: extractive, abstractive.
- compression_bin: low, medium, high.
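As a rough illustration of the compression feature, the sketch below assumes that compression is the article-to-summary length ratio, counted in whitespace-separated tokens. The function name `compression_ratio` and the example strings are hypothetical, and the tokenizer used to build the dataset may differ, so the result need not match the stored `compression` value.

```python
def compression_ratio(text: str, summary: str) -> float:
    """Article length divided by summary length, in whitespace tokens.

    Assumption: whitespace tokenization; the dataset's own tokenizer may differ.
    """
    article_tokens = text.split()
    summary_tokens = summary.split()
    if not summary_tokens:
        raise ValueError("summary must contain at least one token")
    return len(article_tokens) / len(summary_tokens)


# Made-up example pair, not taken from the dataset.
example_text = (
    "The city council voted on Tuesday to approve the new budget "
    "after a long debate"
)
example_summary = "Council approves new budget"
ratio = compression_ratio(example_text, example_summary)  # 15 tokens / 4 tokens
```

A higher ratio means the summary is much shorter than the article, which is what the low, medium, and high values of `compression_bin` group together.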
This dataset can be downloaded upon request. Unzip all the contents
("train.jsonl", "dev.jsonl", "test.jsonl") to the tfds folder.
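Once the files are unzipped, they can be inspected without TFDS. The following is a minimal sketch, assuming each file is plain JSON Lines (one JSON object per line). The helper name `parse_jsonl` and the sample records are hypothetical, and the raw files may use different key names than the feature names listed above.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Made-up records standing in for lines of train.jsonl.
sample_lines = [
    '{"summary": "Council approves budget", "coverage_bin": "abstractive"}',
    "",
    '{"summary": "Storm closes schools", "coverage_bin": "extractive"}',
]
records = list(parse_jsonl(sample_lines))

# Keep only the records whose summaries are labeled abstractive.
abstractive = [r for r in records if r["coverage_bin"] == "abstractive"]
```

Because `parse_jsonl` accepts any iterable of strings, an open file handle can be passed in place of `sample_lines`, which avoids loading a whole split into memory.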
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 108862 |
'train' | 995041 |
'validation' | 108837 |
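To see how the examples are distributed, the split sizes from the table can be summed and turned into fractions. This is only a small arithmetic sketch; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"test": 108862, "train": 995041, "validation": 108837}

# Total number of examples across the three splits.
total = sum(split_sizes.values())

# Share of the total held by each split.
fractions = {name: count / total for name, count in split_sizes.items()}

for name, share in fractions.items():
    print(f"{name}: {split_sizes[name]} examples ({share:.1%})")
```

The three splits sum to 1,212,740 examples, with the train split holding the large majority.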
- Features:
{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "summary": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "url": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "date": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "density_bin": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "coverage_bin": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "compression_bin": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "density": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    },
    "coverage": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    },
    "compression": {
        "dtype": "float32",
        "id": null,
        "_type": "Value"
    }
}
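The schema above can be used to sanity-check a decoded record before further processing. The sketch below is one possible check in plain Python, not part of TFDS or the Hugging Face library. The names `STRING_FIELDS`, `FLOAT_FIELDS`, and `validate_record` are hypothetical, and the sample record is made up.

```python
from typing import Dict, List

# Field names and dtypes taken from the feature schema above.
STRING_FIELDS = (
    "text", "summary", "title", "url", "date",
    "density_bin", "coverage_bin", "compression_bin",
)
FLOAT_FIELDS = ("density", "coverage", "compression")


def validate_record(record: Dict) -> List[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected string")
    for name in FLOAT_FIELDS:
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected float")
    return problems


# Made-up record for illustration only.
sample = {
    "text": "Full article text.", "summary": "Short summary.",
    "title": "A title", "url": "https://example.com/story",
    "date": "20180101", "density_bin": "low",
    "coverage_bin": "abstractive", "compression_bin": "medium",
    "density": 1.5, "coverage": 0.8, "compression": 12.0,
}
issues = validate_record(sample)
```

Returning a list of problems instead of raising on the first one makes it easier to log every malformed field of a record in a single pass.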