cnn_dailymail
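CNN/DailyMail non-anonymized summarization dataset.

There are two main features:
- article: the text of the news article, used as the document to be summarized
- highlights: the joined text of the article's highlights, which serves as the target summary

Each example also carries `id` and `publisher` string features.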
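The `highlights` feature is the reference summary. According to the dataset's version history (version 2.0.0), target sentences are separated by newlines. A minimal sketch for recovering the individual summary sentences, assuming the feature has already been decoded to a Python `str`, might look like this; the helper name `split_highlights` is hypothetical and not part of TFDS.

```python
def split_highlights(highlights: str) -> list[str]:
    """Split a decoded `highlights` string into individual summary sentences.

    Assumes sentences are separated by newline characters (dataset
    versions >= 2.0.0). Surrounding whitespace and empty lines are dropped.
    """
    return [line.strip() for line in highlights.split("\n") if line.strip()]


# Usage with a made-up string (not taken from the dataset).
sample = "First highlight sentence.\nSecond highlight sentence."
print(split_highlights(sample))
# ['First highlight sentence.', 'Second highlight sentence.']
```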
Splits:

| Split | Examples |
|---|---|
| 'test' | 11,490 |
| 'train' | 287,113 |
| 'validation' | 13,368 |
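The split sizes in the table above add up to 311,971 examples, of which roughly 92.0% are in `train`, 4.3% in `validation`, and 3.7% in `test`. A short Python check of that arithmetic, using only the counts from the table (the helper name `split_fractions` is illustrative):

```python
# Example counts per split, copied from the cnn_dailymail split table.
SPLIT_SIZES = {"train": 287_113, "validation": 13_368, "test": 11_490}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
print(f"total: {total:,}")  # total: 311,971
for name, share in split_fractions(SPLIT_SIZES).items():
    # Right-aligned split name, example count, and percentage of the total.
    print(f"{name:>10}: {SPLIT_SIZES[name]:>7,} ({share:.2%})")
```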
Feature structure:

FeaturesDict({
'article': Text(shape=(), dtype=string),
'highlights': Text(shape=(), dtype=string),
'id': Text(shape=(), dtype=string),
'publisher': Text(shape=(), dtype=string),
})
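Each example is a dictionary holding the four string features shown above. When a split is iterated through `tfds.as_numpy`, `Text` features arrive as UTF-8 encoded Python `bytes`, so a small typed record can make downstream code clearer. This is a sketch under stated assumptions: `CnnDailymailExample` and `load_examples` are hypothetical names, not TFDS APIs, and `tensorflow_datasets` must be installed only for `load_examples`.

```python
from dataclasses import dataclass
from typing import Iterator, Mapping


@dataclass(frozen=True)
class CnnDailymailExample:
    """Hypothetical typed view of one example; all four features are Text."""

    id: str
    article: str
    highlights: str
    publisher: str

    @classmethod
    def from_numpy(cls, features: Mapping[str, bytes]) -> "CnnDailymailExample":
        # Text features come back from tfds.as_numpy as UTF-8 encoded bytes.
        return cls(
            id=features["id"].decode("utf-8"),
            article=features["article"].decode("utf-8"),
            highlights=features["highlights"].decode("utf-8"),
            publisher=features["publisher"].decode("utf-8"),
        )


def load_examples(split: str = "validation", limit: int = 5) -> Iterator[CnnDailymailExample]:
    """Yield up to `limit` decoded examples (the first call downloads the data)."""
    import tensorflow_datasets as tfds  # imported lazily; only needed for loading

    ds = tfds.load("cnn_dailymail", split=split)
    for features in tfds.as_numpy(ds.take(limit)):
        yield CnnDailymailExample.from_numpy(features)
```

Keeping the TFDS import inside `load_examples` lets the record class be reused and tested without TensorFlow installed.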
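Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| article | Text | | string | |
| highlights | Text | | string | |
| id | Text | | string | |
| publisher | Text | | string | |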
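The dataset's supervised keys are `('article', 'highlights')`, so `tfds.load(..., as_supervised=True)` yields `(article, highlights)` tuples instead of feature dictionaries, leaving out `id` and `publisher`. The sketch below mirrors that selection on an already-decoded feature dictionary and adds a lazy loader; `to_supervised_pair` and `load_supervised_pairs` are hypothetical names, and `tensorflow_datasets` is assumed to be installed only for the loader.

```python
from typing import Iterator, Mapping

# Supervised keys for cnn_dailymail: (input feature, target feature).
SUPERVISED_KEYS = ("article", "highlights")


def to_supervised_pair(features: Mapping[str, str]) -> tuple[str, str]:
    """Select (article, highlights) from a decoded feature dict, dropping id/publisher."""
    source_key, target_key = SUPERVISED_KEYS
    return features[source_key], features[target_key]


def load_supervised_pairs(split: str = "train", limit: int = 3) -> Iterator[tuple[bytes, bytes]]:
    """Yield up to `limit` raw (article, highlights) pairs as bytes via TFDS."""
    import tensorflow_datasets as tfds  # only needed for actual loading

    ds = tfds.load("cnn_dailymail", split=split, as_supervised=True)
    for article, highlights in tfds.as_numpy(ds.take(limit)):
        yield article, highlights
```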
Citation:

@article{DBLP:journals/corr/SeeLM17,
author = {Abigail See and
Peter J. Liu and
Christopher D. Manning},
title = {Get To The Point: Summarization with Pointer-Generator Networks},
journal = {CoRR},
volume = {abs/1704.04368},
year = {2017},
url = {http://arxiv.org/abs/1704.04368},
archivePrefix = {arXiv},
eprint = {1704.04368},
timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/SeeLM17},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{hermann2015teaching,
title={Teaching machines to read and comprehend},
author={Hermann, Karl Moritz and Kocisky, Tomas and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil},
booktitle={Advances in neural information processing systems},
pages={1693--1701},
year={2015}
}
Additional information:

- Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/cnn-daily-mail-1)
- Homepage: https://github.com/abisee/cnn-dailymail
- Source code: `tfds.summarization.CnnDailymail` (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/cnn_dailymail.py)
- Versions:
  - 1.0.0: New split API (https://tensorflow.org/datasets/splits).
  - 2.0.0: Separate target sentences with newline. (Having the model predict newline separators makes it easier to evaluate using summary-level ROUGE.)
  - 3.0.0: Use the cased version.
  - 3.1.0: Removed BuilderConfig.
  - 3.2.0: Remove extra space before added sentence period. This shouldn't affect ROUGE scores because punctuation is removed.
  - 3.3.0: Add publisher feature.
  - 3.4.0 (default): Add ID feature.
- Download size: 558.32 MiB
- Dataset size: 1.29 GiB
- Auto-cached (https://www.tensorflow.org/datasets/performances#auto-caching): No
- Supervised keys (see the `as_supervised` documentation, https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): `('article', 'highlights')`
- Figure (`tfds.show_examples`): Not supported.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-01-04 UTC.