covid19sum
Warning: Manual download required. See instructions below.
CORD-19 is a resource of over 45,000 scholarly articles, including over 33,000
with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

To help organize information in the scientific literature on COVID-19 through
abstractive summarization, this dataset parses those articles into
document-summary pairs: either full_text-abstract or introduction-abstract.

Features include strings for: abstract, full_text (named body_text in the
feature structure), sha (hash of the PDF), source_x (source of publication),
title, doi (digital object identifier), license, authors, publish_time,
journal, url.
Feature structure:

    FeaturesDict({
        'abstract': string,
        'authors': string,
        'body_text': Sequence({
            'section': string,
            'text': string,
        }),
        'doi': string,
        'journal': string,
        'license': string,
        'publish_time': string,
        'sha': string,
        'source_x': string,
        'title': string,
        'url': string,
    })
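As a rough illustration of the two pairings mentioned in the description (full_text-abstract and introduction-abstract), the sketch below builds both from one decoded example. It assumes the example is a plain Python dict in which `body_text` holds two parallel lists (`section` and `text`), which is how TFDS typically represents a Sequence of a dict. The helper names and the rule for spotting an introduction (a section title containing "intro") are hypothetical and are not taken from the dataset's source code.

```python
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]


def full_text(example: Example) -> str:
    """Join every body_text paragraph into one document string."""
    return "\n".join(example["body_text"]["text"])


def introduction(example: Example) -> str:
    """Join only the paragraphs whose section title looks like an introduction.

    Assumption: matching on the substring "intro" is an illustrative heuristic.
    """
    body = example["body_text"]
    parts = [
        text
        for section, text in zip(body["section"], body["text"])
        if "intro" in section.lower()
    ]
    return "\n".join(parts)


def summarization_pairs(example: Example) -> List[Tuple[str, str]]:
    """Return (document, summary) pairs, skipping pairs with an empty document."""
    summary = example["abstract"]
    candidates = [full_text(example), introduction(example)]
    return [(document, summary) for document in candidates if document]
```

Keeping the two document builders separate makes it easy to swap in a different rule for locating the introduction without touching the pairing logic.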
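Feature documentation:

| Feature           | Class        | Shape | Dtype  | Description |
|-------------------|--------------|-------|--------|-------------|
|                   | FeaturesDict |       |        |             |
| abstract          | Tensor       |       | string |             |
| authors           | Tensor       |       | string |             |
| body_text         | Sequence     |       |        |             |
| body_text/section | Tensor       |       | string |             |
| body_text/text    | Tensor       |       | string |             |
| doi               | Tensor       |       | string |             |
| journal           | Tensor       |       | string |             |
| license           | Tensor       |       | string |             |
| publish_time      | Tensor       |       | string |             |
| sha               | Tensor       |       | string |             |
| source_x          | Tensor       |       | string |             |
| title             | Tensor       |       | string |             |
| url               | Tensor       |       | string |             |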
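The slash-separated names in the feature listing (for example `body_text/section`) denote nested fields. A minimal sketch of how such a path might be resolved against a decoded example, and how an example could be checked for missing fields, is shown below. It assumes the example is a nested Python mapping; `get_feature` and `missing_features` are hypothetical helpers, not part of TFDS.

```python
from typing import Any, List, Mapping

# Leaf feature paths, copied from the feature listing.
FEATURE_PATHS = [
    "abstract", "authors", "body_text/section", "body_text/text", "doi",
    "journal", "license", "publish_time", "sha", "source_x", "title", "url",
]


def get_feature(example: Mapping[str, Any], path: str) -> Any:
    """Follow a slash-separated feature path such as 'body_text/section'."""
    value: Any = example
    for key in path.split("/"):
        value = value[key]
    return value


def missing_features(example: Mapping[str, Any]) -> List[str]:
    """List the feature paths that cannot be resolved in the example."""
    missing = []
    for path in FEATURE_PATHS:
        try:
            get_feature(example, path)
        except (KeyError, TypeError):
            missing.append(path)
    return missing
```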
Citation:

    @ONLINE {CORD-19-research-challenge,
        author = "An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House",
        title = "COVID-19 Open Research Dataset Challenge (CORD-19)",
        month = "april",
        year = "2020",
        url = "https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge"
    }
Last updated 2022-12-06 UTC.
Additional Documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/cord-19)

Homepage: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Source code: `tfds.summarization.Covid19sum` (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/covid19sum.py)

Versions: `1.0.0` (default): No release notes.

Download size: Unknown size

Dataset size: Unknown size

Manual download instructions: This dataset requires you to download the source data manually into `download_config.manual_dir` (defaults to `~/tensorflow_datasets/downloads/manual/`). The data needs to be downloaded through the Kaggle API:
`kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge`
Place the downloaded zip file in the manual folder.

Auto-cached (documentation: https://www.tensorflow.org/datasets/performances#auto-caching): Unknown

Splits: no splits are listed.

Supervised keys (see the `as_supervised` doc: https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args): `('body_text', 'abstract')`

Figure (tfds.show_examples: https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples): Not supported.

Examples (tfds.as_dataframe: https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe): Missing.
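The manual download step and the subsequent load can be sketched as follows. This is a sketch under stated assumptions rather than the dataset's official recipe: `stage_archive` and `load_covid19sum` are hypothetical helper names, the zip is assumed to have already been fetched with the Kaggle command quoted in the instructions, and the `"train"` split name is an assumption because the page lists no splits. `tfds.load`, its `as_supervised` and `download_and_prepare_kwargs` arguments, and `tfds.download.DownloadConfig(manual_dir=...)` are existing TFDS APIs.

```python
import shutil
from pathlib import Path

# Default manual directory quoted in the manual download instructions.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()


def stage_archive(zip_path, manual_dir=DEFAULT_MANUAL_DIR) -> Path:
    """Copy the downloaded Kaggle zip, unchanged, into the manual directory."""
    zip_path = Path(zip_path)
    manual_dir = Path(manual_dir)
    manual_dir.mkdir(parents=True, exist_ok=True)
    target = manual_dir / zip_path.name
    shutil.copy2(zip_path, target)
    return target


def load_covid19sum(manual_dir=DEFAULT_MANUAL_DIR, split="train"):
    """Load (body_text, abstract) pairs. Requires tensorflow_datasets.

    Assumption: the split name "train" is not confirmed by this page.
    """
    # Imported lazily so that staging the archive works without TFDS installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        "covid19sum",
        split=split,
        as_supervised=True,  # yields the ('body_text', 'abstract') supervised keys
        download_and_prepare_kwargs={"download_config": config},
    )
```

Passing `manual_dir` explicitly is only needed when the zip lives somewhere other than the default location; otherwise `tfds.load("covid19sum")` should find it on its own.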