
covid19sum

  • Description:

CORD-19 is a resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

This dataset aims to help organize the scientific literature on COVID-19 through abstractive summarization. It parses those articles into document-summary pairs, either full_text-abstract or introduction-abstract.

Features include strings of: abstract, full_text, sha (hash of the PDF), source_x (source of publication), title, doi (digital object identifier), license, authors, publish_time, journal, url.

  • Homepage: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

  • Source code: tfds.summarization.Covid19sum

  • Versions:

    • 1.0.0 (default): No release notes.
  • Download size: Unknown size

  • Dataset size: Unknown size

  • Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
    This dataset needs to be downloaded manually through the Kaggle API:
    kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
    Place the downloaded zip file in the manual folder.
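
The manual download step can be wired into loading roughly as follows. This is a minimal sketch, not the dataset builder's own procedure: the helper names find_manual_zips and load_covid19sum are hypothetical, and the check only assumes that the Kaggle download leaves some .zip archive in the manual folder. The calls to tfds.load and tfds.download.DownloadConfig(manual_dir=...) are the standard TFDS entry points. No split is requested, so tfds.load returns a dictionary of whatever splits the builder defines.

```python
from pathlib import Path

# Default manual directory named in the instructions above.
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual").expanduser()


def find_manual_zips(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the zip archives found in the manual download folder (hypothetical helper)."""
    manual_dir = Path(manual_dir).expanduser()
    if not manual_dir.is_dir():
        return []
    return sorted(manual_dir.glob("*.zip"))


def load_covid19sum(manual_dir=DEFAULT_MANUAL_DIR):
    """Prepare (if needed) and load covid19sum from the manually downloaded data."""
    # Imported lazily so find_manual_zips works without TensorFlow installed.
    import tensorflow_datasets as tfds

    manual_dir = Path(manual_dir).expanduser()
    if not find_manual_zips(manual_dir):
        # Assumption: the Kaggle download is kept as a .zip file in this folder.
        raise FileNotFoundError(
            f"No zip archive found in {manual_dir}; run the kaggle command first."
        )

    config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    # Without a split argument, tfds.load returns a dict of all available splits.
    return tfds.load(
        "covid19sum",
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling load_covid19sum() with no arguments uses the default manual folder; pass another path if download_config.manual_dir points elsewhere.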

  • Auto-cached (documentation): Unknown

  • Splits:

Split Examples
  • Feature structure:
FeaturesDict({
    'abstract': tf.string,
    'authors': tf.string,
    'body_text': Sequence({
        'section': tf.string,
        'text': tf.string,
    }),
    'doi': tf.string,
    'journal': tf.string,
    'license': tf.string,
    'publish_time': tf.string,
    'sha': tf.string,
    'source_x': tf.string,
    'title': tf.string,
    'url': tf.string,
})
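
Given this structure, a document-summary pair as mentioned in the description could be assembled from one example along the following lines. This is a sketch under assumptions: to_summarization_pair is a hypothetical helper, the example is assumed to be in the NumPy form produced by tfds.as_numpy (byte strings, with body_text as a dict of parallel sequences), and selecting the introduction by looking for "introduction" in the section name is a guess about how sections are labeled, not documented behavior.

```python
def _to_str(value):
    """Decode byte strings (as returned for tf.string features) to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def to_summarization_pair(example, introduction_only=False):
    """Return (document, summary) built from one covid19sum example.

    document: body_text paragraphs joined by newlines
    summary:  the abstract
    """
    sections = [_to_str(s) for s in example["body_text"]["section"]]
    texts = [_to_str(t) for t in example["body_text"]["text"]]

    if introduction_only:
        # Assumption: introduction paragraphs carry "introduction" in the section name.
        texts = [t for s, t in zip(sections, texts) if "introduction" in s.lower()]

    document = "\n".join(texts)
    summary = _to_str(example["abstract"])
    return document, summary
```

With introduction_only=False the pair corresponds to full_text-abstract; with introduction_only=True it approximates introduction-abstract.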
  • Feature documentation:
Feature             Class          Shape   Dtype       Description
                    FeaturesDict
abstract            Tensor                 tf.string
authors             Tensor                 tf.string
body_text           Sequence
body_text/section   Tensor                 tf.string
body_text/text      Tensor                 tf.string
doi                 Tensor                 tf.string
journal             Tensor                 tf.string
license             Tensor                 tf.string
publish_time        Tensor                 tf.string
sha                 Tensor                 tf.string
source_x            Tensor                 tf.string
title               Tensor                 tf.string
url                 Tensor                 tf.string
  • Citation:
@ONLINE {CORD-19-research-challenge,
    author = "An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House",
    title  = "COVID-19 Open Research Dataset Challenge (CORD-19)",
    month  = "april",
    year   = "2020",
    url    = "https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge"
}