booksum

  • Description:

BookSum: A Collection of Datasets for Long-form Narrative Summarization

This implementation currently only supports book and chapter summaries.

GitHub: https://github.com/salesforce/booksum

The manual folder should contain the following directories:

- `booksum/`
- `all_chapterized_books/`
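
Before building the dataset, it can help to confirm that both directories are in place. The following is a minimal sketch using only the standard library; the helper name `missing_manual_dirs` is hypothetical, and the path in the usage note assumes the usual TFDS default manual directory, which may differ on your setup.

```python
from pathlib import Path

# Directory names taken from the list above.
REQUIRED_DIRS = ("booksum", "all_chapterized_books")


def missing_manual_dirs(manual_dir):
    """Return the required directory names that are absent from manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
```

For example, `missing_manual_dirs("~/tensorflow_datasets/downloads/manual")` returns an empty list when both directories exist under that (assumed) location.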
  • Auto-cached (documentation): Yes (test, validation), Only when shuffle_files=False (train)

  • Feature structure:

FeaturesDict({
    'document': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})
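
As a rough illustration of how these two features might be read, the sketch below loads one config and decodes each example into plain Python strings. It assumes `tensorflow_datasets` is installed and that the manually downloaded data is already available; the helper names `decode_example` and `iter_examples` are hypothetical, while `tfds.load`, `tfds.download.DownloadConfig` and `tfds.as_numpy` are standard TFDS calls.

```python
def decode_example(example):
    """Decode the byte strings of one example into Python str values."""
    return {key: example[key].decode("utf-8") for key in ("document", "summary")}


def iter_examples(config="book", split="validation", manual_dir=None):
    """Yield decoded examples from booksum/<config> (hypothetical helper)."""
    # Imported lazily so that decode_example can be used without TFDS installed.
    import tensorflow_datasets as tfds

    kwargs = {}
    if manual_dir is not None:
        # Point TFDS at the folder holding the manually downloaded data.
        kwargs["download_and_prepare_kwargs"] = {
            "download_config": tfds.download.DownloadConfig(manual_dir=manual_dir)
        }
    # shuffle_files=False matches the condition under which the train split
    # is documented as auto-cached.
    ds = tfds.load(f"booksum/{config}", split=split, shuffle_files=False, **kwargs)
    for example in tfds.as_numpy(ds):
        yield decode_example(example)
```

Each yielded item is a dict with the keys `document` and `summary`, mirroring the feature structure above.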
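  • Feature documentation:

| Feature  | Class        | Shape | Dtype  | Description |
|----------|--------------|-------|--------|-------------|
|          | FeaturesDict |       |        |             |
| document | Text         |       | string |             |
| summary  | Text         |       | string |             |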

  • Citation:

@article{kryscinski2021booksum,
      title={BookSum: A Collection of Datasets for Long-form Narrative Summarization},
      author={Wojciech Kry{\'s}ci{\'n}ski and Nazneen Rajani and Divyansh Agarwal and Caiming Xiong and Dragomir Radev},
      year={2021},
      eprint={2105.08209},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

booksum/book (default config)

  • Config description: Book-level summarization

  • Dataset size: 208.81 MiB

  • Splits:

| Split        | Examples |
|--------------|----------|
| 'test'       | 46       |
| 'train'      | 312      |
| 'validation' | 45       |

booksum/chapter

  • Config description: Chapter-level summarization

  • Dataset size: 216.71 MiB

  • Splits:

| Split        | Examples |
|--------------|----------|
| 'test'       | 1,083    |
| 'train'      | 6,524    |
| 'validation' | 891      |
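
To put the split sizes of the two configs side by side, the short sketch below totals the example counts listed above and computes each split's share. The counts are copied from the split listings; the names `SPLITS` and `split_fractions` are hypothetical and not part of the dataset's API.

```python
# Example counts copied from the split listings for each config.
SPLITS = {
    "book": {"test": 46, "train": 312, "validation": 45},
    "chapter": {"test": 1083, "train": 6524, "validation": 891},
}


def split_fractions(config):
    """Return (total examples, {split name: share of total}) for a config."""
    counts = SPLITS[config]
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for config in SPLITS:
    total, fractions = split_fractions(config)
    print(config, total, {name: round(f, 3) for name, f in fractions.items()})
```

The totals come to 403 examples for `booksum/book` and 8,498 for `booksum/chapter`.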