
bookcorpusopen

References:

plain_text

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:bookcorpusopen/plain_text')
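As a minimal sketch of what working with the loaded dataset might look like, the snippet below requests the `train` split and decodes the two string fields. The helper names `decode_example` and `iter_books` are hypothetical and not part of TFDS. The sketch assumes that `tfds.as_numpy` yields each string field as a `bytes` object and that the text is UTF-8 encoded; loading from the `huggingface:` namespace likely also requires the Hugging Face `datasets` package to be installed.

```python
def decode_example(example):
    """Return a copy of `example` with bytes values decoded as UTF-8 text."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in example.items()
    }


def iter_books(limit=3):
    """Yield up to `limit` decoded books from the train split (hypothetical helper)."""
    # Imported lazily so decode_example can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds = tfds.load('huggingface:bookcorpusopen/plain_text', split='train')
    # tfds.as_numpy converts tensors to NumPy/Python values; string scalars
    # are assumed to arrive as bytes.
    for example in tfds.as_numpy(ds.take(limit)):
        yield decode_example(example)


# Usage (triggers a download the first time it runs):
# for book in iter_books():
#     print(book["title"], len(book["text"]))
```

Each book is a single, potentially very long string, so taking a few examples first is a cheap way to inspect the data before iterating over all of it.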

Description:

Books are a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).
This version of bookcorpus has 17868 dataset items (books). Each item contains two fields: title and text. The title is the name of the book (just the file name), while text contains the unprocessed book text. The bookcorpus has been prepared by Shawn Presser and is generously hosted by The-Eye. The-Eye is a non-profit, community-driven platform dedicated to the archiving and long-term preservation of any and all data including but by no means limited to... websites, books, games, software, video, audio, other digital-obscura and ideas.

License: No known license
Version: 1.0.0
Splits:

Split	Examples
`'train'`	17868

Features:

{
    "title": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
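The feature listing above declares two string-valued fields. As a rough illustration, the sketch below parses that listing and checks whether a decoded record matches it. The function name `matches_features` is hypothetical, and the check only handles string-valued fields, which is all this spec declares.

```python
import json

# Feature spec as shown in the listing above.
FEATURES = json.loads("""
{
    "title": {"dtype": "string", "id": null, "_type": "Value"},
    "text": {"dtype": "string", "id": null, "_type": "Value"}
}
""")


def matches_features(record, features=FEATURES):
    """Return True if `record` has exactly the declared fields, each holding a str."""
    if set(record) != set(features):
        return False
    return all(
        spec["dtype"] == "string" and isinstance(record[name], str)
        for name, spec in features.items()
    )
```

A check like this can catch records that are still `bytes` or that are missing a field before they reach downstream text processing.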
