salient_span_wikipedia

Description:

Wikipedia sentences with labeled salient spans.

Homepage: https://www.tensorflow.org/datasets/catalog/salient_span_wikipedia
Source code: tfds.datasets.salient_span_wikipedia.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: Unknown size
Auto-cached (documentation): No
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{guu2020realm,
    title={REALM: Retrieval-Augmented Language Model Pre-Training},
    author={Kelvin Guu and Kenton Lee and Zora Tung and Panupong Pasupat and Ming-Wei Chang},
    year={2020},
    journal = {arXiv e-prints},
    archivePrefix = {arXiv},
    eprint={2002.08909},
}
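
Both configs listed below can be requested by name through `tfds.load`. The following is a minimal sketch, not the catalog's official snippet: the helper names `dataset_name` and `load_train` are made up for illustration, and because the page reports the download size as unknown and the dataset is not auto-cached, the first call may need to prepare tens of GiB on disk.

```python
from typing import Any

# Config names taken from this page: "sentences" (default) and "documents".
KNOWN_CONFIGS = ("sentences", "documents")


def dataset_name(config: str = "sentences") -> str:
    """Build the 'dataset/config' string that tfds.load accepts (hypothetical helper)."""
    if config not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {KNOWN_CONFIGS}")
    return f"salient_span_wikipedia/{config}"


def load_train(config: str = "sentences") -> Any:
    """Load the only split ('train') of the chosen config (hypothetical helper)."""
    # Imported lazily so the name helper works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split="train")


print(dataset_name())
print(dataset_name("documents"))
```

Calling `load_train("documents")` would return a `tf.data.Dataset` over full documents, assuming the dataset can be prepared in the local environment.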

salient_span_wikipedia/sentences (default config)

Config description: Examples are individual sentences containing entities.
Dataset size: 20.57 GiB
Splits:

Split	Examples
`'train'`	82,291,706

Feature structure:

FeaturesDict({
    'spans': Sequence({
        'limit': int32,
        'start': int32,
        'type': string,
    }),
    'text': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
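
The feature structure above stores each span as a pair of offsets plus a type label. The sketch below shows one way to recover the span strings from a single example. It rests on assumptions that this page does not state: that `start` and `limit` are byte offsets into the UTF-8 encoded `text`, and that `limit` is exclusive. The function name `extract_spans`, the sample record, and the `DATE` label are illustrative only and are not taken from the dataset.

```python
from typing import Dict, List, Sequence, Tuple, Union


def extract_spans(
    text: Union[bytes, str],
    spans: Dict[str, Sequence],
) -> List[Tuple[str, str]]:
    """Return (span_text, span_type) pairs for one example.

    Assumption: offsets index the UTF-8 bytes of `text`; `limit` is exclusive.
    """
    raw = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    pairs = []
    # A Sequence of dicts is exposed as a dict of parallel arrays.
    for start, limit, kind in zip(spans["start"], spans["limit"], spans["type"]):
        span_text = raw[int(start):int(limit)].decode("utf-8", errors="replace")
        span_type = kind.decode("utf-8") if isinstance(kind, bytes) else str(kind)
        pairs.append((span_text, span_type))
    return pairs


# Hand-made record in the shape of the feature structure (not real data).
example = {
    "title": b"Example",
    "text": b"REALM was published in 2020.",
    "spans": {"start": [23], "limit": [27], "type": [b"DATE"]},
}
print(extract_spans(example["text"], example["spans"]))
```

If the offsets turn out to be character offsets rather than byte offsets, slice the decoded string instead of the raw bytes.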

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
spans	Sequence
spans/limit	Tensor	int32
spans/start	Tensor	int32
spans/type	Tensor	string
text	Text	string
title	Text	string


salient_span_wikipedia/documents

Config description: Examples are full documents.
Dataset size: 16.52 GiB
Splits:

Split	Examples
`'train'`	13,353,718

Feature structure:

FeaturesDict({
    'sentences': Sequence({
        'limit': int32,
        'start': int32,
    }),
    'spans': Sequence({
        'limit': int32,
        'start': int32,
        'type': string,
    }),
    'text': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
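
In this config, `sentences` marks sentence boundaries within the document and `spans` marks the salient spans. The sketch below splits a document into sentences and attaches to each sentence the spans that fall inside it. As before, it assumes that all offsets are byte offsets into the UTF-8 encoded document `text` with an exclusive `limit`, and that span offsets are relative to the document rather than to a sentence; neither is stated on this page. The name `sentences_with_spans` and the sample document are illustrative.

```python
from typing import Dict, List, Sequence, Union


def sentences_with_spans(
    text: Union[bytes, str],
    sentences: Dict[str, Sequence],
    spans: Dict[str, Sequence],
) -> List[dict]:
    """Group span strings under the sentence that contains them.

    Assumption: every offset indexes the UTF-8 bytes of the whole document.
    """
    raw = text.encode("utf-8") if isinstance(text, str) else bytes(text)
    span_ranges = [(int(a), int(b)) for a, b in zip(spans["start"], spans["limit"])]
    result = []
    for s_start, s_limit in zip(sentences["start"], sentences["limit"]):
        s_start, s_limit = int(s_start), int(s_limit)
        # Keep only spans that lie fully inside this sentence.
        inside = [
            raw[a:b].decode("utf-8", errors="replace")
            for a, b in span_ranges
            if s_start <= a and b <= s_limit
        ]
        sentence = raw[s_start:s_limit].decode("utf-8", errors="replace")
        result.append({"sentence": sentence, "spans": inside})
    return result


# Hand-made document in the shape of the feature structure (not real data).
doc = {
    "title": b"Example",
    "text": b"Paris is in France. It hosted the games in 1924.",
    "sentences": {"start": [0, 20], "limit": [19, 48]},
    "spans": {"start": [43], "limit": [47], "type": [b"DATE"]},
}
for row in sentences_with_spans(doc["text"], doc["sentences"], doc["spans"]):
    print(row)
```

A span that crosses a sentence boundary would be dropped by this sketch; whether such spans occur in the dataset is not stated here.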

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
sentences	Sequence
sentences/limit	Tensor	int32
sentences/start	Tensor	int32
spans	Sequence
spans/limit	Tensor	int32
spans/start	Tensor	int32
spans/type	Tensor	string
text	Text	string
title	Text	string
