gap

Description:

GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for the evaluation of coreference resolution in practical applications.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/google-research-datasets/gap-coreference
Source code: tfds.text.Gap
Versions:
- 0.1.0: Initial release.
- 0.1.1 (default): Fixes parsing of the boolean fields A-coref and B-coref.
Download size: 2.29 MiB
Dataset size: 2.96 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	2,000
`'train'`	2,000
`'validation'`	454
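
As a rough illustration of how the splits above might be loaded, the sketch below wraps `tfds.load` in a small helper. The names `GAP_SPLIT_SIZES`, `load_gap`, and `total_examples` are hypothetical and not part of TFDS. The sizes are copied from the table. They sum to 4,454 examples; since each example carries two candidate names (A and B), this matches the 8,908 (pronoun, name) pairs given in the description.

```python
# Split sizes copied from the table above (hypothetical constant name).
GAP_SPLIT_SIZES = {"test": 2000, "train": 2000, "validation": 454}


def load_gap(split="train"):
    """Load one GAP split as a tf.data.Dataset of feature dictionaries."""
    # Imported lazily so the constants in this module work without TensorFlow.
    import tensorflow_datasets as tfds

    return tfds.load("gap", split=split)


def total_examples(sizes=GAP_SPLIT_SIZES):
    """Sum the number of examples across all splits."""
    return sum(sizes.values())
```

With TensorFlow Datasets installed, `load_gap("validation").take(1)` should yield a single feature dictionary; the first call downloads and prepares the data.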

Feature structure:

FeaturesDict({
    'A': Text(shape=(), dtype=string),
    'A-coref': bool,
    'A-offset': int32,
    'B': Text(shape=(), dtype=string),
    'B-coref': bool,
    'B-offset': int32,
    'ID': Text(shape=(), dtype=string),
    'Pronoun': Text(shape=(), dtype=string),
    'Pronoun-offset': int32,
    'Text': Text(shape=(), dtype=string),
    'URL': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
A	Text	string
A-coref	Tensor	bool
A-offset	Tensor	int32
B	Text	string
B-coref	Tensor	bool
B-offset	Tensor	int32
ID	Text	string
Pronoun	Text	string
Pronoun-offset	Tensor	int32
Text	Text	string
URL	Text	string
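
The offset features can be sanity-checked against `Text`. The sketch below assumes that each `*-offset` is a zero-based character offset into `Text`, which is how the GAP repository describes these columns; the page above does not state it. The helper names and the sample record are hypothetical, and the record is hand-written rather than taken from the dataset.

```python
def mention_at(text, offset, mention):
    """Return the slice of `text` that starts at `offset` and is as long as `mention`."""
    return text[offset:offset + len(mention)]


def check_offsets(example):
    """Map each mention field to True if its offset points at that mention in Text."""
    text = example["Text"]
    return {
        key: mention_at(text, example[f"{key}-offset"], example[key]) == example[key]
        for key in ("A", "B", "Pronoun")
    }


# Hypothetical record, written by hand for illustration (not taken from GAP).
example = {
    "Text": "Alice met Beth after she finished the talk.",
    "A": "Alice", "A-offset": 0,
    "B": "Beth", "B-offset": 10,
    "Pronoun": "she", "Pronoun-offset": 21,
}

print(check_offsets(example))
```

When reading real examples through `tfds.as_numpy`, the `Text` features arrive as `bytes`, so they would need `.decode("utf-8")` before character offsets are applied.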

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
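
Because the supervised keys are `None`, loading with `as_supervised=True` is not available for this dataset, and an (inputs, label) pair has to be built from the feature dictionary by hand. The sketch below shows one possible framing, a three-way label derived from `A-coref` and `B-coref`. The label names and both helper functions are assumptions made for illustration, as is the premise that at most one of the two flags is true in any example.

```python
def coref_label(a_coref, b_coref):
    """Collapse the two boolean flags into a single label (hypothetical scheme)."""
    if a_coref and b_coref:
        # Assumption: the pronoun refers to at most one of the two names.
        raise ValueError("both A-coref and B-coref are set")
    if a_coref:
        return "A"
    if b_coref:
        return "B"
    return "NEITHER"


def to_supervised(example):
    """Turn one feature dictionary into an (inputs, label) pair."""
    inputs = {k: v for k, v in example.items() if k not in ("A-coref", "B-coref")}
    return inputs, coref_label(example["A-coref"], example["B-coref"])
```

The same function can be applied to dictionaries produced by `tfds.as_numpy`, since NumPy booleans behave like Python booleans in these checks.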

Citation:

@article{DBLP:journals/corr/abs-1810-05201,
  author    = {Kellie Webster and
               Marta Recasens and
               Vera Axelrod and
               Jason Baldridge},
  title     = {Mind the {GAP:} {A} Balanced Corpus of Gendered Ambiguous Pronouns},
  journal   = {CoRR},
  volume    = {abs/1810.05201},
  year      = {2018},
  url       = {http://arxiv.org/abs/1810.05201},
  archivePrefix = {arXiv},
  eprint    = {1810.05201},
  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1810-05201},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}