- Description:
The shared task of CoNLL-2002 concerns language-independent named entity recognition. The types of named entities include: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The participants of the shared task were offered training and test data for at least two languages. Information sources other than the training data might have been used in this shared task.
Homepage: https://aclanthology.org/W02-2024/
Source code: tfds.text.conll2002.Conll2002
Versions:
- 1.0.0 (default): Initial release.
Auto-cached (documentation): Yes
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
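As a rough sketch of how the configs and version listed on this page might be used, the helpers below build the `name` argument for `tfds.load` and defer the TensorFlow Datasets import until a split is actually requested. The names `config_name` and `load_conll2002` are hypothetical and not part of TFDS; the config names (`es`, `nl`) and the version `1.0.0` are taken from this page.

```python
# Hypothetical helpers (not part of TFDS) for selecting a CoNLL-2002 config.
KNOWN_CONFIGS = ("es", "nl")  # configs documented on this page
VERSION = "1.0.0"             # the only version listed on this page


def config_name(language: str = "es", pin_version: bool = False) -> str:
    """Return a TFDS name such as 'conll2002/es' or 'conll2002/es:1.0.0'."""
    if language not in KNOWN_CONFIGS:
        raise ValueError(
            f"unknown config {language!r}; expected one of {KNOWN_CONFIGS}"
        )
    name = f"conll2002/{language}"
    return f"{name}:{VERSION}" if pin_version else name


def load_conll2002(language: str = "es", split: str = "train"):
    """Load one split; needs tensorflow_datasets and may trigger a download."""
    # Imported lazily so config_name() stays usable without TFDS installed.
    import tensorflow_datasets as tfds

    # with_info=True returns a (dataset, DatasetInfo) tuple.
    return tfds.load(config_name(language), split=split, with_info=True)
```

A call such as `ds, info = load_conll2002("nl", split="dev")` should yield dictionaries with the `ner`, `pos` and `tokens` keys. Because the supervised keys are `None`, requesting `as_supervised=True` is not expected to work for this dataset.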
Citation:
@inproceedings{tjong-kim-sang-2002-introduction,
title = "Introduction to the {C}o{NLL}-2002 Shared Task: Language-Independent Named Entity Recognition",
author = "Tjong Kim Sang, Erik F.",
booktitle = "{COLING}-02: The 6th Conference on Natural Language Learning 2002 ({C}o{NLL}-2002)",
year = "2002",
url = "https://aclanthology.org/W02-2024",
}
conll2002/es (default config)
Download size: 3.95 MiB
Dataset size: 3.52 MiB
Splits:
| Split | Examples |
|---|---|
| 'dev' | 1,916 |
| 'test' | 1,518 |
| 'train' | 8,324 |
- Feature structure:
FeaturesDict({
    'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
    'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=60)),
    'tokens': Sequence(Text(shape=(), dtype=string)),
})
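The nine `ner` classes are consistent with a BIO tagging scheme over the four entity types named in the description: one `O` tag plus a `B-` and an `I-` tag per type. The sketch below decodes integer label ids into tag strings and groups them into entity spans. The label order in `NER_LABELS` is an assumption made for illustration, and the sample sentence is invented; the real order should be read from the dataset's `ClassLabel` names before relying on it.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: this label order is illustrative and not taken from TFDS.
ENTITY_TYPES = ["PER", "ORG", "LOC", "MISC"]
NER_LABELS = ["O"] + [
    f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")
]
# 1 "O" tag + 2 prefixes x 4 entity types = 9 classes (num_classes=9).


def decode(ids: List[int]) -> List[str]:
    """Map integer class ids to tag strings."""
    return [NER_LABELS[i] for i in ids]


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and etype == current_type:
            current.append(token)  # continue the open span
            continue
        if current:
            spans.append((current_type, " ".join(current)))  # close the open span
        if prefix in ("B", "I"):
            # Start a new span; a stray I- tag is treated like B-.
            current, current_type = [token], etype
        else:
            current, current_type = [], None
    if current:
        spans.append((current_type, " ".join(current)))
    return spans


# Invented example sentence, not taken from the dataset.
tokens = ["Juan", "Pérez", "visitó", "Madrid", "."]
tags = decode([1, 2, 0, 5, 0])
print(extract_spans(tokens, tags))
```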
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| ner | Sequence(ClassLabel) | (None,) | int64 | |
| pos | Sequence(ClassLabel) | (None,) | int64 | |
| tokens | Sequence(Text) | (None,) | string | |
conll2002/nl
Download size: 3.47 MiB
Dataset size: 3.55 MiB
Splits:
| Split | Examples |
|---|---|
| 'dev' | 2,896 |
| 'test' | 5,196 |
| 'train' | 15,807 |
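To compare how the two configs divide their data, the split sizes from the `es` and `nl` tables can be turned into totals and proportions. This is a small illustrative calculation; the `SPLITS` dictionary and `split_fractions` helper are hypothetical names, and the counts are copied from this page.

```python
# Split sizes copied from the split tables on this page.
SPLITS = {
    "es": {"dev": 1_916, "test": 1_518, "train": 8_324},
    "nl": {"dev": 2_896, "test": 5_196, "train": 15_807},
}


def split_fractions(config: str) -> dict:
    """Return each split's share of the config's total example count."""
    counts = SPLITS[config]
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


for config, counts in SPLITS.items():
    total = sum(counts.values())
    shares = ", ".join(
        f"{split}={share:.1%}" for split, share in split_fractions(config).items()
    )
    print(f"{config}: {total:,} examples ({shares})")
```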
- Feature structure:
FeaturesDict({
    'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
    'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=12)),
    'tokens': Sequence(Text(shape=(), dtype=string)),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| ner | Sequence(ClassLabel) | (None,) | int64 | |
| pos | Sequence(ClassLabel) | (None,) | int64 | |
| tokens | Sequence(Text) | (None,) | string | |